Why don’t you talk with Google Glass?

Good to see you, this is YOSHIMURA (@alterakey) researching wearable devices at ATL.

As we say a wearable device, there are many kinds of devices. Among them Android typed devices such as Google Glass and Android Wear (LG G Watch, SAMSUNG Gear Live, Motorola Moto 360, etc.) seem to have an edge. Since I had a chance to experience Google Glass among the devices, I made a Glassware.

Google GLASS

As you have already known Google Glass, it is a glass-shaped AR device, which is presented as a Project Glass at Google I/O 2012. This is a lightweight device driven by Android 4.4 with clear display, gyro, touchpad, camera, proximity sensor, microphone, bone conduction transducer, Wi-Fi, Bluetooth Smart, and CPU/GPU/RAM equivalent as one in the Nexus S. The device is actually an Android but it does not have the touching UX as one in a normal mobile phone. It has the UX of the “Now this time” information support which gives feedback to users by starting the application (Glassware) based on voice commandｓ and by inserting eventｓ (Card) into timeline.

chatterbot

By the way, do you know a chatterbot? A chatterbot was popular before, and it is a comparatively simple AI that can answer when we talk with natural language. Actually, there is some AI that we cannot call it as AI. Although I say “Talk”, since there was no sufficient voice recognition or voice composition at that time, the interface was to return the answer in text when we typed a question or something with keyboard. This time I tried to make it with Android voice recognition or voice composition.

There are many kinds of chatterbot, and one of the famous bots is ELIZA written by Professor Joseph Weizenbaum in MIT from 1964-1966. Eliza has an easy logic to add and output comparatively simple conversion to input sentences.　When the professor, who was eager to develop artificial intelligence at that time, had some experiences with scripts (DOCTOR) like a clinical psychologist, too many students were hooked by this against his expectations. So, he was shocked and then took a critical side to the progress of the artificial intelligence development after that.

Architecture

Port ELIZA to Android, and activate it by voice recognition and composition on Glass. For voice recognition, it needs consecutive voice samplings and streaming to cooperate with Google Server recognition, so it consumes a high proportion of electric power and network bandwidth. For this reason, it is a theory to activate the voice recognition feature on and off when it is needed, but this time we are going to drive it continuously and compose ELIZA’s reply while users are replying. Therefore, we take no account of consumption of electric power and network bandwidth.

Porting ELIZA

Let’s start with porting ELIZA. As you see in Wikipedia, ELIZA is implemented by many kinds of languages. This time we are going to use Perl Chatbot::Eliza as an original implementation, and we will port it to Java. Write a dummy implementation, going on the assumption that we use Android TextUtils because we use Join. The full source code is kind of long, so you can download it from the repository. Basically port it one by one. Just for reference, this is how the dummy implementation of TextUtils looks like.

pre class=”lang:java decode:true ” >package com.gmail.altakey.eliza;
…
public class TextUtils {
public static String join(final String glue, final List iter) {
final StringBuilder sb = new StringBuilder();
if (iter.size() > 0) {
sb.append(iter.get(0));
if (iter.size() &> 1) {
for (int i=1; i<iter.size(); ++i) {
sb.append(glue);
sb.append(iter.get(i));
}
}
}
return sb.toString();
}
public static String join(final String glue, final String[] iter) {
return join(glue, Arrays.asList(iter));
}
}

Demonstrate it with textbase. This is the rough how-to build because we do not write even build file.

taky@florence:src $ find . -name '*.java' | xargs javac && java com.gmail.altakey.eliza.Eliza
Eliza:  Please tell me what's been bothering you.
you:    What? Are you a computer?
Eliza:  Don't you think computers can help people ?
you:

taky@florence:src $ find . -name '*.java' | xargs javac && java com.gmail.altakey.eliza.Eliza

Eliza: Please tell me what's been bothering you.

you: What? Are you a computer?

Eliza: Don't you think computers can help people ?

you:

Glassware

As I explained in Architecture, activate ELIZA as asynchronous as possible.

In addition, to separate the voice related processing and AI related processing, use two services of MainService and PersonalityService, and bind them by Local Broadcast. After MinService runs through composition of LiveCard and voice-related formatting, it activates PersonalityService and enter recognition loop (recognition -> message). On the other hand, PersonalityService formats ELIZA after activation, and just conveys the message and replies to it. This also has a long source code, please refer to the repository but I will briefly explain the voice recognition and composition because they are very important.

Voice recognition and composition

Use English for Composition related processing, and point it to the bone conduction transducer of Glass so that we can use it without hooking up a headphone. Cancel the last sound and take priority over the last one when several sounds overlap. Close Glassware when ELIZA makes an end greeting by the user’s command.

private class Talker implements TextToSpeech.OnInitListener, TextToSpeech.OnUtteranceCompletedListener {
        private final String FAREWELL = "final_";
        private TextToSpeech mmTTS;
...
        public void init() {
            if (mmTTS == null) {
                mmTTS = new TextToSpeech(MainService.this, this);
                mmTTS.setLanguage(Locale.US);
            }
        }
        public void shutdown() {
            if (mmTTS != null) {
                mmTTS.shutdown();
                mmTTS = null;
            }
        }
        public void talk(final String msg, boolean final_) {
            final HashMap<string, String> params = new HashMap<>();
            params.put(TextToSpeech.Engine.KEY_PARAM_STREAM, "STREAM_NOTIFICATION");
            if (final_) {
                params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, FAREWELL);
            }
            mmTTS.speak(msg, TextToSpeech.QUEUE_FLUSH, params);
        }
...
        @Override
        public void onUtteranceCompleted(String utteranceId) {
            if (FAREWELL.equals(utteranceId)) {
                stopSelf();
            }
        }
    }
}

private class Talker implements TextToSpeech.OnInitListener, TextToSpeech.OnUtteranceCompletedListener {

private final String FAREWELL = "final_";

private TextToSpeech mmTTS;

...

public void init() {

if (mmTTS == null) {

mmTTS = new TextToSpeech(MainService.this, this);

mmTTS.setLanguage(Locale.US);

}

public void shutdown() {

if (mmTTS != null) {

mmTTS.shutdown();

mmTTS = null;

}

public void talk(final String msg, boolean final_) {

final HashMap<string, String> params = new HashMap<>();

params.put(TextToSpeech.Engine.KEY_PARAM_STREAM, "STREAM_NOTIFICATION");

if (final_) {

params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, FAREWELL);

}

mmTTS.speak(msg, TextToSpeech.QUEUE_FLUSH, params);

}

...

@Override

public void onUtteranceCompleted(String utteranceId) {

if (FAREWELL.equals(utteranceId)) {

stopSelf();

}

For recognition related processing, it also formats ELIZA in English free writing mode, and it replies to ELIZA at the recognition closing. It responds “…” in inability of recognition as far as the cutting context goes. So, when activating Glassware without emitting any word, it keeps telling that it does not understand what you are saying and to say it one more time.

private class Talker implements TextToSpeech.OnInitListener, TextToSpeech.OnUtteranceCompletedListener {
        private final String FAREWELL = "final_";
        private TextToSpeech mmTTS;
...
        public void init() {
            if (mmTTS == null) {
                mmTTS = new TextToSpeech(MainService.this, this);
                mmTTS.setLanguage(Locale.US);
            }
        }
        public void shutdown() {
            if (mmTTS != null) {
                mmTTS.shutdown();
                mmTTS = null;
            }
        }
        public void talk(final String msg, boolean final_) {
            final HashMap<string, String> params = new HashMap<>();
            params.put(TextToSpeech.Engine.KEY_PARAM_STREAM, "STREAM_NOTIFICATION");
            if (final_) {
                params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, FAREWELL);
            }
            mmTTS.speak(msg, TextToSpeech.QUEUE_FLUSH, params);
        }
...
        @Override
        public void onUtteranceCompleted(String utteranceId) {
            if (FAREWELL.equals(utteranceId)) {
                stopSelf();
            }
        }
    }

private class Talker implements TextToSpeech.OnInitListener, TextToSpeech.OnUtteranceCompletedListener {

private final String FAREWELL = "final_";

private TextToSpeech mmTTS;

...

public void init() {

if (mmTTS == null) {

mmTTS = new TextToSpeech(MainService.this, this);

mmTTS.setLanguage(Locale.US);

}

public void shutdown() {

if (mmTTS != null) {

mmTTS.shutdown();

mmTTS = null;

}

public void talk(final String msg, boolean final_) {

final HashMap<string, String> params = new HashMap<>();

params.put(TextToSpeech.Engine.KEY_PARAM_STREAM, "STREAM_NOTIFICATION");

if (final_) {

params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, FAREWELL);

}

mmTTS.speak(msg, TextToSpeech.QUEUE_FLUSH, params);

}

...

@Override

public void onUtteranceCompleted(String utteranceId) {

if (FAREWELL.equals(utteranceId)) {

stopSelf();

}

At last, format and close it in response to service activation and closing.

public class MainService extends Service
{
    ...
    private Talker mTalker;
    private SpeechListener mListener;
    ....
    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        ...
        if (mTalker == null) {
            mTalker = new Talker();
            mTalker.init();
        }
        if (mListener == null) {
            mListener = new SpeechListener();
            mListener.init();
        }
...
    }
    @Override
    public void onDestroy() {
...
        if (mListener != null) {
            mListener.shutdown();
            mListener = null;
        }
        if (mTalker != null) {
            mTalker.shutdown();
            mTalker = null;
        }
...
    }
...

public class MainService extends Service

{

...

private Talker mTalker;

private SpeechListener mListener;

....

@Override

public int onStartCommand(Intent intent, int flags, int startId) {

...

if (mTalker == null) {

mTalker = new Talker();

mTalker.init();

}

if (mListener == null) {

mListener = new SpeechListener();

mListener.init();

}

...

}

@Override

public void onDestroy() {

...

if (mListener != null) {

mListener.shutdown();

mListener = null;

}

if (mTalker != null) {

mTalker.shutdown();

mTalker = null;

}

...

}

...

Summary

This is all you need to do and it is a quasi-thing, but I am very fascinated with chatterbot designed with original means of communication by controlling it with voice. I think ELIZA can be put into practical use as a personal therapist in the English-speaking world by adding a new feature of pausing and nodding because ELIZA makes a response like a clinical psychotherapist who does mainly client-centered counseling, although ELIZA used here is very simple. Since ELIZA was used to hooking MIT students on with text chatting, don’t you think there is another potential use for ELIZA to connect people on with voice chatting?

It is also possible enough to make ELIZA a personal assistant although it can only make dialogue. While Google Glass only can perform a search in Google by “ok glass, google…,” I think it very interesting to research and save unknown words by combing this with some DB and search API. As such bases accumulate, it will be possible to speak to ELIZA.

We have been doing this in English, but how about in Japanese? While Android basically supports Japanese voice recognition, Google Glass does not support it as it is. In Text-To-Speech (TTS), since even Android does not support Japanese in default TTS, it needs to add Japanese TTS engine such as N2TTS, but Google Glass does not support it, either. Besides that, because ELIZA is a chatterbot designed mainly for English sentences, it is assumed that it needs lots of remodeling to be able to recognize Japanese sentences.

Like these examples, I feel that there are lots of challenges for translating into Japanese, but I want to try this in the future.

Finally, all source codes used this time can be obtained in the repository, so please try it once if you have Google Glass.