Hi, this is ITOH (takahi_i) from ATL.

I talked about Ichiro PANDA in the previous post. In this post, I will explain in more detail how the LINE account “Ichiro PANDA” generates a reply (dialogue).

Features of the LINE account Ichiro PANDA

The main feature of the LINE account Ichiro PANDA is that you can talk to it (send it a message) and it answers (replies to the message). In the following example, Ichiro PANDA is asked what the weather is like, and it replies with the weather information.

Photo 3

Overview and components

The dialogue part of Ichiro PANDA consists broadly of two components:

  • Message processing component
  • Dialogue sentence generation component

    The message processing component processes a message (input sentence) that a user types to Ichiro PANDA. Based on the information the message processing component extracts, the dialogue sentence generation component generates a reply message (a sentence). Next, I will cover the processing done by each component.

    Message processing component

    The message processing component extracts information from a message (input sentence) typed by a user. I adopted a tool called Apache UIMA for this extraction.

    With Apache UIMA, a user can define what information to extract from unstructured data (documents) and how to extract it. A characteristic of Apache UIMA is that the processing of an input document is defined as a pipeline. “Defined as a pipeline” means that the input processing is expressed as a series of independent stages (components that each perform a single processing step).

    One of the reasons I adopted UIMA is that the processing of unstructured data, in this case a message (a natural language sentence), can be developed separately for each stage of the pipeline. If you do the input processing in one big program without UIMA, the program inevitably grows large, which hurts future maintainability and extensibility.

    Because I split the processing into UIMA stages this time, I was able to build the system as a set of small, loosely coupled programs. As a result, each stage will be easier to update in the future.

    For example, Ichiro PANDA uses the following pipeline.

    linepanda.001

    As you can see in the above figure, Ichiro PANDA performs the following processing:

  • Understanding the intention
  • Extracting time expressions
  • Extracting places
  • Extracting numerical expressions

    Each component is currently implemented with simple patterns, and each one is relatively easy to update because UIMA keeps the components loosely coupled.

    The information extracted by the message processing component is handed, together with the user's message, to the second component, the dialogue sentence generation component.
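
The pipeline idea described above can be sketched in a few lines. This is only an illustration in plain Python, not the Apache UIMA API and not the actual Ichiro PANDA code: the `Message` class, the stage functions, the annotation keys, and the English patterns are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A user message plus the annotations added by each stage."""
    text: str
    annotations: Dict[str, List[str]] = field(default_factory=dict)


# Each stage only reads the message and adds its own annotations.
# Stages never call each other, so any one of them can be replaced alone.
def annotate_intention(msg: Message) -> None:
    # Hypothetical keyword patterns for two intentions.
    if re.search(r"\bweather\b", msg.text, re.IGNORECASE):
        msg.annotations["intention"] = ["ask_weather"]
    elif re.search(r"\bwhat time\b", msg.text, re.IGNORECASE):
        msg.annotations["intention"] = ["ask_time"]


def annotate_time(msg: Message) -> None:
    found = re.findall(r"\b(?:today|tomorrow|tonight)\b", msg.text, re.IGNORECASE)
    if found:
        msg.annotations["time"] = [word.lower() for word in found]


def annotate_place(msg: Message) -> None:
    # Hypothetical pattern: a capitalized word that follows "in".
    found = re.findall(r"\bin ([A-Z][a-z]+)", msg.text)
    if found:
        msg.annotations["place"] = found


def annotate_number(msg: Message) -> None:
    found = re.findall(r"\d+(?:\.\d+)?", msg.text)
    if found:
        msg.annotations["number"] = found


# The pipeline is just an ordered list of independent stages.
PIPELINE: List[Callable[[Message], None]] = [
    annotate_intention,
    annotate_time,
    annotate_place,
    annotate_number,
]


def process(text: str) -> Message:
    """Run every stage in order and return the annotated message."""
    msg = Message(text)
    for stage in PIPELINE:
        stage(msg)
    return msg
```

With these assumed patterns, `process("What is the weather in Tokyo tomorrow?")` ends up with an intention, a time, and a place annotation. Improving one extractor means editing one function and leaving the rest of the list unchanged.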

    Dialogue sentence generation component

    The dialogue sentence generation component returns an answer based on the user's message and the information extracted by the message processing component.

    It first decides, based on the input message and the extracted information, what kind of conversation is taking place, such as:

  • Asking the time
  • Asking about the weather
  • Searching for a part-time job
  • Asking about the schedule of a part-time job

    Each intention pattern has a generation rule for the answer sentence, and the answer sentence is generated according to that rule.
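
One way to picture "a generation rule for each intention" is a lookup table from intention to a small function. The sketch below is an assumption for illustration only: the intention names, the annotation keys, and the reply wording are hypothetical, not Ichiro PANDA's actual rules.

```python
from typing import Callable, Dict, List, Optional

Annotations = Dict[str, List[str]]


def first(ann: Annotations, key: str, default: str) -> str:
    """Return the first extracted value for a key, or a default."""
    values = ann.get(key) or []
    return values[0] if values else default


def reply_time(ann: Annotations) -> str:
    return "Let me check the current time for you."


def reply_weather(ann: Annotations) -> str:
    place = first(ann, "place", "your area")
    day = first(ann, "time", "today")
    return f"Here is the weather for {place} {day}."


def reply_job_search(ann: Annotations) -> str:
    place = first(ann, "place", "your area")
    return f"Let me look for part-time jobs in {place}."


def reply_job_schedule(ann: Annotations) -> str:
    return "Let me check your part-time job schedule."


# One generation rule per intention (hypothetical intention names).
GENERATION_RULES: Dict[str, Callable[[Annotations], str]] = {
    "ask_time": reply_time,
    "ask_weather": reply_weather,
    "search_job": reply_job_search,
    "ask_job_schedule": reply_job_schedule,
}


def generate_reply(ann: Annotations) -> Optional[str]:
    """Apply the rule for the detected intention.

    Returns None when no intention was detected or no rule is defined,
    so the caller can fall back to another strategy.
    """
    intentions = ann.get("intention") or []
    if not intentions:
        return None
    rule = GENERATION_RULES.get(intentions[0])
    return rule(ann) if rule else None
```

Returning `None` rather than a fixed apology keeps the "intention not understood" case separate from the rules themselves.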

    The difficult part of generating a reply is when the system does not understand the user's intention. Below, I cover the processing for that case.

    When the intention is not understood

    The dialogue sentence generation component generates an answer sentence for each intention, but in some cases it does not understand the user's intention, or no rule for generating an answer sentence is defined for that intention.

    Even in these cases, it has to generate some reply. If it answered “I do not understand what you are saying.” every time, or answered at random, it would not feel like a natural account. We handle this problem with a simple heuristic.

    To be more specific, we prepare pairs of sentences that we expect users to type and replies to them. When the user's input is sufficiently similar to one of the expected inputs, Ichiro PANDA returns the paired reply.
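
The post does not say which similarity measure is used, so the following is only a sketch of the heuristic under assumptions: it uses the character-based ratio from Python's standard `difflib.SequenceMatcher`, and the prepared pairs and the threshold of 0.6 are made up for the example.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical pairs of (expected user input, prepared reply).
PREPARED_PAIRS: List[Tuple[str, str]] = [
    ("hello", "Hello! I am Ichiro PANDA."),
    ("good night", "Good night. Sleep well."),
    ("thank you", "You are welcome."),
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fallback_reply(text: str, threshold: float = 0.6) -> Optional[str]:
    """Return the reply paired with the most similar expected input.

    Returns None when nothing is similar enough (assumed threshold).
    """
    best_score = 0.0
    best_reply: Optional[str] = None
    for expected, reply in PREPARED_PAIRS:
        score = similarity(text, expected)
        if score > best_score:
            best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else None
```

Here `fallback_reply("Thank you!")` picks the reply prepared for "thank you", while an input that resembles none of the prepared sentences yields `None`. A character-level ratio is a crude choice, and for Japanese input a token-based or n-gram measure may fit better.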

    What I did not cover this time

    The method described in this post can reply to a user's input, but in practice more complicated processing is needed. For example, when a user is looking for a part-time job, Ichiro PANDA needs to ask a follow-up question because it needs to know which area the user is looking for a job in. The current version of Ichiro PANDA partially implements this kind of “asking again.”
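
As a rough illustration of “asking again,” the sketch below keeps a small piece of state between messages so that a follow-up message can supply the missing area. The class, the annotation keys, and the reply wording are hypothetical; the post does not describe how Ichiro PANDA actually implements this.

```python
from typing import Dict, List, Optional

Annotations = Dict[str, List[str]]


class JobSearchDialogue:
    """Remembers a pending job search until the user gives an area."""

    def __init__(self) -> None:
        self.waiting_for_place = False

    def respond(self, ann: Annotations) -> Optional[str]:
        """Handle one annotated message; None means 'not a job search'."""
        places = ann.get("place") or []
        place = places[0] if places else None
        is_job_search = "search_job" in (ann.get("intention") or [])

        if not (is_job_search or self.waiting_for_place):
            return None

        if place is None:
            # Required information is missing, so ask again.
            self.waiting_for_place = True
            return "Which area are you looking for a job in?"

        # The area is known: the pending request can be completed.
        self.waiting_for_place = False
        return f"Searching for part-time jobs in {place}."
```

A first message with only the job-search intention triggers the follow-up question, and a second message that contains just a place completes the request.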