Given the growing popularity of messaging apps and conversational bots for Facebook Messenger, I decided to run a little experiment and build my own chatbot with minimal custom code. Here are the results.
Last year, Uber got integrated into Facebook Messenger, allowing people to order a ride while chit-chatting with their friends. The vision of even more services following this path is becoming increasingly realistic.
Recently, Chris Messina, developer experience lead at Uber, called 2016 the year of conversational commerce, and not without reason. The Facebook Messenger Platform is now in beta, and Messenger itself boasts almost a billion users, yet it is only one fish in the sea of instant chat applications.
Given the popularity of instant messaging apps and the growing interest in chatbots, conversational bots may well end up replacing many of the solutions available today.
The potential applications for chatbots are endless:
- help desks,
- data gathering and retrieval,
- external service integration, etc.
Think of it as Google Search acting in a semi-human way, gathering data about your request through conversation and providing better answers in the end.
How about jumping on this trend and writing your own conversational bot?
Since I wanted to test if I could quickly make a very simple chatbot for entertainment with minimum custom code, I decided to try out three solutions that use Facebook Messenger. This meant I didn’t have to provide my own application for text conversations and could focus on comparing the functionalities of the platforms.
I chose three solutions to get a better sense of how this task can be done. The first one, wit.ai, is recommended by Facebook on the Messenger Platform home page. It is free; the only catch is that its creators can analyse your data. The second one, api.ai, is a commercial service with a free plan. I stumbled upon it while looking at their pizza bot example. The last one is Bot Framework, which comes from Microsoft, the company behind the Tay bot.
I found more services, but the ones listed above appeared to be the most promising.
Messenger bots — my experiment
Tim Berners-Lee, the creator of the World Wide Web, was once asked what he had not expected to become a reason to surf the Internet. His funny answer was kittens.
Being a cat lover myself, I decided to make a conversational bot that uses the image-sending capabilities of Messenger to serve cat images. It was supposed to be a tiny program that I could implement on each of the aforementioned frameworks to learn the concepts they use.
I defined five functionalities for my bot:
- serve a random cat image,
- serve a random cat image from a category requested by user,
- list the available categories,
- react to “meow”,
- count the images served.
The images were fetched at random from the Cat API. Replying with static text should be super easy, but I was not sure how sending images would work with the bots. For counting, I needed to hook a counter up in the right places in my little server code.
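A minimal sketch of the category lookup and the image counter might look like the code below. It is only an illustration: `CAT_API_URL` is a placeholder, and the `category` query parameter and both helper names are assumptions made for the example, so check the Cat API documentation for the real endpoint and parameters.

```javascript
// Placeholder endpoint (an assumption); the real Cat API URL and parameters may differ.
const CAT_API_URL = 'https://cats.example.invalid/images/get';

// Build the URL for a random image, optionally limited to one category.
function buildCatImageUrl(category) {
  if (!category) return CAT_API_URL;
  return `${CAT_API_URL}?category=${encodeURIComponent(category)}`;
}

// Keep a running total of served images inside a closure.
function createImageCounter() {
  let served = 0;
  return {
    increment() {
      served += 1;
      return served;
    },
    total() {
      return served;
    },
  };
}

// Example: serve one image from a category and count it.
const imageCounter = createImageCounter();
imageCounter.increment();
console.log(buildCatImageUrl('hats'), imageCounter.total());
```

Keeping the counter in a closure means the only way to change the total is to call `increment` at the point where an image is actually sent.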
All the solutions I used advertise Messenger integration, so I assumed that being able to define responses in the online interface (which Bot Framework does not offer) also meant that I could define a rich message containing more than just text with the same ease.
But in the end, I had to do this in my code.
Only Microsoft's solution handles responses differently, and I will write about that later in this article.
Essentially, the program consisted of the following elements:
- A developer platform for writing chatbots.
- The Facebook Messenger Platform, which allows the bot to interact with Messenger users.
- A publicly visible server.
This meant that I needed keys and secrets for the bot and Facebook API. These came from my Facebook application which was hooked up to the Facebook page I also owned. There should be no problems with the setup because it is well documented in many places.
You may think that the end result would be something like Cleverbot or Tay, but too much free will is essentially not in the interest of the clients that would like to buy a conversational bot.
Consumer software should be predictable and polite and, well, human conversations sometimes are not.
This is probably why the Microsoft Tay bot gained so much attention by cursing and being rude. Most probably you do not want your bot to use the 4chan user dictionary. IBM had the same problem when Watson indexed Urban Dictionary.
Getting to know the Messenger Platform
Before I started writing conversational bots, I coded a small echo program for Messenger Platform.
When it comes to the code, I had two functions:
- one to handle incoming messages and send responses
- and another for creating messages.
I sent each response to the Facebook API as an HTTP request. I specified the recipient (the same as the sender of the request) and the message (an object with a text field). My configuration consisted of the port my server was running on, a verification token, and a page access token obtained from the Messenger tab of the Facebook application developer dashboard.
When I had everything set up and running, my Facebook page responded with the same sentences I sent to it. At this point, you may think “I could write a chatbot myself using some text matching, regular expressions, parsing, NLP techniques”. Sure you could, but do you have the time to fight with a problem that somebody else has dedicated a huge part of their life to solving? If the “do not reinvent the wheel” approach holds for you here, keep reading.
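The two functions and the configuration described above could be sketched roughly as follows. This is a simplified illustration rather than the original program: the function names are made up for the example, the webhook body shape and the Graph API version in the URL are assumptions based on the Messenger Platform documentation of the time, and nothing is actually sent over the network.

```javascript
// Verification handshake: return the challenge only if the token matches.
function verifyWebhook(query, verifyToken) {
  const ok = query['hub.mode'] === 'subscribe' &&
    query['hub.verify_token'] === verifyToken;
  return ok ? query['hub.challenge'] : null;
}

// Create a message object addressed to one recipient.
function createTextMessage(recipientId, text) {
  return { recipient: { id: recipientId }, message: { text } };
}

// Turn an incoming webhook body into echo replies, one per text message.
function handleIncoming(body) {
  const replies = [];
  for (const entry of body.entry || []) {
    for (const event of entry.messaging || []) {
      if (event.message && event.message.text) {
        // The recipient of the reply is the sender of the request.
        replies.push(createTextMessage(event.sender.id, event.message.text));
      }
    }
  }
  return replies;
}

// Describe the HTTP request that would deliver a reply (assumed API version).
function toSendRequest(reply, pageAccessToken) {
  return {
    method: 'POST',
    url: 'https://graph.facebook.com/v2.6/me/messages?access_token=' +
      encodeURIComponent(pageAccessToken),
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(reply),
  };
}
```

In a real server, `handleIncoming` would be called from the POST route of the webhook, and each request description would be passed to an HTTP client.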
Conversational bots up close
I tried to write my programs using the documentation and examples that the service providers created, modifying them only when needed. You can grab the sources and see the original examples on my GitHub.
When I first tried wit.ai, I instantly liked its well-designed user interface and intuitive naming conventions. There are six tabs on the dashboard.
Most of the data you enter belongs in the Stories view. You can have many stories, each consisting of “what the user says”, your text response, the context (entities whose values are derived from the user utterance), and actions that interact with the context. The word entity does not say much, but you can think of getting its values as extracting data from the user input. The simplest possible story consists of a user message, your response, and the default merge action, which handles the entities on your backend and does nothing by default. When you modify a story, it takes some time to update the bot model; in my application it was quick enough not to bother me.
The Actions tab lists your text responses and the functions that you can implement yourself. You can choose which entities must be in the context for them to run. The Understanding tab lists your entities and their search strategies, each of which can be a trait (something inferred from the whole sentence, like sentiment), free text (part of the user message), or a keyword (a word from a list). You can read more about this in the documentation.
The Logs view shows everything that happens during conversations with your bot, such as user requests, fired actions, and bot responses. Inbox is the place where you can tweak incoming messages so that they are recognized better. Finally, in Settings you will find your application id and tokens.
I decided to attach an isImage entity with the text value true whenever I needed to send an image instead of a text message to Facebook. This was the first natural thing that came to mind: defining the response content in the web interface. (It could be done differently, for example by not passing the image URL as the say method parameter and instead defining an action that sends the image.) On the server side, I had to read the isImage entity value in the merge method and store a boolean value in the context passed to the other methods. I also wrote the counter action implementation. It took the session id, the context, and a callback function as arguments, and I ran the callback at the end of the function. I also needed a modified function for sending messages that would send an image instead of text when I wanted. To send an image, the message object had to contain an attachment object, which in turn had a type with the value image and a payload object with a URL field linking to my image.
The first thing I did not understand was that story recognition worked much better if I marked an entity in the user message. Initially, I was not sure why phrase recognition was so poor; simply selecting one keyword in a sentence and creating an entity from it, even if I did not plan to use it elsewhere, fixed the problem. The built-in test chat stopped if I had an entity that was not derived from the sentence but just had a value assigned when the sentence was chosen. What I wanted when writing this first conversational bot was to define the same response, entities, and actions for different story texts (think of them as synonyms), and that was not possible.
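The pieces described above could be sketched as follows. This is an illustration under stated assumptions, not the wit.ai SDK itself: the shape of the entities object (`{ name: [{ value }] }`), the exact argument list of `merge`, and all helper names are assumptions, while the argument list of the counter action and the image message structure follow the description in the text.

```javascript
// Build either a text message or an image attachment message for Messenger.
function createMessage(recipientId, content, isImage) {
  const message = isImage
    ? { attachment: { type: 'image', payload: { url: content } } }
    : { text: content };
  return { recipient: { id: recipientId }, message };
}

// Read the first value of an entity (assumed shape: { name: [{ value }] }).
function firstEntityValue(entities, name) {
  const values = entities && entities[name];
  return Array.isArray(values) && values.length > 0 ? values[0].value : null;
}

// Merge step: store the isImage flag from the entities as a boolean in the context.
function merge(sessionId, context, entities, callback) {
  const next = Object.assign({}, context, {
    isImage: firstEntityValue(entities, 'isImage') === 'true',
  });
  callback(next);
}

// Counter action: takes the session id, the context, and a callback,
// and runs the callback at the end with the updated context.
function createCountAction() {
  let served = 0;
  return function count(sessionId, context, callback) {
    served += 1;
    callback(Object.assign({}, context, { served }));
  };
}

// Example: the flag set by merge decides which kind of message is built.
merge('session-1', {}, { isImage: [{ value: 'true' }] }, (context) => {
  const url = 'http://example.com/cat.png';
  console.log(JSON.stringify(createMessage('1234', url, context.isImage)));
});
```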
I think you could say that api.ai is the commercial equivalent of wit.ai in some ways. When I started using it, I noticed similar concept names. The chatbot is called the agent and it has intents, entities, logs, and domains. Of course, you can unlock more features by paying for them, such as speech recognition, offline solutions, and the so-called fulfilments, which fetch data from external services and make it available to your bot.
The first difference from the wit.ai interface that I noticed was that everything is structured in lists whose items I could inspect in other views. This certainly helps when building bigger agents. api.ai also uses the concept of entities, which you extract from user messages.
Domains are like knowledge bases your agent can operate with: for example, it can be aware of simple mathematical expressions, small talk conversations, or external application interaction intents and help you gather data from the conversation. Entities contain list values, which in turn can have synonyms, which I think is great. Logs are simpler than the ones in wit.ai. They list requests and responses along with intents the message was categorized to.
Let’s look at the most important part of the web application: the intents view, which is quite similar to the stories view in wit.ai. There is the user message, but in api.ai you can even attach many phrases to the same intent! There are also actions which take parameters with values, and those are based on the entities you defined. What is new in actions is that you can define prompts that ask the user for a parameter when it is not found in their input. Obviously, at the end, you can define a response, which might be selected from a list, too!
The context concept is treated somewhat differently than in wit.ai. It consists of a list of identifiers that the intent takes as the input or the output. The inputs define the prerequisite context for the intent to be run, and the outputs enable other intents to fire when the expected context is present. Since we are in the data-passing world now, I would like to mention that you can also test the intents in a fake chat and then easily view the contents of the JSON that will come to your server.
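The way input and output contexts gate intents can be pictured with a toy model like the one below. It is only an illustration of the idea, not api.ai's actual implementation; the intent names, the context identifier, and both functions are hypothetical.

```javascript
// Toy intents (hypothetical): the second one requires a context set by the first.
const intents = [
  { name: 'list-categories', inputContexts: [], outputContexts: ['categories-listed'] },
  { name: 'pick-category', inputContexts: ['categories-listed'], outputContexts: [] },
];

// An intent is eligible when all of its input contexts are currently active.
function eligibleIntents(allIntents, activeContexts) {
  return allIntents.filter((intent) =>
    intent.inputContexts.every((ctx) => activeContexts.includes(ctx)));
}

// Firing an intent adds its output contexts to the active ones.
function fire(intent, activeContexts) {
  return Array.from(new Set([...activeContexts, ...intent.outputContexts]));
}

// Example: after listing the categories, picking one becomes possible.
let active = [];
console.log(eligibleIntents(intents, active).map((i) => i.name));
active = fire(intents[0], active);
console.log(eligibleIntents(intents, active).map((i) => i.name));
```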
The server example uses express as well. It operates very similarly to the first one: wait for a Facebook message, send it to api.ai, and send the response to Facebook. Actions only come as strings with parameters, and you have to handle them with your own implementation. I sent the isImage string as a parameter, which in my implementation serves the same purpose as the context variable in wit.ai. Then, of course, I passed a message object containing either a text field or an image to the function that sends messages to Facebook.
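Handling such a response on the server could look roughly like this. It is a sketch, not the original code: the response shape (`result.action`, `result.parameters`, `result.fulfillment.speech`) is an assumption about the JSON api.ai returned at the time, and the `countImage` action name and all function names are hypothetical.

```javascript
// Reduce an api.ai-style response to what the server needs (assumed shape).
function toReply(apiResponse) {
  const result = (apiResponse && apiResponse.result) || {};
  const parameters = result.parameters || {};
  const fulfillment = result.fulfillment || {};
  return {
    action: result.action || null,
    isImage: parameters.isImage === 'true', // the flag arrives as a string
    content: fulfillment.speech || '',
  };
}

// Actions arrive as plain strings, so map them to your own implementations.
function runAction(reply, handlers) {
  const known = reply.action !== null &&
    Object.prototype.hasOwnProperty.call(handlers, reply.action);
  if (known) handlers[reply.action](reply);
}

// Build the Messenger message: a text field or an image attachment.
function toMessage(recipientId, reply) {
  const message = reply.isImage
    ? { attachment: { type: 'image', payload: { url: reply.content } } }
    : { text: reply.content };
  return { recipient: { id: recipientId }, message };
}

// Example with a hypothetical response and a hypothetical countImage action.
let imagesServed = 0;
const sampleReply = toReply({
  result: {
    action: 'countImage',
    parameters: { isImage: 'true' },
    fulfillment: { speech: 'http://example.com/cat.png' },
  },
});
runAction(sampleReply, { countImage: () => { imagesServed += 1; } });
console.log(JSON.stringify(toMessage('1234', sampleReply)), imagesServed);
```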
Microsoft's solution for conversational bots is very different from the two I have already introduced.
Firstly, you need two separate pieces of software: the Bot Framework SDK for your language of choice and the LUIS application, which is hosted on a different website. I did not like the web user interfaces of those solutions at all, but do not be discouraged. The Bot Framework website allows you to create a bot, set up your server endpoint, read secrets, change two bot settings, test the bot in a chat, and enable integrations. Finally, there is the inconvenient review process that you need to go through for your integrations to work.
LUIS, on the other hand, is the reasoning engine where you define the intents and entities. To my disappointment, entities can only consist of lists of 10 values. When I looked at the intents, I was shocked that I could only name them and add one action with a list of parameters which map to entities (and might be optional). Finally, you have the utterance training view which is, well, obligatory for your bot to recognize anything. Hopefully, there is a way to do this in batches that I did not find. It may seem that this is all bad, but there is a bright side to it as well: the code.
It is the most concise among the solutions I have tested: it takes about five times fewer lines to implement. What does it do? It builds a bot object, either a text one or one using the Bot Framework connector (which delegates our responses to integrations), hooks it up to LUIS, defines what happens when certain entities come from LUIS, and then starts either a terminal-based bot or an HTTP server consisting of one route. The framework even has a built-in method for adding attachments to the message! (I hope it really works with Messenger; waiting for a review for so long is a pain.) Everything in what? 54 lines of code? And somehow, from a programmer’s perspective, the separation of response actions from the web interface felt good.
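The idea of reacting in code to what LUIS recognizes can be sketched without the SDK as a small dispatcher. This is not the Bot Framework API: the result shape (`intents` with scores, `entities` with types) is an assumption about LUIS responses, and the intent names, the `category` entity type, and all function names are hypothetical.

```javascript
// Pick the highest-scoring intent from a LUIS-style result (assumed shape).
function topIntent(result) {
  return (result.intents || []).reduce(
    (best, cur) => (best === null || cur.score > best.score ? cur : best),
    null
  );
}

// Find the text of the first entity with the given type, if any.
function entityOfType(result, type) {
  const found = (result.entities || []).find((e) => e.type === type);
  return found ? found.entity : null;
}

// Run the handler registered for the top intent, or fall back.
function dispatch(result, handlers, fallback) {
  const top = topIntent(result);
  const known = top !== null &&
    Object.prototype.hasOwnProperty.call(handlers, top.intent);
  return known ? handlers[top.intent](result) : fallback(result);
}

// Hypothetical handlers for the cat bot; the responses live in code, not in a web UI.
const handlers = {
  RandomCat: () => ({ kind: 'image', category: null }),
  CategoryCat: (result) => ({ kind: 'image', category: entityOfType(result, 'category') }),
  Meow: () => ({ kind: 'text', text: 'Meow!' }),
};

const sample = {
  intents: [{ intent: 'RandomCat', score: 0.2 }, { intent: 'CategoryCat', score: 0.7 }],
  entities: [{ entity: 'hats', type: 'category' }],
};
console.log(dispatch(sample, handlers, () => ({ kind: 'text', text: 'Sorry?' })));
```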
I am really surprised how easily one can build a conversational bot for free.
Of course, you will need more code for it to do anything useful, but the hardest part, the text analysis engine, is implemented by a third party. Dealing with a black box might be confusing, but it is better than not having it or having to write the engine yourself (and that takes money and time). It is great to see how many different phrasings are automatically matched to the response you intended.
I hope that the examples will become clearer over time as more developers try it out. The same goes for the rich-text messaging options.
wit.ai
Pros: free; good user interface; rich context handling; entity search strategies
Cons: text-only entities
api.ai
Pros: lists for user utterances and bot responses; synonyms for entity children; extra paid features; limited context handling
Cons: text-only parameters
Bot Framework & LUIS.ai
Pros: concise code; good documentation
Cons: fixed-length entity children list; manual utterance training; never-ending review procedure — inability to test on Facebook; bad user interface
As always, I encourage you to test it out yourself, write some code, and evaluate. In the end, the best solution is the one that fits your requirements the best.
Now go and write something entertaining for your Facebook page.