Machine Learning Expertise at Sibedge


Sibedge engineers have solid experience with a range of technologies, and machine learning is one of them. Recently we developed an AI-powered platform for virtual assistants. In this post, we cover how the platform was created and the benefits that help millions of people navigate the data ocean.

Development and technologies

The story began in 2019, when a small team started developing a system for quick search across internal documentation. At some point, we realized that virtual assistants are the most useful tool for data collection. The first version of our system was rule-based: if the search query contained a word that fully or partially matched what the user was looking for, the system returned an answer; if no word matched, the search yielded no results.
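
The rule-based behavior described above can be illustrated with a minimal sketch. It is not the original Sibedge code: the function name `rule_based_search`, the prefix-based notion of a "partial match", and the sample documents are all assumptions made for illustration.

```python
def rule_based_search(query, documents):
    """Return titles of documents whose keywords fully or partially match a query word."""
    words = query.lower().split()
    hits = []
    for title, keywords in documents.items():
        # Assumption: a "partial match" means one word is a prefix of the other.
        if any(k.startswith(w) or w.startswith(k) for w in words for k in keywords):
            hits.append(title)
    return hits


# Hypothetical internal documents and their keywords.
docs = {
    "Vacation policy": ["vacation", "leave"],
    "VPN setup": ["vpn", "network"],
}

print(rule_based_search("vacation days", docs))  # ['Vacation policy']
print(rule_based_search("time off", docs))       # []
```

The second query asks about the same topic as the first but shares no words with the keywords, so nothing is returned. This is the limitation that pushed the team toward machine learning.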

At that time, the system was inconvenient and inefficient, so we turned to machine learning. Instead of composing complex search requests, users could ask questions in natural language. To build such a system, we chose NLU (Natural Language Understanding) and NLP (Natural Language Processing) technologies to understand and interpret natural language.

As the work progressed, we ran into our first obstacles. It turned out that semantic models are quite challenging to build for certain languages. To understand a request, the machine converts it into vectors, passes them through a neural network, and makes an assumption about what the person meant. It was important to choose the right model, fine-tune it, and train the neural network on data from the right semantic domain. A poorly trained system produces incorrect answers.
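
To make the "request to vector to intent" idea concrete, here is a toy sketch. It deliberately swaps the neural network for a much simpler technique, bag-of-words vectors compared by cosine similarity, so it only illustrates the shape of the pipeline, not the platform's actual models. The function names, intent names, and example phrases are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a sparse bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query, intents):
    """Return (intent, score) for the intent whose closest example best matches the query."""
    q = vectorize(query)
    scored = (
        (name, max(cosine(q, vectorize(example)) for example in examples))
        for name, examples in intents.items()
    )
    return max(scored, key=lambda pair: pair[1])


# Hypothetical intents with a few training examples each.
intents = {
    "vacation": ["how many vacation days do i get", "when can i take vacation"],
    "sick_leave": ["how do i report sick leave", "who pays for sick leave"],
}

intent, score = classify("how many days of vacation can i take", intents)
print(intent, round(score, 2))
```

A real NLU model replaces `vectorize` with learned embeddings, so that phrases with no words in common can still end up close together in vector space.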

We chose two neural network models for our system. The first is ELMo. To achieve natural interaction with people, the model was trained on dialogues from Twitter. ELMo handles slang and colloquial phrases very well. It can be taught something new in a matter of minutes with a minimum of data, since training touches only the upper layers of the neural network. Its disadvantage is that ELMo can get confused when a request contains redundant information or when different intents are semantically similar.

The second model is BERT. It was trained on Wikipedia articles. BERT copes well with formal queries, and its results are highly relevant. Its limitation is that BERT is trained on a large amount of data, and the process is time-consuming. BERT and ELMo are mutually exclusive: depending on the task, we use one or the other.

As a result, instead of a virtual assistant for internal use, we developed a constructor for building such assistants. We registered intellectual property for the platform and received several patents. Our solution helps create smart virtual assistants and configure and train them to work in various subject areas: medicine, education, energy, law, and many others. Thanks to a flexible API, our product can be integrated into various applications such as instant messengers, web widgets, and websites. The platform development process took eight months.

Platform Benefits

Market research has shown that most existing virtual assistants are rule-based. They rely on trigger words or tags, which greatly limits their functionality. Our platform allows you to create intelligent assistants that determine user intentions by semantic features.

Another advantage of our solution is the ease of scenario setup. Even a novice user can create their own virtual assistant, write a basic script for it, and start testing the question-answer system.

The BERT model allows our system to determine user intent with 90% accuracy, which is above the market average. Sibedge virtual assistants are easy to customize: we select the optimal model and integrate the system with corporate messengers and CRM systems. Customization is a deciding factor for customers when they choose a platform.

The Sibedge platform also stands out for its privacy and information security. The virtual assistants our clients create run on the clients' own servers. Users' personal data is not transferred to third parties or stored in third-party cloud storage.

Case study

A virtual assistant created on our platform was integrated into the web portal of a government agency. Initially, the client planned to use a rule-based assistant but quickly realized that it would not solve the problem, so the development of the virtual assistant was entrusted to Sibedge.

The client provided us with a database of the 15,000 questions about labor law that users asked most frequently. Our data analysts labeled the data and entered it into the semantic core of the system. They chose to work with the BERT model. The first stage of neural network training, excluding the time spent on data labeling, took 24 hours.

Sibedge DevOps engineers integrated the product into the client's infrastructure using the classic CI/CD methodology. We deployed the core on the customer's servers according to the set hardware requirements. Deployment, subsequent configuration, testing and launch took about a month.

After the system was launched, the administrators of the web portal trained it further. They monitored and processed user requests and added new examples to the intents. This improved classification accuracy and kept the knowledge base up to date.

Previously, web portal employees spent 33% of their working time answering typical user questions, and 32% of questions still remained unanswered. Now that the virtual assistant classifies routine requests automatically, the share of unprocessed requests has dropped to 4%. On average, the neural network processes more than 4,000 requests per day. In 2021, the virtual assistant helped 1.5 million people find the information they needed.
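
As a quick sanity check, the figures above can be compared with each other. The sketch below assumes a steady daily volume over a full year and roughly one request per person; neither assumption is stated in the case data, so the result is only a rough consistency check.

```python
requests_per_day = 4000  # lower bound: the text says more than 4,000
days_per_year = 365

# Assumption: the daily volume is steady over the whole year.
requests_per_year = requests_per_day * days_per_year
print(requests_per_year)  # 1460000, in line with the 1.5 million reported for 2021

unanswered_before_pct = 32
unanswered_after_pct = 4
print(unanswered_before_pct - unanswered_after_pct)  # 28 percentage points fewer
print(unanswered_before_pct / unanswered_after_pct)  # 8.0 times lower share
```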

The customer was satisfied with the outcome. The virtual assistant took almost the entire load off the first line of support. Operators no longer have to study all the labor laws; instead, they can switch to other important tasks. The system transfers users to the second line of support only in non-standard situations.
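
One common way to implement this kind of hand-off is a confidence threshold: the assistant answers only when the classifier is sure enough and escalates otherwise. The sketch below is an assumption about how such routing could look, not a description of the platform's internals; the `route` function, the 0.7 threshold, and the sample answer are hypothetical.

```python
def route(intent, confidence, answers, threshold=0.7):
    """Answer from the knowledge base when confident, otherwise escalate to a human."""
    if intent in answers and confidence >= threshold:
        return ("assistant", answers[intent])
    # Non-standard situation: unknown intent or low confidence.
    return ("second_line", None)


# Hypothetical knowledge base with a single intent.
answers = {"vacation": "See the section on annual leave."}

print(route("vacation", 0.93, answers))  # ('assistant', 'See the section on annual leave.')
print(route("vacation", 0.41, answers))  # ('second_line', None)
print(route("unknown", 0.95, answers))   # ('second_line', None)
```

Raising the threshold sends more requests to operators but lowers the risk of a wrong automatic answer, so in practice the value would be tuned on real traffic.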

Conclusion

Many people do not yet understand the full benefits of machine learning technology. Clients tend to choose simpler and cheaper rule-based solutions. But the future lies with intelligent systems, as they demonstrate higher accuracy and efficiency in communicating with real people. Our platform is a clear example that such products can be affordable, easy to set up, and easy to learn.