
The rise of LLMs sparks the idea of real time translation

Last updated 20 November 2024
Technology

We already had the message, so why not translate it?

Goal - a natural conversation between parties speaking different languages

What if the conversation between the end user and an agent in a customer service system could flow naturally even when they had no knowledge of each other's languages? For years, companies have had contact centers in other countries to pick up conversations from virtual assistants when a human is needed. A high-performing virtual agent can engage with the end user in whichever language is needed, but human agents are restricted to the languages they know and may need to default to English if they are located in other countries. This does not create the best user experience, as the end user has to switch languages when they are escalated from the virtual agent to a human agent. Alternatively, the customer has to maintain teams of customer service agents in the relevant languages to accommodate end users speaking a plethora of languages. The hypothesis prompting this project was that with the new large language models (LLMs) it would be possible to get good-quality translations of both end user and agent messages quickly enough for them to conduct a normal conversation even though they were speaking different languages.

Challenges - Speed, quality and configurability

The project presented many opportunities to use existing infrastructure and knowledge to build a simple solution, yet there were some major challenges. The first was speed. Were the translations offered by the LLMs quick enough to provide naturally flowing conversations? LLMs are not exactly known for their speed, and simpler tools - Google Translate, for example - are generally thought to be much faster. However, messages sent between end users and customer service agents are usually quite short, and what constitutes a long time in a programmer's world might not be detrimental to a normal conversation.

The second challenge was quality control. By creating some test conversations we could get an idea of how the land lay, but how would our customers be able to ensure the quality of conversations in a production environment with actual end users? Even if the translations were only one way - that is, only translating messages from the end user to the agent - the agents would not be able to understand the original message from the end user. After all, the problem driving this feature was that the agent could not understand the language the end user was speaking. It is also quite difficult to judge what a good translation is, certainly for a developer not versed in these matters.

A third challenge was structure: balancing the future needs of a fleet of customers against what is sensible to implement. As a developer it's easy to get excited by all the functionality that can be configured and tweaked, and to be tempted to offer all of it in the first beta version. However, configuring what is already duplex, real-time communication quickly becomes overwhelming for a mere human. What really stands out in this project is the duality of everything. It seems as if the functionality could simply be duplicated, but it is not actually the same in both directions: the details of translating and presenting the end user's messages differ from those of the agent's messages, which complicates and confuses things greatly. How to handle this and make it consumable was something that also had to be considered from the start of the project.

Requirements - Data security and cost

The need to inform the end user (and the agent) about the functionality and the transfer of their data is of course fundamental to this feature, as it is for every feature that involves AI and the processing of end user data. The expectations also differ between the two directions. An agent is not likely to be upset that their advice is being translated, as long as they are made aware of this when they sign up, or even at the start of their workday. However, the risk that the end user will be frightened or upset when suddenly presented with the possibility of their data being sent off to the unknown is significant. This leads to a structural difference between the two directions of communication, and even though informing the user is not complex in itself, it adds a layer of complexity to the feature.

We all want to live in a world where AI assists us rather than harms us. Companies using this technology also need to think about the cost of the feature. Several measures therefore had to be taken to make translation flexible and monitorable. The need to support different models, and to prevent certain messages from being translated at all, arose quickly, again increasing the complexity of the task.
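As an illustration of the kind of gating this requires, here is a minimal sketch of deciding whether a message should be sent to an LLM for translation at all. The rules and function names are hypothetical, not the product's actual policy:

```python
import re

# Messages with no translatable text, e.g. an order number like "12345",
# only add cost if sent to the model. (Illustrative rule, not the real policy.)
NO_TEXT = re.compile(r"^[\d\s\W]*$")

def should_translate(message: str, is_system: bool) -> bool:
    """Decide whether a message is worth sending to the translation model."""
    if is_system:
        return False  # automated/system messages stay untranslated
    if NO_TEXT.match(message):
        return False  # digits and punctuation only: nothing to translate
    return True
```

A real deployment would likely add per-customer allow/deny rules on top of something like this, but even two checks already cut out a meaningful share of model calls.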

Implementation - What we did

The project was implemented by a team consisting of a developer, a product manager and a UX researcher, supported by software testers, project sponsors and advisors. The implementation was split into phases to allow testing at regular intervals.

First phase - Quality and speed

The first phase was to compare LLMs with well-known conventional translation tools; in our case we chose Google Translate. This allowed us to estimate the speed of translation and make quality comparisons between the tools. The development in this phase involved implementing API calls from our backend to the different tools and presenting the results to technical testers in an efficient way. It was not very complicated technically, but it already presented us with a lot of choices on how to present disclaimers and on considerations around customer and end-user needs. The testers and researchers then performed detailed analysis of many conversations in many languages (luckily we have a diverse workforce from many countries, speaking many languages) and presented the results with analysis. As initial developer testing had suggested, the results from the LLMs vastly beat those from the conventional tools. The quality was simply a lot better, especially for complicated or unclear messages. The conventional tools (Google Translate) were faster, but the speed of the LLM responses was still high enough for messages in a conversation - at least in our testing rounds. It was therefore decided to continue into phase two looking only at LLMs. Conventional tools were simply not up to the task.
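The speed comparison can be sketched roughly as below. The two translator stubs stand in for the real Google Translate and LLM API wrappers; the names and signatures here are invented for illustration:

```python
import time
from typing import Callable

def avg_latency(translate: Callable[[str], str],
                messages: list[str]) -> tuple[list[str], float]:
    """Translate each message and return the results plus average latency in seconds."""
    results = []
    start = time.perf_counter()
    for msg in messages:
        results.append(translate(msg))
    elapsed = time.perf_counter() - start
    return results, elapsed / len(messages)

# Stand-ins for the real API wrappers used in the comparison.
def conventional_stub(msg: str) -> str:
    return f"[conventional] {msg}"

def llm_stub(msg: str) -> str:
    time.sleep(0.05)  # simulate the slower LLM round trip
    return f"[llm] {msg}"

messages = ["Hei, jeg trenger hjelp med fakturaen min."]
_, conventional_avg = avg_latency(conventional_stub, messages)
_, llm_avg = avg_latency(llm_stub, messages)
```

The point of measuring per-message latency rather than raw throughput is exactly the observation above: chat messages are short, so an LLM round trip of a second or two can still feel natural in conversation.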

Second phase - Configurability and weaknesses

In phase two we looked at which aspects of the LLMs we wanted to compare and, more generally, at what sorts of challenges LLMs handled well and what they did not handle as well. The LLMs offered today can be configured in multiple ways; some settings have little impact on the task at hand, others more. It was important to establish which had an impact and which we could disregard to lower the complexity of the project. LLM parameters such as the prompt, temperature and top_p were considered. Our researchers found that temperature and top_p were not important and could be disregarded for now. The choice of model and prompt was deemed important. Leaving it up to the customer to choose a specific model for translation allows flexibility, cost savings and the opportunity to change strategies over time as more data becomes available. No major pitfalls were found with the fundamental guardrails already implemented in our system, although larger-scale testing will be necessary to confirm this. After the second phase we allowed certain customers to play around with the feature and provide feedback before it was generally released.
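A per-customer translation configuration along these lines might look like the sketch below. The model name, defaults and payload shape are assumptions for illustration, not our actual settings:

```python
from dataclasses import dataclass

@dataclass
class TranslationConfig:
    # The customer picks the model; the name here is just an example.
    model: str = "gpt-4o-mini"
    # Translation needs fidelity, not creativity, so temperature stays low.
    # Since temperature and top_p proved unimportant in testing, they are
    # fixed defaults rather than settings exposed to the customer.
    temperature: float = 0.1
    prompt_template: str = (
        "Translate the following customer service message from {src} to {dst}. "
        "Preserve tone and meaning; output only the translation.\n\n{text}"
    )

def build_request(cfg: TranslationConfig, text: str, src: str, dst: str) -> dict:
    """Build a chat-completion style payload for the translation call."""
    return {
        "model": cfg.model,
        "temperature": cfg.temperature,
        "messages": [{
            "role": "user",
            "content": cfg.prompt_template.format(src=src, dst=dst, text=text),
        }],
    }
```

Keeping the configurable surface this small - model and prompt only - is the design choice discussed above: it gives customers flexibility where it matters without drowning them in parameters that testing showed make little difference.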

Third phase - What does the future hold

The initial results of using real-time translation are very promising in terms of quality, latency and usability. More needs to be done on UX and GDPR, and a lot more can be done to extend the solution itself and support it with additional features: conversation summaries, tools for cross-team collaboration (e.g. different groups of customer agents working in different languages), testing of new models, and different use cases for different models. More work is also being done on keeping product, model and brand names out of the translation, and on whether we can translate to languages not currently supported by our VA solution. It is also possible to give the end user a lot of influence over what is going on, but is that something our customers want to give them? As with other AI-related projects there is a lot more that can be done; we are only just starting.
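One common way to keep product and brand names out of the translation - sketched here with hypothetical names and an invented product term, not our actual implementation - is to mask them with opaque tokens before the LLM call and restore them afterwards:

```python
import re

def protect_terms(text: str, terms: list[str]) -> tuple[str, dict[str, str]]:
    """Replace protected names with opaque tokens the model will pass through."""
    mapping: dict[str, str] = {}
    for i, term in enumerate(terms):
        token = f"__TERM{i}__"
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = term
    return text, mapping

def restore_terms(text: str, mapping: dict[str, str]) -> str:
    """Put the original names back into the translated text."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

# "AcmeCloud" is a made-up product name for this example.
masked, mapping = protect_terms("How do I reset my AcmeCloud password?",
                                ["AcmeCloud"])
# masked can now be translated without the model rewriting the brand name
```

This relies on the model passing the tokens through unchanged, which in practice also needs a prompt instruction and a verification step; the sketch covers only the masking half.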

Conclusion

LLMs open up functionality that previously could only be dreamed of, such as the opportunity for people to engage in natural conversation across different languages. To a developer the functionality is now easily accessible, and the role of the developer in a project like this is to create the bridge between the LLMs' APIs and the product's users. Only by doing extensive testing and carefully, slowly introducing the functionality into production can the quality of these new tools be controlled and the right guardrails be put in place.

Any feature in which AI, and LLMs in particular, plays a part needs to consider the ethical consequences of using the technology. The developer might not have much of a say in whether a feature is created, leaving most of the responsibility with the company creating it. However, the developer is the first person to test the feature and perhaps realize some of the pitfalls or concerns it presents. These concerns need to be brought to the attention of the product team immediately, and the developer needs to attempt to mitigate them at an early stage.

For real-time translation of messages, some concerns are irrelevant: it will not flood the web with synthetic data, nor - with very little room for creativity from the model (low temperature) - will it create harmful or biased data. The concerns raised have mainly been the need for privacy and control of end user data, and the loss of jobs for existing customer service personnel. Privacy needs to be maintained with strong security measures in place, warnings to the end user and agent that data is shared, and the opportunity to opt out of the translation if the end user is not comfortable sharing. As for customer agent jobs, the end goal would be for agents to be divided into groups according to their skills and interests rather than the languages they are proficient in. Whether this is the end result remains to be seen.