[Music]
Rebecca Parsons: Hello, everyone. My name is Rebecca Parsons. I'm one of your co-hosts for the ºÚÁÏÃÅ Technology Podcast, and I'm joined today by my colleague, Ken.
Ken Mugrage: Hi, I'm Ken Mugrage. I'm one of the other regular hosts of the podcast.
Rebecca: We're joined by two of our colleagues who have been involved in a fascinating project that we'll be talking about. The first one is Vinod.
Vinod Sankaranarayanan: Hi. Good to be here. I'm Vinod. I lead our Digital Public Goods and Infrastructure work out of ºÚÁÏÃÅ, India.
Rebecca: And Prathamesh.
Prathamesh Kalamkar: Hi, I am Prathamesh Kalamkar. I'm a Lead Data Scientist at ºÚÁÏÃÅ India. Glad to be here. Thank you.
Rebecca: The project that we want to talk about is an umbrella project called Jugalbandi. First, I want to congratulate you all on the fact that this has been accepted into the Digital Public Goods Alliance. That's something that we are quite excited about. Can you tell us what Jugalbandi is about, where the name came from, and how this piece of work has come about?
Prathamesh: The name is very interesting. It actually originates from Indian classical music. In Indian classical music, we have this concept of two artists playing together, and both of them are lead artists, not one of them on percussion; there are two lead artists, and they are exchanging musical ideas. That's what you call Khayal Gayaki. If you take that concept of two artists talking to each other, it's similar in Jugalbandi: we have multiple AI systems talking to each other. You have speech models coming in, from speech-to-text to translation to text-to-speech on the way back, and then you have other elements on the other side, like multiple search engines.
There are multiple AI components talking to each other. That's the real meaning of Jugalbandi: multiple AI systems talking together and creating a beautiful symphony out of it. That's where the name came from. When this project started, it was initially conceptualized as a chatbot for Indian government welfare schemes, meaning ordinary citizens can just talk to this bot in their own native language and get responses back, so that even people who cannot read can get access to this information. That's the central theme behind Jugalbandi. It's a suite of solutions that does one important thing, breaking barriers to information access, and it does that by leveraging state-of-the-art AI systems. That's where the name and the concept come from.
Rebecca: Excellent. Let's take those pieces and talk about them separately. Tell me a little bit more about this speech-to-text translation, because I understand that it's one of the very few systems that exist supporting many of the non-English languages on the Indian subcontinent.
Prathamesh: This whole project actually started at ºÚÁÏÃÅ itself. We began with a project called Vakrangee, where we tried to develop models for an initial few Indian languages that could do speech-to-text conversion and then translation from one Indian language to multiple others. After that, the Ministry of Electronics and IT of the government of India took over that project and made it into an offering called Bhashini.
They are hosting some of these models that we are currently consuming, which can do many of these things: speech to text, text to speech, translation. They host these models and make them available in the form of APIs, which we consume. Bhashini is currently hosted by the government of India, and we are among its main users and advocates through building these applications.
Vinod: Prathamesh, the advent of Vakrangee itself is actually a fascinating story. When we started Vakrangee, we did not know that in two and a half years we would have 22 languages with a level of precision that can rival anything across the world. When we started, the going standard was that getting a speech-to-text model to an acceptable level of precision would take about one and a half years.
Our team was able to bring that down to one week. Now, our team can actually create a new speech-to-text for any language within one week, given, of course, the data is made available. There were so many breakthroughs that happened through that journey. That's probably for another discussion, but that itself is a fascinating story. Vakrangee also is a Digital Public Good.
Rebecca: What percentage of the Indian population do you think is currently covered now by the 22 languages that you do support?
Vinod: The 22 languages are definitely the major languages that all of India uses. Of course, India has about 800 dialects, but these 22 languages I would think they should be covering more than 80% of the Indian population today. I can put it the other way. I would say most of the people in India would know at least one of these 22 languages.
Ken: If I may, a question. Can you walk me through it? Say a person in India wants to discover a government scheme: what does the interaction look like? Describe to folks what they're actually doing with the government chatbot.
Prathamesh: Earlier, it was mainly happening through offline mediums of communication, that is through printed media, something in the newspaper, on television, or through a service center, which isn't a very effective way of propagating this information. It also had issues: sometimes people would get inadequate information, and they had no way to ask, "Okay, this is fine, but in my case, am I eligible? For my need, how do I make sure this information is actually relevant to me?" There were even intermediaries who used to charge money just to give out this information.
There was a recent news article published whose title said, here is a chatbot that doesn't take money to give you the right information. That's the value proposition: you actually have a chatbot that you can talk to in your own language and describe your need in your own language, because if you go to a government office or a service center, they'll ask you very technical questions and people get scared. "I don't even understand what this is. I don't understand these terminologies, so how do I get the right information?"
We try to break that. You ask your question in your natural language, the way you use it in daily life, and it does all the translation at the backend. It searches for the right keywords, comes up with the right kind of schemes that are available, and even explains some of the keywords that people struggle to understand. If people understand a term, it can use that context; for people who don't understand those terms at all, it can simplify: "okay, this is the meaning of it." As you keep asking for more and more clarification, it gives you that and then brings you back to the original flow you're supposed to follow. In that way, you can talk to it in your own language.
Ken: If I may, just to be explicit for listeners who maybe aren't aware of Jugalbandi: it's doing this in audio, it's speaking to the person. It's not displaying text on their phone that, given literacy levels, they might not be able to read anyway.
Prathamesh: Correct. It's actually a WhatsApp chatbot. One of the interesting things in India is that the penetration of WhatsApp is surprisingly high. Even if you go to the villages, with the advent of smartphones as well as 4G and 5G connectivity, you'll be surprised to see that people are using WhatsApp as their primary mode of communication. It's the simplicity; they're used to it, because if you want to talk to someone, you just go on to WhatsApp, record a voice note and send it. It's so much easier than typing something in English, or even in their own language, because typing is difficult. They're very much used to conversing in this format of sending voice notes. We thought the most effective way of reaching the bottom of the pyramid is through this channel.
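As a rough sketch of what one turn of that voice-note flow could look like, here is some illustrative Python. The helper functions (speech_to_text, translate, text_to_speech, answer_query) are placeholders standing in for the speech, translation and search services described above, not the actual Bhashini or Jugalbandi interfaces.

```python
# Illustrative sketch only: the helpers below are placeholders, not real APIs.

def speech_to_text(audio: bytes, language: str) -> str:
    raise NotImplementedError("call an ASR service here")

def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("call a translation service here")

def text_to_speech(text: str, language: str) -> bytes:
    raise NotImplementedError("call a TTS service here")

def answer_query(query_en: str) -> str:
    raise NotImplementedError("search the scheme index and ask the LLM")

def handle_voice_note(audio: bytes, user_language: str) -> bytes:
    """One turn: a voice note in the user's language in, a spoken reply out."""
    question = speech_to_text(audio, user_language)         # transcribe the voice note
    question_en = translate(question, user_language, "en")  # normalise to one working language
    answer_en = answer_query(question_en)                   # retrieve schemes + generate answer
    answer = translate(answer_en, "en", user_language)      # back to the user's language
    return text_to_speech(answer, user_language)            # reply as audio for WhatsApp
```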
Rebecca: Excellent. Can you tell me a little bit more about that backend piece? We've talked so far mostly about the speech-to-text and text-to-speech and such. You alluded to, yes, we find the keywords and we look things up. Can you tell me a little bit more about how that backend works for this government schemes application, in terms of getting the right information to people and following that flow you were talking about?
Prathamesh: This whole flow is orchestrated using something we call a finite state machine, meaning we track the user's position in the conversation using a set of finite states. As the conversation moves on, it moves from one state to another, and these transitions from one state to another are where we are actually using OpenAI large language models. That is one thing. The other is doing the right search: given the information you're looking for, how do you make sure you're searching for the right schemes?
For that, what we did is scrape the information from the Indian government website, which keeps being updated on a regular basis. We take that scraped information, pass it through what are known as embedding models, and then store those vectors. When a user query comes in, we do a semantic search. That means it's not a keyword-based search, it's meaning-based, because many times the words used on a government website versus what people are using can be very, very different, but semantically they mean the same thing.
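As a minimal sketch of that index-and-search step, here is some plain Python; the embed function is a placeholder for whichever embedding model is actually used.

```python
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("placeholder: call an embedding model here")

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(schemes: list[dict]) -> list[dict]:
    """Offline step: embed each scraped scheme description once and keep the vector."""
    return [{**s, "vector": embed(s["description"])} for s in schemes]

def semantic_search(query: str, index: list[dict], top_k: int = 5) -> list[dict]:
    """Online step: embed the user's query and rank schemes by meaning, not keywords."""
    q = embed(query)
    ranked = sorted(index, key=lambda s: cosine(q, s["vector"]), reverse=True)
    return ranked[:top_k]
```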
It's important to do this semantic search, and nowadays embedding models are getting better at encoding meaning. We do that search and then fetch the right information: based on your query, we believe these schemes look relevant to you. Or it may ask a probing question, that it looks like your search is too wide, can you narrow down what kind of information you're looking for. It does that clarification, and then it tells the user, we believe these are maybe the five most relevant government welfare schemes for your need.
It gives a one-line summary of each of them, and then asks which one you want to know more about. That's when the state transition happens: from searching you're now moving to detailing, going into the specifics of one particular welfare scheme. That's where this transition happens in the finite state machine. It gets into more details: what are the eligibility criteria? What benefits does it provide? What documents are needed, et cetera? There's that back-and-forth conversation, and it gives replies back in the native language. This is how the flow is structured.
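A toy sketch of that finite state machine follows, with made-up state names (SEARCHING and DETAILING) and stubbed-out decision functions; in the real system the transition decisions lean on an LLM.

```python
from enum import Enum, auto

class State(Enum):
    SEARCHING = auto()   # still narrowing down which schemes are relevant
    DETAILING = auto()   # one scheme picked; asking about eligibility, benefits, documents

class Conversation:
    def __init__(self) -> None:
        self.state = State.SEARCHING
        self.scheme = None

    def step(self, user_message: str) -> str:
        if self.state is State.SEARCHING:
            chosen = scheme_chosen_by_user(user_message)   # LLM-backed decision in practice
            if chosen is None:
                return "These schemes look relevant. Which one do you want to know more about?"
            self.scheme = chosen
            self.state = State.DETAILING                   # the state transition
            return f"Here are the details of {chosen}."
        return answer_about_scheme(self.scheme, user_message)

def scheme_chosen_by_user(message: str):
    raise NotImplementedError("placeholder: ask the LLM whether the user picked a scheme")

def answer_about_scheme(scheme, message: str) -> str:
    raise NotImplementedError("placeholder: eligibility, benefits, documents, etc.")
```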
Rebecca: We hear a lot about large language models and hallucinations. Obviously, given your target audience, the level of frustration that might come from, "Wait a minute. Why are you telling me this bizarre thing?" How have you handled that hallucination problem?
Prathamesh: Basically, to answer your question, hallucination is definitely a very important problem and the most practical hindrance to the adoption of LLMs. The literature has shown that retrieval-augmented generation is an effective way to control hallucinations. The way we think about it is that there are two sources of knowledge for an LLM. One is the parametric knowledge, meaning whatever it has learned from the internet during pre-training. That's one knowledge source, and the other knowledge source is what you give it as information in the input prompt.
People think prompts are only meant for asking questions. That's not correct. You can also give the model a lot of information, whatever fits in that window, and then ask: using this particular information, can you answer the question? That's the crux of RAG. In post-RAG, what we have done is create six layers in which each of these concerns is formalized. It starts from the UI layer, then the speech handling layer, then how you plan for what information is needed to answer a query, actually getting that information, putting it into the LLM and getting the answer back.
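That "crux of RAG" step might look roughly like this, with the retrieved scheme text pasted into the prompt; the wording is illustrative, not the actual Jugalbandi prompt.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Put the retrieved scheme descriptions into the prompt and ask the model to use only them."""
    context = "\n\n".join(passages)
    return (
        "Use ONLY the information below to answer the question. "
        "If the answer is not in it, say that you don't know.\n\n"
        f"Information:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```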
These are the six layers that we have formalized in the post-RAG architecture. Essentially, controlling hallucinations comes down to two or three things. The first is using RAG, and the second is using the right prompt engineering, drawing the knowledge boundary very clearly, because for most practical applications you want the LLM to focus on certain types of problems. You don't want an answer for anything under the sun.
Let's say you are working in retail; you stay within just your domain. If you are working in fashion, for example, it's just about the latest trends, et cetera. Every chatbot or customer-facing application will have that boundary. It's very important to draw that boundary using prompt engineering, so that if anything outside it comes in, the bot politely says, "Hey, I don't know this," instead of trying to answer. Those are the two or three things that effectively reduce hallucinations.
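One way such a knowledge boundary can be expressed is as a system prompt; the wording below is illustrative, not the production prompt.

```python
SYSTEM_PROMPT = (
    "You help citizens find Indian government welfare schemes. "
    "Only answer questions about welfare schemes, eligibility, benefits and "
    "application documents, and only from the information provided to you. "
    "If a question is outside this scope, or the information needed is missing, "
    "politely say that you don't know rather than guessing."
)
```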
Rebecca: Excellent.
Vinod: If I may, Rebecca, it's also interesting in terms of the risks you perceive with regard to hallucination and the use cases you start adopting. While Jugalbandi for schemes and benefits is one, we also got a request from law enforcement personnel of one state to adopt this technology for helping women and girls who are undergoing harassment. The risk associated with hallucination in the second case is much, much larger.
If you give the wrong answer about a government scheme, they go check, they got the wrong scheme, they come back. It's perhaps a few hours wasted. In the second case, the case of harassment, if you give the wrong answer, it could be a matter of life and death. We are navigating through those aspects as well in terms of what is acceptable in this type of use case and what is not acceptable in a different type of use case.
Rebecca: That actually leads nicely to my next question, which is what are some of the other applications other than the government schemes application that you have already implemented or are looking to implement under this Jugalbandi architecture?
Prathamesh: We started off with this government schemes chatbot, but we soon realized that it has the potential to be used elsewhere. One of the things we felt immediately was: now that you have a chatbot working on the schemes, is it possible to swap the schemes out? Let's say I bring my own documents; can I swap out that scraped information about schemes and plug in my own information, which could be anything specific to my domain? Then you have a chatbot working on top of it.
That's where we created a solution called Generic Question and Answering, which is essentially: talk to your own documents. The concept is the same. You upload a corpus of your own documents, then you index it, store it somewhere, and then you can have a conversation over those documents. While at the top level it looks very simple, I think it breaks a barrier to innovation for a lot of players, especially in the social sector, because they don't have the technical capacity in-house to build such things. At a high level everyone is talking about ChatGPT, et cetera, but they kept asking: how can you make the answers more authentic, or condition those answers on the particular data I want to look at?
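The shape of that starter kit, reusing the earlier sketches (embed, semantic_search, build_rag_prompt), might look like this; split_into_chunks and call_llm are further placeholders, not the actual Jugalbandi API.

```python
def split_into_chunks(document: str, size: int = 1000) -> list[str]:
    """Placeholder chunking: fixed-size slices; real systems split more carefully."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder: send the prompt to the LLM of your choice")

def ingest(documents: list[str]) -> list[dict]:
    """One-time step: chunk and embed the uploaded corpus."""
    chunks = [c for doc in documents for c in split_into_chunks(doc)]
    return [{"text": c, "vector": embed(c)} for c in chunks]

def ask(question: str, index: list[dict]) -> str:
    """Every question: retrieve the most relevant chunks, then answer only from them."""
    hits = semantic_search(question, index, top_k=5)
    prompt = build_rag_prompt(question, [h["text"] for h in hits])
    return call_llm(prompt)
```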
That's where it helped them. It immediately acted as a starting point, as an innovation trigger, for a lot of people to quickly come and put in their documents, and we created an API on top of it. Everything was self-hosted, and it acted as a starter kit for anyone who wants to create a chatbot. This became an instant hit. We tried this in multiple places, and a lot of NGOs showed interest. Bandhu is a great example: they created a chatbot using this starter kit. Their problem statement was how to help migrant workers in India, who move from the villages to the cities, find the right shelter, because many times these shelters for blue-collar workers are not listed.
They created something that acts as a listing for such blue-collar workers. It became an instant application: they took Jugalbandi question and answering as a starter kit, uploaded their own data into it, and built a chatbot in a matter of a couple of weeks. Now they have taken it forward and built more sophistication on top of it. These offerings really act as a starting point for a lot of such innovative use cases. That was one.
Then we even took it to courts, Indian courts. They had very specific requirements: they wanted to search for the right information, but they didn't want any paraphrasing. That means, I just want to get the answer as is. We made some changes for them. How do you make sure the right information reaches people and at the same time can be traced back to the original source? While an ordinary citizen may want paraphrasing, on the other side of the spectrum are judges who say, "Okay, I don't just want an answer, I want to trace it back to the original source it came from."
We built something for them where you can actually trace the answer back to the original source, and it doesn't paraphrase too much; it uses the actual language that was used in the original documents. The spectrum of use cases is quite large. We have a lot of adopters from the social sector and NGOs, but we are also now working very closely with UMANG, which is an app under a government ministry.
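For that "no paraphrasing" requirement, one option is to return the retrieved passages verbatim with their source references instead of having the model rewrite anything; again a sketch building on the earlier semantic_search, with an assumed source field on each indexed passage.

```python
def answer_verbatim(question: str, index: list[dict], top_k: int = 3) -> list[dict]:
    """Return the matching passages exactly as written, each with its source, no rewriting."""
    hits = semantic_search(question, index, top_k=top_k)
    return [
        {
            "quote": hit["text"],                    # original wording, untouched
            "source": hit.get("source", "unknown"),  # e.g. which judgment, act or section it came from
        }
        for hit in hits
    ]
```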
They have a lot of APIs integrated into UMANG, and now they're adding one more feature to UMANG called "talk to government." Talk to government, not just about government schemes, but about a lot of other government offerings that are there. One of the things we are seeing is this push to bring all of government to the citizen, using technologies like this to make it more accessible.
Vinod: Just to explain, UMANG is almost like the super app of the government of India where they try to bring all features that citizens need to interact with the government. It could be about your driving license, it could be about your power, it could be about your school, hospital. It could be about any of those things. This is supposed to be a portal from where you can go and get all those benefits, and they're now looking to integrate this chatbot into that, which I think is incredible.
Ken: That's great. I'm curious: of course, the AI space, generative AI, et cetera, is fast moving, as I think you've probably noticed. You've also talked about the demand here. Are there strategies you've taken from an architecture perspective or from a rollout perspective to account for that? "Okay, it's a very, very fast-moving technology, demand is moving very fast, next week's not going to be the same as last week." Are there strategies, or are you just rolling with it? What are you thinking about there?
Prathamesh: When we started, we were just following the trend: "New models are coming out every week. New methods of writing prompts are being discovered. How do you cope with this?" Initially we were just following the trend, keeping an eye on what's happening and integrating it into what we could build. Later on we felt there was a need to formalize some of the learnings that were coming up consistently.
That's where we came up with this post-RAG architecture, where some of the very common needs are actually documented, the common patterns that we saw across projects. For example, people were constantly struggling with sensitive information leaking outside organization boundaries. That's a very common example. The data privacy layer in post-RAG tries to handle that.
We try to create guardrails in terms of what information should go in, go out, or go where. Not just that. Another common pattern we have seen is that LLMs can't do everything on their own. They need to pick and choose tools for the right set of tasks. Obviously they're good at certain things like reasoning, having conversations, et cetera, but if you want to do something else, there are specialized tools, and there should be a provision to call that tool rather than doing it yourself. We provide for that using the data planning layer, where it's plug and play. We try to formalize our learnings in that architecture. Yes, and as you said, it's a constantly moving space and we are trying to--
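A toy illustration of those two layers: a data-privacy guardrail that masks obvious sensitive values before anything leaves the organization boundary, and a data-planning step that routes a request to a specialised tool instead of the LLM when one fits. The patterns and tool names are hypothetical, not the post-RAG implementation.

```python
import re

def redact(text: str) -> str:
    """Data-privacy layer: mask obvious sensitive values before calling external models."""
    text = re.sub(r"\b\d{12}\b", "[REDACTED-ID]", text)     # e.g. 12-digit identity numbers
    return re.sub(r"\b\d{10}\b", "[REDACTED-PHONE]", text)  # e.g. 10-digit phone numbers

def plan(user_message: str) -> str:
    """Data-planning layer: pick a specialised tool when one fits, fall back to the LLM."""
    message = user_message.lower()
    if "scheme" in message or "yojana" in message:
        return "scheme_search"   # specialised retrieval, as in the earlier sketch
    return "llm"                 # default: general reasoning and conversation
```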
Vinod: If I may add, I just wanted to, Prathamesh, take you back to maybe about 10 months when OpenAI suddenly became big. Prathamesh and I were having this discussion, and Prathamesh you were saying, "I don't know what I'm going to do because over the last 12 years, I've been specializing in something and suddenly this has come." We discussed and we said, "Hey, if you can't fight it, join it."
We pivoted and we said, "Hey, rather than saying everything we've done so far is perhaps irrelevant, let's see what we can do by adopting LLMs." It's actually interesting, because I think OpenAI became more visible last November, and by March, Jugalbandi was out and was perhaps one of the most visible applications using LLMs out there. I think we were pretty good at moving with the flow. Of course it's evolving, and rapidly, and I think we are just trying to keep up with it.
Rebecca: Excellent. I mentioned at the beginning that this was accepted into the Digital Public Goods Alliance. Can you tell me a little bit about what that means and some other activities we have surrounding that alliance?
Vinod: Digital Public Goods are essentially software, content, and protocols aimed at supporting the 17 Sustainable Development Goals of the United Nations. That's the broad theme, but one very key aspect of a Digital Public Good is that it is open source. The alliance is a body of several organizations, including the United Nations, other foundations, and even countries today; India, for example, is one of the alliance members. ºÚÁÏÃÅ is the only IT services company that's been invited to be part of the alliance.
We are very proud and privileged to be in that group. The reason we were invited is that out of about 150 certified Digital Public Goods out there, when we counted, we realized we have contributed to about 14 of them. In about 9% of all the Digital Public Goods out there, we have had a role to play. Of course, we've been nurturing two of them, Bahmni and Cloud Carbon Footprint. That's ºÚÁÏÃÅ' association with the Digital Public Goods Alliance.
Between Vakrangee, Bahmni, and Cloud Carbon Footprint, there are a number of Digital Public Goods that we work with. It's humbling as well, and a matter of privilege, because some of these solutions we work on help people at scale and benefit the last person standing, in a manner of speaking. For example, Bahmni is very focused on health, so you can look at it from that domain, but something like Vakrangee, or even Jugalbandi in this case, is domain agnostic.
I can now use Vakrangee for health, for justice, for finance, for education. I can use any of these tools across the board. That's where we are also seeing a lot of innovation. There is this concept called combinatorial innovation. Prathamesh spoke about Jugalbandi from the concept of saying, "Hey, we've got different AI models talking to each other and providing this." That's the technology space.
You can also look at innovation happening where you have Vakrangee or Bhashini, which you can then combine with something like Bahmni. Bahmni is a health product, and doctors, rather than having to type up notes for patients, can just speak, and the notes are taken automatically. You increase the productivity of doctors that much more. This concept of combinatorial innovation is starting to really take off, and we are very happy to be in the midst of all that exciting work.
Rebecca: What comes next? Are you just mostly waiting for requests to come in? I think this whole concept of talk to your documents, for example, has such potential for knowledge management and many other things, but I'm interested in where you two see this going.
Prathamesh: First, we get many requests for this every week. Actually, we are getting so many requests that the current Jugalbandi team is no longer able to address all of them. We thought there should be another Jugalbandi just for "how do you use Jugalbandi," something like that. The essential thing is that Jugalbandi is already open source and people can use it. The main pain point we see is that they don't have the technical capacity to implement it.
How do you move from the current point, where you need to have some technical capacity in-house, to more of a local or one-click deployment solution? That's one area we are working on. The other is looking at some very big implementations at citizen scale. One that I spoke about is UMANG; the other is the Department of Justice. These are two very big implementations that we currently have in place, which will use Jugalbandi as a starting point and then really take it to citizen scale. Those are the areas. Apart from that, we are also continuously changing how Jugalbandi is implemented, because with all the latest changes that we see, the architecture behind Jugalbandi needs a lot of changes. How do you keep up with those? How do you make it work better? That's something we are continuously improving on.
Vinod: Could you also talk about the G20 session and what was being demonstrated there in G20?
Prathamesh: We were invited to G20 to talk about what we have already built and show it to people across the globe. This year, G20 happened in India, in New Delhi. The objective was: can you create a chatbot around G20, around all the information and all the things happening at G20? What really happened was that the documents we ingested covered all the tracks at G20, what conclusions were drawn, what the discussion points were, what the trends were, et cetera.
We ingested those documents, and in less than three days we were up with a running bot that you could talk to in your own language about what's happening at G20 and all the information related to it. That was seen as a big success. We had international media there, and even the Bangladesh Prime Minister, Sheikh Hasina, showed a keen interest in implementing this in Bangladesh. Their team has contacted us, and we are following up with them. The Bangladesh government is really keen to implement this at their own level, similar to what we have done. They want to implement a schemes bot in Bangladesh. That was a key takeaway for us.
Vinod: It's very interesting to see that in places where people are not digital natives, the government or the tech community there is actually making rapid strides, using some of these technologies like AI so that the non-digital natives, so to speak, are now able to start accessing a lot of these digital capabilities. It's a bit like how people in the [unclear] countries never had a wired phone. They went directly to the mobile phone, because it was much easier to set up mobile phone towers and get everyone a phone rather than lay all that wire. I definitely see that deep tech is starting to get utilized a lot more in the global south now.
The sophistication of the technologies emerging here is definitely something I think the entire world should watch out for. We've had conversations with some of our colleagues in other countries as well about some of these technologies, including Jugalbandi and Vakrangee, and they're definitely keen to see how these can be adopted. The good thing is these are all open source, so there are really no barriers to adoption. I think it'll be really interesting for people across the world to start looking at some of these technical tools that are being put out as open source and see how they can be adopted and utilized.
Prathamesh: I just want to summarize this. Someone said something to me that stuck: 2023 was the year when prototypes were being built, and next year, '24, is the year when they'll actually be deployed on the ground. That's what we see for ourselves as well: there will be so many implementations, the rubber hitting the road, and lots of things happening. We are excited to see all of that happen.
Rebecca: Excellent. Well, thank you, Vinod. Thank you Prathamesh, and thank you, Ken. As I said, again, congratulations on the Digital Public Goods Alliance. That's somewhat old news, but I think it's still worthwhile highlighting that. I look forward to a bright future of knowledge management and talking to my documents. Maybe we can sort out some of the speech-to-text features of some products that we all know and love. Thank you, everybody.
Prathamesh: Thank you.
Vinod: Thank you.
[Music]
[END OF AUDIO]