
Code & Counsel
"Code & Counsel" is a dynamic video podcast where technology meets law. Each episode delves into how digital innovation, particularly AI and machine learning, is revolutionizing the legal landscape. Join us as we explore practical applications, discuss ethical considerations, and unravel the future of law through the lens of cutting-edge technology. Perfect for legal professionals, tech enthusiasts, and anyone interested in the intersection of code and counsel. Brought to you by Quoqo (www.quoqo.com).
For the video version of this podcast, visit https://www.youtube.com/@quoqo5750
Democratizing AI: Exploring Accessibility and Innovation in Emerging Models Part 1
Part 1 of this podcast unpacks the transformative developments in artificial intelligence as of 2025, focusing on innovations like DeepSeek and products like QFIN. We explore how emerging models influence business operations, efficiency, and the landscape of AI integration within enterprises.
• Introduction of significant AI advancements in 2025
• DeepSeek's emergence and implications for traditional models
• Overview of financial applications via QFIN & Synapt
• Discussion on AI democratization and accessibility
• Evaluation of LLMs as commodities v. custom applications
• Insights on enterprise adoption of AI tools
• Challenges of deploying LLMs locally
Here are links to AI products mentioned in the podcast:
1. Q-Bot: https://quoqo.com/products/qbots
2. Q-Fin: https://qfin.guru
3. Syn-Apt: https://syn-apt.ai
This episode of Code & Counsel is brought to you by Quoqo, the AI-powered legal tech platform that streamlines contracts, NDAs, and digital signatures with cutting-edge automation. Transform your legal operations today! 🚀 Learn more at quoqo.com.
Need NDAs fast? With Quoqo NDA, generate, review, and sign NDAs in minutes—powered by AI. Spot risks, ensure compliance, and protect your business with smart contract analysis. Perfect for startups, enterprises, and legal professionals. Start now at nda.quoqo.app.
Thank you for tuning into Code & Counsel, powered by Quoqo. If you found value in today’s episode, subscribe and share it with your network. Ready to revolutionize your legal operations? Explore Quoqo’s AI-powered solutions at quoqo.com.
For more insights and discussions on the intersection of technology, law, and business, subscribe to our podcast and stay updated. Connect with us on social media for live updates, behind-the-scenes content, and more. Thank you for listening, and don’t forget to share your thoughts and questions in the comments or reach out to us directly!
Mail us at hello@quoqo.com or visit our website at www.quoqo.com.
Krithi: So welcome to the 2025 podcast, our first podcast of 2025. This is Krithi, and with me are Chetan and Guru, so we'll be starting off with the podcast. Sir, there's been a lot happening at Quoqo, especially outside the legal tech space. QFin and Synapt seem to be making most of the news. What's your perspective on this, sir?
Chetan: I think before we get into this, Guru, 2025 started with a big bang, correct, both in terms of events across AI and events at Quoqo, correct? I think we should give viewers a perspective on what's happening in the AI world. I'm sure people have heard of DeepSeek, what's come out of China, how it's open source, cheaper, and can do so many things. When we left off with Code & Counsel in 2024, we were not even at the o1 level, correct, which had just come out from OpenAI. But just to give viewers a perspective in terms of large language models and the size of these.
Chetan: If you go to Hugging Face, Guru, or if you generally investigate, you'll find a few large players, right? You've got OpenAI, which is open or closed source depending on whom you ask, right? If you ask Elon Musk, I'm sure he will say it should be open source. Anyway, there's Anthropic's Claude, and Claude 3.5 Sonnet is a superb LLM, right. Then you've got Mistral, and Microsoft's Phi, plus some math-specific models; there are many, many of these, correct. And then you have the ability to run some of these locally, you know, using Ollama, and there's Qwen as well.
Chetan: By the way, China also has its own models. You've got Qwen, and you've got DeepSeek V2, which came out earlier, and there were hints around some of these things as well. But what's happened, I believe, is that depending on how you build the LLM, right, it can either be a text-generation LLM or a reasoning LLM, and most of the newer OpenAI ones are reasoning LLMs, right. What DeepSeek has come up with is potentially an o1 competitor. And you remember, Guru, the day it got launched I told you that I tested it and thought it may or may not be production-ready yet.
Chetan: At least from where we are looking at things, we might need something much more, you know. My first inclination was that OpenAI still had a lot of edge over some of these from a production perspective, in terms of our products and things like that.
Chetan: What is interesting in this context is how things have changed. You've suddenly gone from a paid model, deployed only on Azure, to a fully open-source model you can deploy on your own servers, provided you have the ability to deploy these. But most people don't realize that there are different quantized versions of DeepSeek as well, correct. You may be able to run, say, a 1.2-billion-parameter version on a local computer with, say, an 8-core GPU. But if you want the full model, the 64- or 72-billion-parameter one they've done, correct, you need some really, really powerful machines. So is it production-ready? Is it something you can immediately adopt? I would not necessarily jump at some of these things immediately, correct. It really depends on the context.
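To make that hardware point concrete, here is a back-of-the-envelope sketch of the memory a model's weights alone occupy at different quantization levels. The parameter counts are the ones mentioned above; real deployments also need memory for activations and the KV cache, so treat these as rough lower bounds:

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory (decimal GB) needed just to hold the weights."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Weight footprint for the model sizes discussed, at common quantizations.
for params in (1.2, 7.0, 72.0):
    for bits in (16, 8, 4):
        print(f"{params:>5}B @ {bits:>2}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

Even at 4-bit quantization, a 72B model needs on the order of 36 GB just for weights, which is why the full-size variants stay out of reach of ordinary local machines.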
Guru: There you go. It definitely takes some time to understand how these models perform, how they can be used for certain applications, and so on. But at least DeepSeek has been groundbreaking, and it shakes everything up, even people's mindsets: oh, you need a lot of resources to train a model; oh, it's not possible for anyone outside Europe or the USA to develop products. That's something that got shattered.
Chetan: Yeah, you know, I was talking to a VC recently and mentioned this, and I had read somewhere, and it was a proper encapsulation of the moment, that an LLM has become a commodity, correct. What really matters is what you build on top of it, and what you build on top is essentially a wrapper. But you don't need a VC to build that wrapper, right? That's what we've done at Quoqo for a long time. But does it change the mechanics overnight? Does it make it accessible? Is it something anyone can do? Before DeepSeek?
Chetan: Maybe not at the level of a reasoning model, but there were others as well which did the tasks pretty competently, correct. We've got our own QLLM, which we built ourselves. We did not need a huge context window; we trained it on certain things and thought that the current state of requirements, from a user in a production scenario, is more than sufficiently met by what we have now. Now, if you're looking at PhD-level output: just yesterday OpenAI came up with something where, if you're a GPT Pro user spending like $200 a month, you get PhD-level output. Where is that?
Guru:In your research reports, and all this.
Chetan: Yeah, but then human ingenuity is something that you may not be able to replace. And I think DeepSeek, if you read their paper, went about adopting a non-CUDA approach to even develop that model; they used a very different way of looking at things, correct. As you know, if you're using GPUs, these are usually NVIDIA ones. The reason AMD has been left behind is that the world adopts the CUDA architecture for base-level programming, and you don't find the same adoption on an AMD or, say, an Intel GPU, right? So they've gone and created something else which can work in parallel; it looks at the GPU as a separate entity that it can tap into and out of. This is at a very simplistic level, and not that I'm an expert in how they came about doing this; it involves some element of deep research. But how does this all add up? And to Krithi's question, correct: a lot of things are coming up, and you've got some new products coming up too, correct. What's in those?
Chetan: You mentioned QFin and Synapt. Just to give viewers an overview: if you go to our website, www.quoqo.com, there are now links to non-legal-tech applications. QFin is a set of financial agents, almost 30 or 32 of them, that we built using our expertise from Qbot. It solves things like financial consolidation across subsidiaries, as an example, which is a huge problem, and especially in large enterprises it also works with auditing requirements in mind. You can also visit the sub-site qfin.guru, in Guru's honour, and take a look. Synapt is also an agentic AI product; with it we can assist with custom development of agentic AI very specific to a company's requirements as well. Qbot, if you remember, is specific to legal tech, with a conversational interface for legal-specific data sets, right? So you've got QFin, which is also agentic AI, though not exactly along the same lines as Qbot, while Synapt goes a little bit further, correct, and uses different models and different agents to accomplish a number of different things. QFin is advanced in the sense that it contains a lot of technology and a number of different LLMs to accomplish what it does.
Chetan: When you're doing financial apps or agents, as you know, Guru, accuracy is a lot more important. If you try to run, say, a small Llama model, even a latest one, on mathematical stuff like reasoning, you know it doesn't do well, correct?
Chetan: I mean, you mentioned Llama, but it depends, and we have our own thoughts around these.
Chetan: I don't want to go into those, because they're proprietary, as is how we test these things. If you remember, in the early days of GPT-2 you could convince the model that 2 equals 1 or 2 equals 0; you could just load it up with so many things that it would go along. But I wanted to give users the ability to try out Synapt and QFin. These are cutting-edge agentic AI and fintech-specific products, probably as cutting-edge as DeepSeek when it comes to functional use cases, and we believe we are the first in the world to create all of these things. And by the way, with QFin we can integrate into a number of different architectures in terms of integrations, so that you can just plug and play. So this is our news: QFin and Synapt are new things for Quoqo, utilizing the know-how of how we do things on the legal tech side.
Guru: And we should also speak about how enterprises are adopting these two things. Very recently we gave a demo to a company that was looking at various use cases: reading documents, extracting information, putting the results into their own systems, and so on, because most of these activities have been manual for them. They have seen the demo of some of these applications and found them to be very interesting.
Chetan: Yeah, and that was on the QFin side, correct, and that is probably going to their department soon, as we know, correct.
Chetan: As for Synapt, we had the opportunity to demonstrate to a very large financial services company that the work being done by over 24 analysts can essentially be automated to an extent, with more efficiency and urgency as required. It puts tasks within, say, a 20-25 minute window, which is like an eternity in computer processing times, correct, but these are tasks that 24 people would take a month to execute, essentially. So it's reaching a phase where it's possible to bring a lot of efficiency and cost savings into the process. Not that the human element is taken out of the mix, but the human element can be focused on the better task of analyzing what comes out, rather than digging for the information yourself. The better way to frame this: do you remember AltaVista, Guru? It was the forerunner of Google, right? Running a search on AltaVista used to be a big thing, being able to look for stuff and find it.
Chetan: I think that is the situation in many enterprises right now. The way they do things is at least 35 to 40 years old; it has not changed for a long time. And when you expose them to all these tools, it's a paradigm shift in thinking: oh, there are better ways to do things, because as technology advances you can reallocate resources, time, and money, correct, across all of these things as well. That is how some of these agentic AI systems are also scaling up. But sorry, that's a long-winded monologue from my side.
Guru: I just also want to add how much we are using these applications in our own day-to-day activities.
Chetan: Yeah, that's an important thing to highlight. Within Quoqo, and Krithi, you should speak to how you're using some of the agents in your day-to-day work as well, correct?
Krithi: So, like you mentioned DeepSeek and o1: I'm hearing many things from many people, and they keep comparing the two. What's the main difference between DeepSeek and o1?
Chetan: I think, basically, DeepSeek was also trained on OpenAI LLM responses. They claim that only about 830 of them were used: 830 responses from an API were used to train DeepSeek to that level, correct. And they use a different approach to reinforcement learning, correct.
Guru: They used a method called mixture of experts. Basically, it's like developing small LLMs and integrating them under a bigger umbrella, and based on the question that you ask, it activates a particular expert which gives you a more pointed response.
Chetan: So would it be right to say, Guru, and you may remember rote learning, which nowadays is not encouraged in schools, but let's say you were to rote-learn the tables from 1 to 20, correct?
Chetan: That is basically how you develop your first language model, correct; then you know how to multiply, right. Then you teach a child, or a computer, to go on to the 21st table or the 35th table, correct, and then you extend it, and it turns out that you can do it for infinity, correct. And there are multiple ways to learn a table: you can do addition, you can do multiplication, you can use squaring methods, depending. So DeepSeek has basically explored one method similar to this, a kind of thinking method. Is that right, Guru?
Guru: I mean, look at it this way. Let us say there's a closed room in which there are multiple experts: there's a finance person, there's a marketing person, and all these different types of people. Now you ask a question standing outside the room, and you'll get the correct answer, because the person who actually has expertise in that field is going to answer you.
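Guru's room-of-experts picture can be sketched in a few lines of code. This is a toy illustration of the routing idea only; real mixture-of-experts models like DeepSeek's route per token, with learned gates over many experts, and the shapes and layer choices below are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "experts": each is just a linear layer with its own weights,
# standing in for a small specialized sub-network.
n_experts, d_in, d_out = 4, 8, 8
experts = [rng.standard_normal((d_in, d_out)) for _ in range(n_experts)]

# The gate is another linear layer that scores experts for a given input.
gate_w = rng.standard_normal((d_in, n_experts))

def moe_forward(x: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    scores = x @ gate_w                      # one score per expert
    top = np.argsort(scores)[-top_k:]        # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts only
    # Only the chosen experts run; the rest stay idle (the efficiency win).
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_in)
y = moe_forward(x, top_k=2)
print(y.shape)  # (8,)
```

With `top_k=1` this is exactly the "one expert in the room answers" story; larger `top_k` blends a few experts, which is closer to what production routers do.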
Chetan: That's basically how you're able to kick things off, yeah. But earlier you would use instruction-based prompting: you would have to change the context for an LLM through prompting to put it in a different frame. Now, even within the LLM, you have the ability to add memory. And also, one of the things is, if you used, for instance, GPT-3.5, the early Turbo version from OpenAI, correct, as an example, it would quickly lose context, right, and then forget what was said. Some of those things are being tackled much better in the latest versions.
Guru: Yeah, and with larger context windows they have better output, better output sizes, you know.
Chetan: Okay, we'll come to that.
Krithi: I have my own question connected with that. Is it very hard to implement these in local production?
Chetan: Yeah, I have my own views on this, correct. Do you want to go first, Guru? No? Okay, I'll go ahead. So see, when you deploy locally, correct, you're assuming that you have servers capable of running an LLM. Do you remember the logarithmic scale from school, correct?
Chetan: Let's take the number two. You have two squared, two cubed, two to the power of four, and so on, right? Two squared gives you a small number; two to the power of 8 is larger; and as the exponent grows, the numbers quickly become enormous.
Chetan: The point being that these are like the quantizations, correct. You can take a very big model, take a reduced version of it, and run it on a smaller computer; it's as if that version only knows up to two to the power of two, correct. But if you take two to the power of 64, simplifying this, you have a very large number to compute, and beyond a certain point such numbers go past what the hardware can comfortably represent and compute. What happens at that scale, though, is that you have a lot more information readily available, uncompressed, so from the LLM's perspective you can access all that information. There was a lovely visualization from China about how an LLM operates across different layers, right, and how it's able to train and figure things out.
Chetan: In an uncompressed model it has access to all of this, and it basically depends on the GPU's power to chunk through it and arrive at a computation that makes sense, correct? It's like dealing with uncompressed versus compressed files. If you deal with a compressed file, it takes time to extract and open it, and it may not have the same colour accuracy; it may have removed a bunch of repeating logos or repeating colours in a graphic, as an example. In an uncompressed format, everything is there: every pixel is a representation of the actual graphic. For video editors, as an example, colour accuracy is very, very critical, so they have to work with uncompressed formats. And when you actually view this podcast, it's a compressed version of the original, right? It's similar with LLMs: you'll see differences in quality when you run these quantized versions.
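That compression analogy maps directly onto weight quantization: storing each weight in fewer bits saves memory but loses precision. Here is a minimal sketch of symmetric round-to-nearest quantization on random weights, for illustration only; production schemes for local models are block-wise and considerably more sophisticated:

```python
import numpy as np

def quantize_dequantize(weights: np.ndarray, bits: int) -> np.ndarray:
    """Snap weights to a coarse integer grid, then map back to float."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 positive steps for 8-bit
    scale = np.abs(weights).max() / levels
    q = np.round(weights / scale)         # the integer grid the weights land on
    return q * scale                      # dequantized approximation

rng = np.random.default_rng(42)
w = rng.standard_normal(10_000)           # stand-in for a layer's weights
for bits in (8, 4, 2):
    err = np.abs(w - quantize_dequantize(w, bits)).mean()
    print(f"{bits}-bit: mean absolute error {err:.4f}")
```

Running this shows the reconstruction error growing as the bit width shrinks, which is exactly the quality difference Chetan describes between full-precision and heavily quantized local models.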
Chetan: Now, back to your question: is it easy to run all of these things? This is why I brought that up. If you want to run the uncompressed version, it's going to take some very serious hardware. You need the proper infrastructure: the servers, the GPUs, the data storage. And it's also a question of how fast all of this can produce token output, correct. Let's say you're in a production-grade environment and you get an output rate of 20 tokens per second, but your output window is 6,000 tokens. How long is it going to take, at 20 tokens per second, to produce something that is 6,000 tokens, with each token being a few characters on average? It's going to take some time.
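The arithmetic behind that wait is simple; a quick sketch using the hypothetical figures from the discussion (20 tokens per second, a 6,000-token response):

```python
def generation_time_minutes(output_tokens: int, tokens_per_second: float) -> float:
    """How long a response of a given length takes at a given decode speed."""
    return output_tokens / tokens_per_second / 60

# A 6,000-token answer decoded at 20 tokens/second:
print(f"{generation_time_minutes(6_000, 20):.1f} minutes")  # 5.0 minutes
```

Five minutes of wall-clock wait per long answer is what separates a hobby setup from the kind of production infrastructure Chetan describes next.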
Chetan: If you use our products, if you take Quoqo products, it's literally instantaneous, correct, because we have the infrastructure to run these things. But the short answer is: unless you have the hardware necessary to run all of this, and hardware keeps progressing, correct, it's not that older hardware can't run it or that newer hardware isn't on the way, it really depends on what kind of application you want and whom you're catering to. For someone who is able to wait, correct, where it's not mission-critical, you can wait; maybe the result lands in your inbox an hour later, as an example.
Guru: Or run it overnight, in batches, and get the results maybe the next day or so.
Chetan: That may be a case where you want to run this on older, lower-end servers and things like that. But if you want a production-ready environment, conforming to Gen Z standards of everything running instantaneously, then you need something very, very responsive and very, very powerful. And also, to add to that: depending on what you run locally, you're also limited by something called the output window size, correct. It depends, once again, on the infrastructure, and you can change it, but most open-source setups running locally are limited to about 2,000 output tokens, and even those 2,000 tokens can take forever to generate, depending on the hardware. And if you want an OpenAI-style deep research tool, correct, to generate something like 15 pages, even for OpenAI that takes time; it's easily some 15 minutes.
Chetan: Even for them, with their infrastructure, yeah. Some of those things I don't think you can run locally; you'd need an entire server farm to handle complex tasks. For shorter things, like a conversational bot you fancy running locally, I think it's great to have a locally running LLM. But anything more: as an example, correct, we have a product called Diligence, which can review so many things together and basically come to a conclusion, or help you arrive at a conclusion or an analysis on a transaction.
Chetan: As an example, correct, it may not be possible to run that with current local output windows. You know what I'm saying: it's like getting cut off mid-sentence. You'll get a report, but what happens is the computer waits for some time for the output to come through, then it waits for the time allotted, says okay, this is it, and then it has to complete the task. At the end of the day you're dealing with a machine; the machine has a start loop and an end loop, and that's essentially how it works in real life. But that's what I think, Guru; you know a lot more about this.

Guru: I completely agree with you.
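That cut-off-mid-sentence behaviour is just a token budget running out. A toy decode loop makes it visible; the token list and limit here are invented purely for the demo:

```python
def generate(answer_tokens, max_output_tokens):
    """Toy decode loop: emit tokens until the output window is exhausted,
    even if the answer isn't finished, i.e. it gets cut off mid-sentence."""
    out = []
    for tok in answer_tokens:
        if len(out) >= max_output_tokens:
            break  # output window exhausted: the rest is silently dropped
        out.append(tok)
    return " ".join(out)

full_answer = "the quick brown fox jumps over the lazy dog".split()
print(generate(full_answer, 5))  # prints "the quick brown fox jumps"
```

A real model stops the same way when it hits its output limit, which is why a long report generated inside a small local window arrives truncated.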
Guru: So, different use cases; it all depends on the use case you're working with. If you want a repository kind of tool, an AI repository like our legacy product, where you upload your documents and you expect the results to be available maybe the next day or later, then we can use local models with a very high level of quantization, correct. But if you want instantaneous results, such as a bot or some of those other use cases, some of these local models may not be suited to the use case, basically. Correct.
Chetan: So maybe that's the long and short of it: the higher the quantization level, the lower the quality becomes, and the less demanding it is on local hardware.