
Sidecar Sync
Welcome to Sidecar Sync: Your Weekly Dose of Innovation for Associations. Hosted by Amith Nagarajan and Mallory Mejias, this podcast is your definitive source for the latest news, insights, and trends in the association world with a special emphasis on Artificial Intelligence (AI) and its pivotal role in shaping the future. Each week, we delve into the most pressing topics, spotlighting the transformative role of emerging technologies and their profound impact on associations. With a commitment to cutting through the noise, Sidecar Sync offers listeners clear, informed discussions, expert perspectives, and a deep dive into the challenges and opportunities facing associations today. Whether you're an association professional, tech enthusiast, or just keen on staying updated, Sidecar Sync ensures you're always ahead of the curve. Join us for enlightening conversations and a fresh take on the ever-evolving world of associations.
Sidecar Sync
Fargo vs. Klarna & The Rise of Reasoning Models | 81
This week on Sidecar Sync, Amith Nagarajan and Mallory Mejias explore Wells Fargo’s virtual assistant “Fargo” and how it stacks up against Klarna’s AI tool from a year ago. With 250 million fully automated interactions and measurable impact on customer engagement and bias reduction, Fargo offers a powerful case study in applied AI. Amith reflects on what’s now possible for associations, why a narrow pilot project is a smart first move, and how “human in the loop” isn’t just a safety net—it’s strategic. The duo also breaks down Microsoft’s new Phi-4 reasoning models, which pack PhD-level performance into incredibly compact packages that can run on your phone. If you're wondering where the AI trend line is heading, this one’s for you.
🔎 Check out Sidecar's AI Learning Hub and get your Association AI Professional (AAiP) certification:
https://learn.sidecar.ai
📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
https://sidecar.ai/ai
📅 Find out more about digitalNow 2025 and register now:
https://digitalnow.sidecar.ai/
🎉 More from Today’s Sponsors:
CDS Global: https://www.cds-global.com/
VideoRequest: https://videorequest.io/
🛠 AI Tools and Resources Mentioned in This Episode:
Fargo ➡ https://sites.wf.com/fargo/
Klarna AI Assistant ➡ https://www.klarna.com
Microsoft Phi-4 Reasoning Models ➡ https://huggingface.co/microsoft
Chapters:
00:00 - Introduction
03:47 - Meet Fargo: Wells Fargo’s AI Assistant
05:59 - Comparing Fargo with Klarna’s Assistant
08:57 - The State of AI Agents in Associations
13:05 - Event Support: A Smart Use Case for AI
15:00 - Human-in-the-Loop: Not Optional, But Essential
23:44 - Private AI: Local vs. Cloud Deployment
26:46 - Microsoft’s Phi-4 Models: Small and Mighty
32:50 - Why Small Models are a Big Deal
43:54 - AI Trendlines and the Future for Associations
🚀 Follow Sidecar on LinkedIn
https://www.linkedin.com/company/sidecar-global/
👍 Please Like & Subscribe!
https://x.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecar.ai/
More about Your Hosts:
Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.
📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan
Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.
📣 Follow Mallory on LinkedIn:
...
The most important thing for all of you associations to note is that you have options. You have ways of doing secure, private AI inference. There are a number of ways to do this, and you can even do it locally, on-device.
Speaker 2:Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, Chairman of Blue Cypress, and I'm your host.
Speaker 1:Greetings and welcome to the Sidecar Sync, your source for content at the intersection of all things artificial intelligence and the world of associations. My name is Amith Nagarajan.
Speaker 3:And my name is Mallory Mejias.
Speaker 1:And we are your hosts and, as always, we've prepared an awesome episode for you guys to get some really interesting topics at the forefront of AI and we're going to talk all about how they apply to you in the world of associations. So excited to get into that. But first of all, Mallory, how are you doing today?
Speaker 3:I'm doing pretty well myself, Amith. It's a nice chilly day in Atlanta, so I'm enjoying that. Been getting outside a lot recently, since the weather's been mostly warm, and, yeah, I've had some fun auditions come through on the acting front, so it's been a good, productive weekend for me. What about you?
Speaker 1:Fantastic. Well, you know, I joke around a lot of times when I'm in New Orleans, which is home base for me, that that's the center of the universe for associations. Of course it really isn't. This week I'm up in DC. Yeah, I had a breakfast chat with somebody and have a few more meetings lined up, meeting with some of our team members across our company, so it's always a productive time in DC. It's pretty much nonstop from early morning till late in the evening when I get into town.
Speaker 3:Yeah, I was saying to Amith before we started recording, I didn't know how he was possibly going to squeeze in this podcast with his schedule today of all these meetings, but you showed up, Amith. I'm really happy we're here.
Speaker 1:Well, episode 81. We've got to keep the streak going, and this is so much fun to record, and I'm always interested in making time for it. My audio quality may not be as good as normal, unfortunately, for this episode, so apologies in advance if that is the case and that is your experience. But I'll be back to a normal recording session shortly; for now I am on the road and doing my best.
Speaker 3:We take the Sidecar Sync all over. I don't know if we've ever done it internationally yet. I'm trying to think on my end. I don't think I've ever recorded in another country. What about you, Amith?
Speaker 1:I don't believe so, but that sounds like a challenge.
Speaker 3:I think I need to book a flight somewhere. Yeah, hey, let's do it. We'll do like maybe a Mexico version of the Sidecar Sync. That'd be fun. Well, today, as Amith mentioned, we have some exciting topics lined up for you. We're going to first be exploring this Wells Fargo AI assistant and then doing a little bit of a reflection on an episode we did (it was actually episode 21) where we talked about Klarna's AI assistant, just to do a little compare and contrast. And then we will be talking about Microsoft's latest Phi-4 family of models, with some great naming conventions, as we always chat about on the Sidecar Sync podcast.
Speaker 3:So, first and foremost, the Wells Fargo AI assistant is called Fargo. It's an advanced virtual assistant integrated into the Wells Fargo mobile app that helps customers with a wide variety of banking tasks through both voice and text interactions, from checking balances to processing payments and handling refunds, and also providing personalized financial guidance. Fargo serves as a 24-7 banking assistant for Wells Fargo customers. The assistant uses a model-agnostic architecture, and it employs different specialized LLMs for various tasks. So that's that multi-agent framework that we talk about often on this podcast. It has a privacy-first design, so no personally identifiable information is exposed to external language models, and sensitive data is processed locally before any cloud interaction. They're seeing some impressive results so far. There have been 245.4 million interactions with the assistant in 2024, which is actually double what they projected, and these are interactions entirely without human intervention. So around 250 million interactions without human intervention. They're seeing deep engagement with their AI assistant, so 2.7 interactions per session on average, and across the board with their AI initiatives they're seeing a 3 to 10x increase in customer engagement. Something that's also interesting to note: we've talked about bias that's built into AI models because of the material they're trained on. We've also talked about bias with humans, right, because when we're making decisions, we're pulling on all of our previous experience as well. Something that's been interesting with their AI initiatives at Wells Fargo is they're seeing some bias reduction in certain areas, so the AI has led to fairer lending decisions when it comes to loans, which I think is quite interesting to note.
Behind the scenes, Pega is the company, particularly their Customer Decision Hub, behind all these AI initiatives at Wells Fargo, and it helps them analyze billions of interactions to determine the next best conversation for each customer, making Fargo's responses highly personalized and relevant across channels. And, as I mentioned, that was episode 21; right now we're recording episode 81.
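To make the "model-agnostic architecture" idea concrete, here is a minimal sketch of a router that assigns each task type to whichever specialized model is configured for it. Wells Fargo hasn't published its internals, so every task name and model identifier below is hypothetical; the point is only that callers never hard-code a model, so models can be swapped freely.

```python
# Hypothetical model-agnostic router: task types map to specialized models,
# and anything unrecognized falls back to a human handoff.

ROUTING_TABLE = {
    "balance_inquiry": "small-local-model",      # cheap, fast, keeps data on-device
    "payment_processing": "tool-calling-model",  # needs structured/tool output
    "financial_guidance": "large-reasoning-model",
}

def route(task_type: str) -> str:
    """Return the model assigned to a task, defaulting to a human handoff."""
    return ROUTING_TABLE.get(task_type, "escalate-to-human")

print(route("balance_inquiry"))   # small-local-model
print(route("dispute_charge"))    # escalate-to-human
```

Because the calling code only ever asks `route(...)`, upgrading or replacing a model is a one-line change to the table rather than a rewrite of the assistant.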
Speaker 3:So a long time ago, 60 weeks ago, we talked about the Klarna AI Assistant, and I wanted to do a little bit of a reflection. There aren't a ton of stark differences, but there are a few. So, despite serving fewer customers, Wells Fargo has about 70 million customers and Klarna has 150 million, so substantially different.
Speaker 3:Wells Fargo handled that 250 million interactions that I mentioned in the whole year of 2024, compared to Klarna's 2 million-ish in its first month. They haven't published their full number for 2024. But even comparatively, 2 million in one month, if it continued on that trend or even considerably increased each month, 250 million interactions at Wells Fargo is pretty impressive for their 70 million customer base. Klarna also publicly stated that its AI assistant was doing the work of about 700 full-time customer service agents, handling two-thirds of all customer service chats. Wells Fargo has not published a specific equivalent number of agents replaced.
Speaker 3:But given that Fargo's scale far exceeds Klarna's in both total interactions and per-customer engagement, I would say it's reasonable to infer that Fargo automates work that would require potentially thousands of agents. Something also worth noting is the feature evolution in both. The Klarna assistant has expanded from customer service to shopping recommendations, a personalized shopping feed, multilingual support and ChatGPT integration for shopping advice. The Wells Fargo assistant has added AI-driven spending insights, actionable financial tips, improved money movement and financial insight summaries. So going beyond basic customer service, routine interaction, and really providing further value to the consumer, which I think is quite interesting. So, Amith, you've been talking about virtual assistants and AI agents really from the beginning, from the beginning of this podcast, for sure. How have you seen that conversation evolve, particularly over the last year?
Speaker 1:You know it's interesting, you mentioned episode 21 versus 81. So it's exactly 60 episodes, or roughly 60 weeks ago. When we talked about Klarna, I think back then both of us were really excited and impressed by what Klarna had achieved with, at the time, a fairly early model I mean it was GPT-4, if I recall correctly but it was, compared to what we have now, a very rudimentary model, and what they achieved was pretty remarkable. And so now it is good to have that perspective because in 60 weeks we've had roughly a little bit over two AI doublings in power, so a lot of fascinating things to unpack here. So to your question of what I've seen evolve feature set increase is definitely something I think that makes sense, because if you can engage people in a way that they find pleasing, that they find useful, they'll come back more. And if you have more functionality to offer, then you can go deeper. So you know, if Wells Fargo is able to, for example, provide spending insights directly in their platform, that could be really useful for a lot of people, especially if you have a credit card and a bank account with Wells Fargo or maybe some other things. That broader set of insights that you could get from your bank could be pretty powerful and pretty helpful. It could help you make better spending decisions. It could help you make better decisions with respect to investing, even and those are things that third-party apps have been doing for a while products like Mint or a number of others Rocket Money is another one that have some AI features, but this is an opportunity for a platform like a bank to bring some of that engagement back to the bank as the core platform for most people's primary financial interactions. So I think that's interesting.
Speaker 1:In my experience, the association community has been moving a little bit slower than I'd like in terms of member service agents. Overall, people have been doing bits and pieces. We have, in our own family of companies, a number of groups that are working on things that are in this space, one of which is obviously Betty, which has about 100 associations working now and growing quickly, and Betty is definitely in this realm as the knowledge agent, the expert agent in terms of all things association knowledge. We've mentioned previously on this podcast we're launching something specifically for member service that deals with routing incoming asynchronous messages like emails and SMS and so forth. But you know, I'd say that we're still in super, super early innings. So if you're an association that's thinking about this, saying, hey, we'd love to have something like the Wells Fargo assistant or like the Klarna assistant, you've still got plenty of time ahead of you. But I wouldn't, you know, spend all year thinking about it. I'd run an experiment.
Speaker 1:To me, what's so powerful about this particular use case is that it's both sides of the value equation. One side is cost reduction or efficiency improvements, but the other side is improving the value to the customer, which is the biggest thing. When you see people using a service more and more, that should light up a light bulb for you. It says, hey, there's something good here.
Speaker 1:When we see, for example, engagement in a web-based search tool compared to a web-based knowledge agent, where the knowledge agent has literally 50x longer session times than a search tool, that should tell you something about the value you're creating. It's not that it takes 50 times longer to get the information. It's quite the opposite; in fact, the knowledge agent is much, much faster at getting people the information they want. Rather, because people found value and it's low friction, low time to value for the customer, they come back more. So if they come back more, there are more opportunities to engage, more opportunities to create value and have a reinforcement cycle. So I find it to be a really, really exciting area for associations to jump into, but, as I said, I think it's still super early.
Speaker 3:Okay, I was going to say I'm sure we have some listeners thinking, well great, Wells Fargo did it, and Klarna, with their 70 million and 150 million customers respectively; that's feasible for them. You said it's still early stages for associations. Can you contextualize what you mean by that? So what would you say is currently feasible right now for a pilot project with a member service agent?
Speaker 1:I think you could stand up a member service agent over the next three to six months in your association a number of different ways. There are a number of tools you could use for that, using either off-the-shelf tools that you string together with different kinds of agent frameworks, or you could certainly partner with companies that specialize in this, either in the association market, like our companies, or companies outside the association market who do this kind of work. There are companies that are focused on kind of large enterprise, like the one that you mentioned. There's also a company called Decagon, and Sierra is another one, that do customer service agents kind of at the very high end of the market. And people in the association market, I think, are gonna have association-specific solutions more and more. Obviously what we're focused on is that, but you're gonna see more and more choice there. So I think there's off-the-shelf stuff you can deploy, and you can also build something in this space.
Speaker 1:I think this is a great opportunity for an experimentation round where you could do something really, really small. Don't try to boil the ocean and solve all customer service or member service inquiries. Focus on a pain point. For example, many associations have a highly seasonal volume of activity that comes in around their annual conference. So prior to the annual conference they might have a fairly reasonable inflow of inquiries, but right before and during and after the conference they might have, let's say, a 30 or 60 day window of time on the calendar where it's just completely crazy. Well, what if we could put in place a great member slash event service AI that could help field 50, 60, 70% of those questions that are fairly repetitive? That's a super achievable thing, and within the narrower context of events, the domain of questions is usually far narrower. So I think that's an easy thing to go experiment with.
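A narrow event-support pilot like this can start as something almost embarrassingly simple. A production version would use an LLM or embedding search, but keyword matching over a small FAQ set, with everything unrecognized handed to a human, is enough to show the shape. The questions and answers below are invented examples, not real digitalNow policies.

```python
# Minimal sketch of an event-season FAQ agent: answer the repetitive
# questions, hand everything else off to a human.

FAQ = {
    frozenset({"registration", "deadline"}): "Registration closes two weeks before the event.",
    frozenset({"guest", "fee"}): "Guest registration is available at checkout.",
    frozenset({"check", "in"}): "Check-in opens at 8 a.m. in the main lobby.",
}

def answer(question: str) -> str:
    words = set(question.lower().replace("?", "").split())
    for keywords, response in FAQ.items():
        if keywords <= words:  # every keyword appears in the question
            return response
    return "HANDOFF"  # anything unrecognized goes to a human

print(answer("When is the registration deadline?"))
print(answer("Can I bring my spouse to the gala?"))  # HANDOFF
```

The key design point, matching the "err on the side of a human" advice later in the episode, is that the default branch is a handoff, not a guess.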
Speaker 1:Overall, what I'd say is, to me, the thing that you have to remember is: yes, you're an association, you're not Wells Fargo. Yes, you're an association, you're not Amazon. But the technologies have come down so much in cost, they're so much more accessible and they're so much more powerful, that not only can you do this as an association, but you're going to be expected to. Your members don't care that you're not Wells Fargo or Amazon or Netflix or Klarna. They just expect the same quality of experience from you that they expect from their largest consumer experiences. And it may not be fair, but fairness doesn't really matter. In the eye of the consumer, the expectation and the bar have been set at this level, and they're going to expect it soon enough from you, so you might as well get ahead of that and provide them something slightly before they might expect it from the association.
Speaker 3:And then you can provide that additional value, those insights, things that really, really help your members in their profession or industry to further create that value-based relationship.
Speaker 1:You know, I'd say this is also a great time to reinforce a concept we've talked about on the pod, Mallory, a number of times, which is: how do you prioritize your energy? Your energy might be classified as human labor, like your team's time, your volunteers' time, and also your dollars. That's part of the energy flow, right? Where do you invest? And a lot of people are saying, well, our infrastructure is so terrible. We've got ancient systems. You know, we've got a really old AMS and we've got to replace that thing, or an old LMS. Decent chance they'll still work this year and next year.
Speaker 1:And the question is, instead of replacing a major system like that, which is, you know, a significant effort that sometimes takes 18 to 24 months to fully go through, sometimes longer, what if you didn't do that? Right? And you said, hey, we're going to deprioritize some of those classical association IT things and instead invest a few dollars and some time, time being the most important ingredient, to experiment with this use case. Right, go figure out how to make a member services agent work for you as your priority. Let's say you did that for the next six months and you hit the pause button on a pending AMS selection or AMS implementation. The amount of value you create for members from this technology is so much higher. It's dramatically different than what an internal system replacement might yield. So again, I'm not suggesting that you work with an unstable, shaky foundation, with ancient technology, forever. But if you have to choose between something like this and infrastructure improvements that, frankly, nobody's going to really notice on the external side, I'd focus on this. And maybe you don't have to choose between the two in your group and your association, but most people do have to choose between those kinds of priorities.
Speaker 1:So I decided it'd be a good time to remind people that you can attack these things if you're willing to say no to stuff. You just have to draw a line in the sand and say you know what. We're going to put a pause on all these old, classical types of systems and projects and we're going to keep them running, obviously, but we're not going to invest big dollars and big energy in these older technologies. Instead, we're going to focus on making these new AI things work. Last thing I'll say about that is once you do these kinds of new projects, you'll actually reframe what you think you need.
Speaker 1:When it comes time to replace some of that infrastructure, you might think you know what you want in that next generation AMS, but frankly you probably don't. When you build an AI technology or two and deploy it into production, you will get so much better of a sense of where your members want you to go, and that might change the requirements for what that new AMS is going to do. And then the last thing related to that, by the way, is the AMS vendors are also figuring that out. Whether you're talking about a traditional AMS vendor in the space or some other type of solution, everybody in these types of database applications is working really hard right now to figure out how to AI enable their systems, so I'd give them a little bit of time too. I think you'll have better choice and you'll have better visibility into what you're actually going to get.
Speaker 3:Mm-hmm. Hearing you talk about that, we've just hit the one-year mark of moving to Atlanta, and it made me think of our experience last year of moving into an apartment we had never seen and trying to furnish it before we were there, and realizing sometimes you just need to be there physically in the space before you realize, oh okay, we need this size couch, we need a TV right here. It makes me think of associations specifically trying to replace their AMS and then perhaps getting to that point and thinking, oh gosh, now with AI, we realize we need all these other features and all this other infrastructure. So I think it's a really valid point, Amith, and I want to talk a little bit about this pilot project that you mentioned.
Speaker 3:I can definitely resonate, having been the primary point person at Sidecar who would take in a lot of inquiries approaching digitalNow, the conference, before the event and after the event. However, I would think if you came to me and said we're going to, you know, roll out this AI agent and there's going to be potentially no human intervention, right, we're just going to roll it out, I would be intimidated by that and, to be honest, scared that it wouldn't work. So, and I'm sure our association listeners feel the same way, what is your thought on the pilot project of trying to roll out an agent that has no human intervention versus trying to roll out an agent that does the routing, like you kind of briefly mentioned earlier? Is no human intervention the goal? Talk me through that.
Speaker 1:I don't think that's the goal at all in almost all cases. I don't think that's the case either for Klarna or Wells Fargo, as I understand their models. It's more about making available instant and high-quality responses for most things, but at the same time being able to interact with a human agent when appropriate. And this might sound like we're trying to find a silver lining on the employment side of the equation here in saying that the humans can focus on higher-value activities. That's oftentimes consultant-speak for saying they're going to be laid off. In reality there's some of that that might happen; in the association market, probably not so much, but in the broader market, if you have 10,000 people in a call center, maybe you don't need 10,000, maybe you need 2,000. But you need your best 2,000 people. So there are some issues there for sure, when you think about that across an entire sector. But for the association world, I think of it this way: your member services folks, your event services folks, they have a lot more to offer than just answering rote inquiries, like people asking, hey, when do I need to register? Where can I check in? Can I bring my spouse to this particular function? What's the guest registration fee? Where can I find this particular article? All these kinds of basic help desk questions. AI can nail all those things, and the people who are asking those questions are going to be happier with a better answer that's nearly instant. But those member services reps, those event folks, can have conversations with people, can learn more from those members, can take time to actually have live synchronous phone calls and video calls, to really be the concierge, to help provide an experience, so that it feels like you're checking into the Four Seasons when you come to your event rather than checking into the Red Roof Inn.
So you know, the whole idea is that you want to level up the caliber of service and the quality of service that you provide, and you can do that. You can, you know, punch way above your weight class by using AI to take care of the rote stuff.
Speaker 1:Coming back to your point, that's where this concept in agentic systems called human in the loop is so critical. That's for key decision-making, but it's also for escalation, where the AI should be trained, and can easily be trained, to be smart enough to not try to take on everything, right? You can tell the AI, hey, for these three or four different kinds of inquiries, we can answer them in these different ways. These are the tools that are available. You might have a knowledge agent. You might have capabilities around database lookups. There might be two or three different things that the agent is really good at, but we can tell the agent to err on the side of getting a human involved.
Speaker 1:If there's any question as to the quality of the answer, or, independent of the objective purpose of the call or the inquiry, let's just say that the AI detects a tone of frustration. Let's say that there's two or three iterations of emails and the AI detects that the person's just not particularly happy. You know, AI is really, really good at reading the emotion from just plain text, and that's even more true with audio, if you were to do this with audio capabilities. It can detect that and say, hey, you know what, I think Mallory is not super happy with me right now (me being the AI), I'm going to forward this message to somebody else, to a human in that case, right, to help Mallory out.
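The escalate-on-frustration behavior described here can be approximated even without a model in the loop. A production system would ask an LLM to score sentiment, but a repetition-and-keyword heuristic is enough to show the control flow; the threshold and phrase list below are purely illustrative.

```python
# Illustrative human-in-the-loop escalation check: hand the thread to a
# person when it has gone too many rounds or the member sounds frustrated.

FRUSTRATION_PHRASES = ("as i said", "i already", "still not", "again")

def should_escalate(messages: list[str]) -> bool:
    """Escalate when the thread is long or the user sounds frustrated."""
    if len(messages) >= 3:  # third iteration of the same thread
        return True
    return any(
        phrase in msg.lower()
        for msg in messages
        for phrase in FRUSTRATION_PHRASES
    )

print(should_escalate(["Where do I check in?"]))                     # False
print(should_escalate(["As I said, my badge still isn't working"]))  # True
```

Note that the function only decides *whether* to escalate; what the human does next, the concierge-level conversation, stays outside the code entirely, which is the point of the pattern.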
Speaker 3:Yep, as you said, ai is pretty good at detecting sentiment. It's not something you think it would be good at, but, like word choice and especially if it has more information through audio video, it does a pretty decent job at it.
Speaker 1:Well, you can also tell if someone's coming back two or three times and they feel like they're asking the same thing repetitively, or even using a simple phrase like "as I said." Right, when I say "as I said," I feel like I'm repeating myself, and I find myself doing that with customer service reps in that, you know, kind of ongoing, infinite loop of emailing people who really don't have a great idea of what I'm after but are there to kind of, you know, address my issue in some way. So I think there's a lot of opportunity here. But yes, to your point, Mallory, you make a really important one. I wouldn't try to just hand this over to the robot and say good luck and hope to see you in the future, because I don't think that's a complete solution. I think you have to level up what the humans do in this equation.
Speaker 3:My last question here, Amith, is that the ability to process sensitive data locally before any cloud interaction is a major privacy advancement and is, of course, essential for things like banking or buy now, pay later, when you're dealing with people's payment information. Do you think that this is a necessity for associations?
Speaker 1:I think it's an important concept that associations should be aware of. A lot of people make an assumption. They'll say to me, for example, oh, I love the idea of whatever the application is, but I have all this sensitive data. Or it might not be sensitive like patient data or banking data or something like that; it might just be, we have a lot of content in our private knowledge repository. We don't want to send that to ChatGPT or to Claude. We just don't trust them. And that's a reasonable concern. But people make the assumption that that's a dead end, right, that that's the end of the conversation.
Speaker 1:Whereas there are ways of doing private deployment in the cloud of your own models, where you could say, hey, I'm going to run Llama or a number of other models in a private cloud deployment. And to what you specifically brought up, these models are shrinking: their capabilities are growing and they're shrinking in size. You can actually run them locally on a phone, in a web browser, and in ways that also provide additional privacy. I don't know exactly what Wells Fargo is doing, but Apple's strategy around this sounds similar. What they'll do is, on the phone itself, the LLM that's running locally, a very, very small LLM, will try to get the essence of what you've asked and then determine if it can answer the question locally or if it will need to promote a portion of that information.
Speaker 1:Abstracting out anything personally you may have shared with, just the general concept get higher order knowledge from a remote LLM, also operating in a secure manner, and then pull that back to the local LLM to synthesize a response that then reintroduces your personal information. But the personal information never really left the local environment. The most important thing for all of you associations to note is that you have options. You have ways of doing secure, private AI inference. There's a number of ways to do this and you can even do it locally on-device, and that's going to continue to be the case. There's all this growing collective body of language models that you can run that are smaller and smaller, that run extremely efficiently on desktop computers and laptops and even on phones.
Speaker 3:Well, you really set me up perfectly there, Amith, to go to topic two, which is Microsoft's Phi-4 models: small language models with big reasoning power. So Microsoft just released a new family of Phi-4 models, including these great names: Phi-4 Reasoning, Phi-4 Reasoning Plus and Phi-4 Mini Reasoning. Those aren't too bad. I've seen worse, I would say, come out of OpenAI. The Phi-4 Reasoning models are very much a part of the broader trend toward reasoning or thinking models, called either one, that can perform advanced reasoning: an ability to analyze complex scenarios, apply structured logic and solve problems in a way that resembles human thinking. So, to break down that Phi-4 family, we've got Phi-4 Reasoning, which is a 14-billion-parameter open-weight model fine-tuned for complex reasoning, math, science and coding tasks. It uses supervised fine-tuning with high-quality, curated data, enabling it to generate detailed reasoning chains and match or surpass much larger models on benchmarks. Then we've got Phi-4 Reasoning Plus, which builds on that Phi-4 Reasoning model I just mentioned, further trained with reinforcement learning and able to use 1.5x more tokens for even higher accuracy. It matches or exceeds the performance of much larger models like DeepSeek R1, which we've covered on the podcast, and which, as a note, has 671 billion parameters compared to the 14 billion parameters of this model, and OpenAI's o3-mini on several key benchmarks.
Speaker 3:And then we've got Phi-4 Mini Reasoning, a compact 3.8-billion-parameter model optimized for mathematical reasoning and educational use, suitable for deployment on resource-limited devices like mobile phones and edge hardware. So Amith was already kind of gearing up to mention a lot of the practical benefits of smaller models. They can run locally on PCs, mobile devices and edge hardware. They're also designed for offline use on Copilot+ PCs, and, of course, there are lower computational requirements that make them more accessible and cost-effective. All three of these models are openly available under permissive licenses, and they can be accessed through Azure AI Foundry and Hugging Face. So, Amith, what are your initial thoughts on the Phi-4 family of models?
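A rough way to see why those parameter counts matter for on-device deployment is to translate them into memory. The sketch below is a back-of-envelope estimate, not official sizing: it counts weight storage only, ignoring the KV cache and activation overhead, and the bits-per-weight figures (16-bit full precision vs. 4-bit quantized) are common conventions rather than anything Microsoft has published for these specific models.

```python
def model_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough memory footprint for a model's weights alone.

    Ignores KV cache and activation overhead, so real usage is higher.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Phi-4 Reasoning (14B) at 16-bit vs. 4-bit quantized
print(model_memory_gb(14, 16))   # ~28 GB of weights: workstation territory
print(model_memory_gb(14, 4))    # ~7 GB: fits a well-equipped laptop or Mac
# Phi-4 Mini Reasoning (3.8B) at 4-bit
print(model_memory_gb(3.8, 4))   # ~1.9 GB: plausible for phones and edge hardware
```

This is why the 3.8B "mini" variant is the one pitched at phones and edge devices: quantized, its weights fit comfortably in a few gigabytes of RAM.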
Speaker 1:Well, first of all, let's spell this out for folks, because Phi-4 might be pronounced or spelled differently. It's P-H-I, dash, the number four, and that's, I think, part of what makes it so hard to pronounce: there was Phi-3, and it's almost like you're saying five-four. But yeah, that's what I always think.
Speaker 1:When I first started hearing this. But it's P-H-I dash four. My thoughts are, wow, this is really exciting. So what you said, as part of many really interesting comments, is that, not across all benchmarks, but across several important benchmarks, Phi-4 Reasoning Plus, which I'll talk about in more detail in a second, matches or exceeds DeepSeek R1, which, if you recall, shook the world back in the January-February timeframe. People freaked out because it was as performant as OpenAI's then most powerful AI reasoning models, o1 and o3-mini. So here's the deal. The way I think about this is that this is a tiny model, 14 billion parameters, which by today's model sizes is really small, capable of being run probably on some phones, but definitely on a PC or a Mac. And one of the ways they're able to make it perform as well as it does is by giving it more time to think when you ask the question.
Speaker 1:So, reasoning-slash-thinking models: it sounds like some new category of model, this really cool, complex thing. In reality, it's not all that different from the models we've had in the past. It's essentially saying, hey, model, I want you to spend time thinking about this problem, to go deeper and think it through, breaking it down step by step into small chunks and then compiling the results of each of those sub-steps into an answer. Another way to think about it is that the model is able to revise something it thought about previously. When we interviewed Ian Andrews from Groq, Groq with a Q, he used an analogy that I love and I've repeated a number of times, which is that it's like giving the model a backspace key, where the model can edit its prior response as opposed to simply writing as fast as it can. So that's what these reasoning and thinking models do, and the way to think about it is that what you now have access to in a 14-billion-parameter model is something that, literally two months ago, required a 671-billion-parameter model, which makes it possible to run all sorts of workloads on smaller and smaller devices.
Speaker 1:And then the mini model is a much smaller model. It's about a quarter the size of the main Phi-4 model, but it also is trained to use more compute resources when you ask questions, so it can reason through problems, and that model is suitable for running on edge hardware, which would include phones and other devices that have much smaller memory and computational ability. So I find all of this to be super exciting. It just reinforces the trend line we've talked about.
Speaker 1:I've been saying this for a few years now: I'm actually more excited about the compact, small, super-efficient, lightweight models becoming smarter than I am about frontier models like Claude 3.7 Sonnet or Gemini 2.5 Pro. Those are awesome. The fact that these super-powered models that run only in the cloud are getting smarter is, of course, exciting, but the fact that these small, really efficient models can do so much more is just stunning. I mean, what you have in Phi-4 Reasoning Plus is better than what you had six months ago in the very best models in the world, and you can now run that on your computer for free. That's a pretty stunning advancement in a very short number of months.
Speaker 3:Mm-hmm. I know you and I like to geek out about all the minute details of these models, because that's part of our job and I think we just enjoy learning about it. But you mentioned the trend lines, and I always think it's important with these model conversations to zoom out a little bit and look at the bigger picture. What you just said is really profound, but what do you think this trend line means, with smaller, more powerful models, specifically for associations?
Speaker 1:Well, going back to the last conversation: if there are certain types of data in your organization that you're not comfortable sharing with any of the model providers, Anthropic or Google or OpenAI, you can take this Phi model and run it, even on your own physical hardware if you want to, or in a virtual private cloud environment at one of the major cloud providers, where it's completely contained and as secure as any other computer program that you run. Most people have gotten pretty comfortable with secure private cloud deployment, where, in a cloud like Google or AWS or Azure, you can set up resources that are 100% secured and private and, by most measures, far more secure than computers you run physically on your own hardware. And an AI model is just a computer program. It works differently than a traditional computer program, but it is a computer program, and you can run it on hardware you have absolute control over, right? So if you have that ability, it opens up a class of applications that associations have often told me they're uncomfortable with: things related to clinical data they might have access to if they're a healthcare association, or, if they're a financial association, maybe benchmarking data they receive from some of their members that they don't feel comfortable passing to OpenAI, or anybody else for that matter. These kinds of applications can now be brought into a totally secure environment and run with incredible accuracy. So it opens up a ton of doors.
If you're worried about passing your content to an AI system because you worry that they'll somehow subsume your content into the corpus of training data they'll use for future models, which, by the way, I'm a little bit skeptical about even when the legal agreement says it can't be used in certain ways, you might say, okay, I'd rather just be totally sure, and I'm going to run this type of model on my own. So it opens up a lot of doors.
Speaker 1:The other thing to think about is, independent of the privacy and security conversation, smaller models run faster, with less energy and fewer resources, and are cheaper to run. So if you have a little model like this that's as smart as what previously required a giant model, and you can now run it as a really cost-effective small model, you can do more, right? You might have millions of documents that go back to the beginning of your association's formation, and you might like to analyze them in all sorts of new ways that you previously would have thought totally unattainable. You might have said, well, we have this idea in mind: we have a million documents, every paper we've ever published and every opinion that's ever been written on every paper, and we would like to ask certain questions of every one of those papers, right? Have a detailed analysis done of each of those papers in order to capture some metadata or some structured insight from all of them.
Speaker 1:And let's just say, a year ago you thought about this idea. It was a cool idea, but then you're like, yeah, it would cost between two and three dollars per paper, and we have a couple million pieces of content. That's just not going to scale. But now, if you have a 97, 98% cost reduction, which is basically what you get here, that might cost you a few thousand dollars, right? Or maybe $10,000. You might say, you know what, that's actually pretty reasonable. And if you wait six more months, it might be basically zero. So the cost-curve compression is really compelling, as well as the privacy. It opens up the door to just use way, way more of this inference that we keep talking about.
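To make the cost-curve point concrete, here's a toy calculation. All the dollar figures are illustrative assumptions in the spirit of the numbers discussed, not published pricing: real per-document costs depend heavily on document length, token counts, and which model and provider you use.

```python
def batch_cost(num_docs: int, cost_per_doc: float) -> float:
    """Total cost of running one analysis pass over a document corpus."""
    return num_docs * cost_per_doc

corpus = 2_000_000  # "a couple million pieces of content"

# Hypothetical per-document prices: a frontier model a year ago
# vs. a small efficient model today.
frontier = batch_cost(corpus, 2.50)    # ~$5,000,000: not going to scale
small = batch_cost(corpus, 0.005)      # ~$10,000: suddenly reasonable

print(f"frontier model: ${frontier:,.0f}")
print(f"small model:    ${small:,.0f}")
```

The exact numbers matter less than the shape of the curve: when per-document inference cost drops by two orders of magnitude, whole-corpus analyses move from "unthinkable" to a line item in a project budget.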
Speaker 3:Amith, this is just not something I'm up to speed on, so I'll ask, in case we have some listeners with the same question: when you talked about running models privately in your own cloud environment versus running them locally, is one as secure as the other, or is one more secure than the other?
Speaker 1:You know, there's pros and cons to each approach. So let's say I have the old-school way of doing it: in my office I have a computer server and I run that physical server. I am responsible for site security, to make sure no one physically enters that location. I'm responsible for network security. I'm responsible for the whole thing, right? And so traditionally, IT departments in associations did that. They'd have server rooms with racks of these servers, and they'd run them, and they were responsible for all of that.
Speaker 1:The site security and the digital security. And I would argue that, generally speaking, that is going to be less secure than a modern cloud provider that has rigorous, tight, military-grade physical security around its sites, way, way more than any association is ever going to have. And from a digital security perspective, implementing your own approach to cybersecurity is really important for your own resources, but cloud service providers tend to have really, really good built-in security architectures that are a good starting point. So I'm generally a skeptic of anyone who tells me they can run a more secure local environment than a well-implemented cloud environment, and I think most security experts would tend to agree with that, certainly for SMBs, small to medium-sized businesses, which associations fit into. There are exceptions to every statement, obviously. There are some organizations who would argue, you know, we have even stronger site security and digital security than any cloud provider, and certain information we have justifies it. And sure, there's always exceptions. But for the vast, vast majority of our listeners in this market, cloud-based deployment is going to work really, really well. It just has to be well thought out. You could create a cloud-based resource that you leave a wide-open backdoor to without thinking, like, oh, I'm just going to post the password to my website on Reddit and let anyone log in. That sounds totally stupid, but the reality is there are all sorts of human factors that go into compromising security all the time, and that can affect you either way.
Speaker 1:Local inference on a device that an end user uses, though, is actually really interesting as a complement to that. Because, let's say, again in the Wells Fargo case, I'm talking to my banking assistant on my phone, and I'm talking about my salary or my investment strategy and my net worth and all this other information. Maybe that information isn't really what that local AI needs help with. Maybe it needs help reasoning through some general ideas.
Speaker 1:Those general ideas then guide what the local LLM does. So, instead of sharing my salary and my net worth from the local conversation with the remote AI, what it does is say, hey, I'm with a consumer, they're working through these kinds of problems, can you give me some general guidance on A, B and C? And then the remote LLM throws way more compute at it, comes up with a stronger answer, feeds it back to the local LLM and says, hey, here's the direction you should go. And then the local LLM takes that private data, infuses it back into the answer from the remote LLM, and gives me an experience that's really, really high quality on my phone, right, and my personal data never left my phone. And the same thing can be done for healthcare, and that could be a complement that associations can take advantage of.
Speaker 1:Let's say you're a medical association and you want to provide capabilities for your members to have chats with you that are specific down to the case level, on a particular patient they're working with. You probably don't want any of that healthcare data to ever come back to you, right? So what if you had a local LLM that did part of the processing and, just like I said, abstracted out the problem, removing patient-specific data, then got a knowledge agent with a tremendous amount of content and compute capability to formulate an answer, and then re-infused that back with the local data? There are ways to do that as well, and there are applications for associations for sure.
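The workflow described here, abstract out the private details locally, get general guidance remotely, re-infuse the details on-device, can be sketched in a few lines. Everything below is a hypothetical illustration: `redact` is a toy scrubber standing in for a real on-device model or PII library, and `remote_llm` stands in for a cloud model call. It just demonstrates the shape of the orchestration, in which sensitive values never appear in the text sent off-device.

```python
import re

def redact(text: str) -> tuple[str, dict]:
    """Toy PII scrubber: stashes dollar amounts behind placeholders.

    A real system would use an on-device model or a proper PII library
    covering names, account numbers, patient identifiers, and so on.
    """
    placeholders = {}
    def stash(match):
        key = f"<PII_{len(placeholders)}>"
        placeholders[key] = match.group(0)
        return key
    redacted = re.sub(r"\$[\d,]+", stash, text)
    return redacted, placeholders

def remote_llm(abstract_query: str) -> str:
    """Stand-in for a cloud model: only ever sees the abstracted query."""
    return f"General guidance for: {abstract_query}"

def local_llm_answer(question: str) -> str:
    """On-device orchestration: scrub, consult the remote model, re-infuse."""
    redacted, placeholders = redact(question)
    guidance = remote_llm(redacted)          # private values never leave the device
    for key, value in placeholders.items():  # re-infuse locally
        guidance = guidance.replace(key, value)
    return guidance

print(local_llm_answer("How should I invest my $250,000 salary?"))
# The remote side only ever saw "<PII_0>", not the dollar figure.
```

The same skeleton applies to the medical-association case: swap the dollar-amount regex for patient-identifier scrubbing, and the remote side becomes the association's knowledge agent.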
Speaker 3:Yep, I've mentioned my husband's in healthcare and he is just waiting for the day that he has exactly what you mentioned, where he could drop in some patient info and get that resolved with better accuracy perhaps than he could have found doing some searches online.
Speaker 1:I was just going to say one other thing, Mallory, that I think our listeners might find interesting. For those of you that have heard me talk about the Acquired podcast before, or heard Mallory mention it, we're big fans of the work those guys do. That's a long-form, business-history-style podcast. As of the recording of this podcast in spring of 2025, they just dropped an episode on Epic.
Speaker 1:Epic is a software company in the healthcare space, and what's super interesting?

Speaker 1:There are a lot of interesting things about that particular episode, but there's a lot of talk about AI and a company like Epic and what they're going to do in the healthcare field. They're by far the dominant player in providing tools like MyChart, which patients use, and the EMRs, EHRs and billing systems that hospitals and health systems use, and certainly those kinds of tools will likely soon feature AI capabilities for doctors to use.
Speaker 1:So you might ask the question: well, what is the role of the association? How do we provide value when the hospital might have an AI bot built into their secure EHR/EMR system? And the answer, in my mind, is to complement that, where you have certain things that nobody else has, particularly your content, and over time you can capture other forms of experiential data that would be unique to you and complementary to what people get out of an EHR/EMR. So I think there's actually a very bright story there. Whether those things can interoperate and integrate with the experience that a doctor or medical practitioner member may have is a big question, because companies like Epic, specifically, are famously very guarded about integrations. But I think there's an opportunity here for many associations, in a similar capacity outside of healthcare as well, to complement a line-of-business system that your members use every day.
Speaker 3:I'm going to have to tell Bailey about that episode. I've actually gotten him onto the Acquired podcast; he listened to the Costco episode as well and really enjoyed it, so he will certainly enjoy the Epic one. Amith, my last question was about this trend line again. It seems like when you zoom out, we're seeing smaller and smaller models become more and more powerful. In your mind, in the next five years, if you could zoom out, do you think we'll be looking at tons of models that are, you know, millions of parameters and more powerful than we could possibly imagine? Are we trending toward creating models as small as we can, or is there a place for the giant ones and the small ones as well?
Speaker 1:I think it's both. I think that, you know, if you can further compact these models down to the point where, let's say, 12 months from now... You said five years. I don't know that I can think that far out. I think of that as next year.
Speaker 1:You're right, that was a hard question say that an equivalent model to the 5.4 reasoning model is available in 100 million parameter or 200 million parameter model. Right, like you know, 10x or even 50 or 100x smaller than the current 5.4 model. That could run in a web browser, that could run on really, really lightweight phones not even like an iPhone 16, but something much smaller than that. And so if that's the case, then now you have really high end reasoning capability in a super compact form. You know you could have it running pretty much everywhere. You can have that capability in your earbuds, you know. So those capabilities becoming smaller and smaller is good. I think what you're also going to see is that the state of the art frontier models will keep getting smarter and smarter.
Speaker 1:You know, one of the stats that I think has been missed by a lot of folks, I think it was from the most recent o3 release by OpenAI, and Gemini 2.5 Pro and Claude 3.7 in extended thinking mode are kind of similar in terms of where they're at, but this benchmark shows that o3 is approximately on par with about the 80th percentile of performance of PhDs across all disciplines.
Speaker 1:So let's unpack that for just a second: the 70th to 80th percentile of PhDs. That means that if you put the average PhD, who is no slouch, typically right in the middle, that's the 50th percentile. So o3 is at the 70th to 80th percentile of performance of PhDs, and not just in one field, but across a number of different disciplines, ranging from history to philosophy to various forms of science and engineering. So it's pretty stunning what you have, and that's o3, which is a big, heavy, expensive reasoning model. But you can have that capability distilled down into smaller and smaller models. Even if these models didn't get any smarter, right, that's pretty darn smart. And if you make it super fast, small, cost-effective and energy-efficient, the doors that open up are really compelling.
Speaker 3:That is a great place to wrap up this episode. What would you do if you had all those PhDs at your fingertips running on your phone and your earbuds? I don't know, we might be there pretty soon. Everybody, thank you for tuning in to today's episode and we'll see you all next week.
Speaker 2:Thanks for tuning in to Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing and keep disrupting.