Sidecar Sync

Generating Instant Avatars with HeyGen & Anthropic’s New ‘Computer Use’ Feature | 53

Amith Nagarajan and Mallory Mejias Episode 53

In this episode of Sidecar Sync, hosts Amith and Mallory dive deep into two groundbreaking AI innovations—HeyGen’s interactive avatars and Claude’s new computer use feature. They explore how these tools could revolutionize the way associations handle meetings, customer interactions, and even website management. With live examples and hands-on experiences, they assess the future of AI avatars in the business world and break down how Claude’s ability to control a computer can streamline tasks. Tune in to hear how these advancements could transform your association’s operations and member experience.

digitalNow Conference 2024
🔗 https://www.digitalnowconference.com/

🛠 AI Tools and Resources Mentioned in This Episode:

HeyGen ➡ https://www.heygen.com
Claude ➡ https://www.anthropic.com/index/claude

Chapters:

00:00 - Introduction
02:07 - digitalNow Preview
06:38 - HeyGen’s Instant Avatars
10:20 - Real-Time AI Translation and Dubbing Tools
18:21 - Applications of AI in Learning and Onboarding
21:03 - Claude’s New Computer Use Feature Explained
28:52 - The Future of AI-Powered Assistants

🚀 Follow Sidecar on LinkedIn
https://linkedin.com/sidecar-global

👍 Please Like & Subscribe!
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecarglobal.com

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

📣 Follow Mallory on LinkedIn:
https://linkedin.com/mallorymejias

Speaker 1:

Now we're shifting to the point where the computers can understand us. As part of understanding us and gaining these new capabilities, they understand our world too. Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, chairman of Blue Cypress, and I'm your host. Hey everybody, and welcome to the Sidecar Sync, your source of all things AI for the association community. We're excited to be here again. My name is Amith Nagarajan.

Speaker 2:

And my name is Mallory Mejias.

Speaker 1:

And we are your hosts. We can't wait to get into our topics for today, which, as usual, are interesting things happening in the world of AI, but how they apply specifically to the world of associations and nonprofits. Before we dive in, let's take a moment to hear a quick word from our sponsor.

Speaker 2:

digitalNow is your chance to reimagine the future of your association. Join us in the nation's capital, Washington, DC, from October 27th through the 30th, for digitalNow, hosted at the beautiful Omni Shoreham Hotel. Over two and a half days, we'll host sessions focused on driving digital transformation, strategically embracing AI and empowering top association leaders with Silicon Valley-level insights. Together, we'll rethink association business models through the lens of AI, ensuring your organization not only survives but thrives in the future. Enjoy keynotes from world-class speakers joining us from organizations like Google, the US Department of State and the US Chamber of Commerce. This is your chance to network with key association leaders, learn from the experts and future-proof your association. Don't just adapt to the future, create it at digitalNow. As you all heard from our sponsor, digitalNow is upon us next week. October 27th is the official kickoff date. Amith, how are you feeling about that?

Speaker 1:

I'm feeling fantastic. It's so exciting. We've got record-breaking registrations, we've got an awesome venue, amazing speakers lined up. It's going to be a lot of fun.

Speaker 2:

It's been so neat to see it all come together. I'm sure a lot of our listeners can relate. Events can be very time-intensive to plan. At the beginning you don't see how it's all going to come together in the end. But, being a few days out, I'm really excited to see our audience, to see all the relationships we've built throughout the last few years and to celebrate the biggest digitalNow we've ever had.

Speaker 1:

It's going to be awesome, yeah. When you run an event for association people, knowing that association people are professionals in event planning, it raises the bar a little bit in terms of expectations. But it's also awesome, because people who attend the event tend to know how challenging it can be to pull together an event, even at the scale of something like digitalNow. Many of the associations who listen to this podcast or watch us on YouTube have events that are 10 or 50 times the size of digitalNow. You really want to have every detail nailed down, and AI can help with some of that. But at the end of the day, it's people helping people at an in-person conference.

Speaker 1:

Just as an aside, I'm really bullish on in-person conferences for associations. I think it's one of the things associations do an incredible job with. No matter what happens with technology, biologically we're wired to want community and connection, and having that in person is something that's roared back to life after COVID and I think is going to continue to be a strong suit. That doesn't mean that associations don't have to adapt their conferences to take into account AI augmentation in various ways. Of course they should and they will, but I'm really pumped about it. I think conference season is just a fantastic time of the year.

Speaker 2:

I agree, and I know this week, Amith, you're doing a lot of thinking and reflection about your own keynote. Are there any details or sneak peeks you'd want to share with our listeners?

Speaker 1:

Well, I have to figure out what I'm talking about first. As you know, because you're helping me put it together, I don't like to prepare a keynote too far in advance of when I'm going to be on stage, because things are changing so fast. Some of the things we're going to be talking about next week during my keynote to open up the event are things that have happened in the last week or two. So you can definitely count on it being focused on the latest and greatest in AI generally, but what I plan to do is talk specifically about agents. Most of my keynote is about how exponential change is driving the rise of agents and how agents can dramatically affect customer service, both from the perspective of the association in terms of the workload, and from the viewpoint of the member receiving a far superior experience to what they've ever envisioned possible. So that's part of what I'm talking about, and hopefully I'll be setting the stage for the keynotes and other speakers who follow me to weave in other specialized topics. But I think the world of agents is an area that we've just got to pay more attention to.

Speaker 1:

So much focus goes towards, hey, there's a new model released and this model is really cool. Like yesterday, the Claude 3.5 Sonnet update came out, including computer use, and that's really cool. But that's just a model. And I say just a model; it's incredibly powerful and very exciting. But with agents you can build all this other software around these models and do things today, even with last year's models, that are just stunning and that people don't realize. So we're trying to cast a bigger spotlight on agents, and that's what my keynote aims to do. So if you have not yet registered for digitalNow, you're a slacker, but you can still come, so sign up today.

Speaker 2:

You are a slacker, but you're not the only one. We've seen, as we've mentioned, a lot of registrations come in in the past few weeks. We still have some room. I mean, we're not capping digitalNow this year. So if that sounds of interest to you, please join us at the Omni Shoreham Hotel October 27th. And, as a reminder, we will be recording the keynote sessions as well and putting those into our AI Learning Hub if you can't make it or are joining from a faraway place.

Speaker 2:

Today we are talking about two topics. First and foremost, HeyGen's Instant Avatars. I think that'll be a fun conversation. And then we're also talking about exactly what Amith just mentioned, Claude's computer use feature that they just rolled out. So, HeyGen Instant Avatars. To remind you, HeyGen is an AI-powered video creation platform that specializes in generating realistic video avatars. If you have joined us for our Intro to AI webinar, you'll know that I typically show a video from HeyGen that Amith actually created, translating his video and his voice into different languages. Their interactive avatars create AI-powered digital versions of individuals that are capable of joining Zoom meetings and other live interactions, mimicking the user's appearance, voice and decision-making style. More on that later. The avatars are designed to look and sound like the user, with the ability to think and respond in a way that reflects the user's personality and preferences. Users can input specific information, like company data or brand guidelines, to ensure the avatar accurately represents them or their organization. These avatars can be used for various applications like customer support, online coaching, sales calls, job interviews, language learning and even therapy sessions. This comes from HeyGen, just so you know. The avatars integrate with Google Calendar for scheduling and can connect to large language models via API for enhanced interaction capabilities. So I'll let you all in on a secret.

Speaker 2:

Our plan for today's episode was actually to have one of these interactive avatars join us on this Zoom call. I tested it out and, I'll let you all know, we were not impressed enough to actually include that on the pod. I didn't create an avatar of myself or Amith. I thought that would be a little too weird. I used one of their avatar templates, I trained it up on the Sidecar website, and I provided it a prompt that I actually came up with using Claude. It said, "You're a seasoned technology strategist specializing in AI implementation for professional and trade associations," and so on and so forth with different areas of expertise. I then had the avatar join a Zoom call with me, recorded it and shared that with Amith, but overall I was not super impressed.

Speaker 2:

I was able to interact with the avatar and say, you know, welcome to the Sidecar Sync podcast, and it said, oh, I'm glad to be on this podcast. And then it was able to talk about AI for associations and discuss some of the Sidecar offerings around AI education. I can surely say this is not going to replace me or any of you at this moment on a Zoom call, on a customer service call, on a sales call or a coaching call. That's not to say that won't be the case in a few months. Maybe not weeks, but maybe a few months. So, Amith, I'm curious, what do you have to say about HeyGen's interactive avatars?

Speaker 1:

Well, my point of view is that what you shared the other day, that little video of you having a chat with the avatar through Zoom, was really impressive on the one hand, because it was near real-time video generation responding to you.

Speaker 1:

I thought the quality of the content in terms of what it was saying was reasonable. They're obviously using some strong LLM under the hood to do that. I thought the quality of the synthesized audio was pretty poor, and it also is not really optimized for giving shorter and sweeter responses, as opposed to more of a text-based response. You ask a question like, hey, what's the best way for my association to start using AI? And the avatar just kind of goes on and on and on. So I think that part of it was less conversational and more speechy, and the quality of the audio wasn't as great as what I've seen from ElevenLabs or ChatGPT's Advanced Voice Mode, which is also available through their real-time API. Now, that being said, HeyGen's video is head and shoulders above everybody else's in terms of real-time avatar stuff. It's not trying to compete with video models like Runway or Sora. It's not in that realm, even though it's video. It's more about business use cases where we're saying, hey, we're going to create an avatar based on a video we provide. So it's really, really good at that. And the fact that it's real-time now matters. Previously, if I wanted to get my AI avatar to say something, I'd give it a text script and then I'd wait somewhere between 10 and 30 minutes and then HeyGen would say, hey, it's ready. And then you'd download it and you'd view it, and it was usually fine. A little bit glitchy here and there, but it was usually fine. Now, a year later, you can do near real-time conversations. What that really tells you is the pace of progress is crazy. To your earlier point, Mallory, I anticipate in the next three to six months that the quality of all the stuff that I just criticized will be radically improved.

Speaker 1:

You know, when you have something like ElevenLabs, for example, I'm a big fan. I use their tools all the time when I want to do either text-to-speech generation or speech-to-speech dubbing, where, for example, I record something but, for whatever reason, I don't want my voice to be the voice in a presentation or a video or something like that. Well, I can take just the audio clip, upload it to ElevenLabs and get an awesome dubbing in a synthesized voice, and the same thing with text-to-speech. It's really, really good. The advantage, by the way, of doing audio-to-audio is that the timing, the intonation, all that stuff is coming from the information in your voice, and then it's just translated into somebody else's voice essentially, or an AI's voice, whereas with text-to-speech, you can't really give it that much information to generate the right timing and the right sequence and the right intonations and all that.

Speaker 1:

But in any event, the point is that ElevenLabs is a super specialized audio platform and it's awesome. It's way better than HeyGen's audio. But the question is, do we have specialization so much so that we need to use these different things and stitch them together? I suspect that, because everything is improving so fast, HeyGen, being a video platform, will get very good audio. It's kind of like this: if you were to say, OK, Boeing has a jetliner that cruises at 550 miles per hour, but Airbus has a hypersonic plane that can get you from Paris to New York City in two hours, there's a marked difference between the two. It makes the Boeing jetliner seem like it's super antiquated, even though it's, you know, at the moment state of the art, right?

Speaker 1:

But the point would be that, similarly, the audio in HeyGen's product seems clunky compared to ElevenLabs. But if all the airliners are cruising at Mach 5 or whatever, then it's going to be very different, and the incremental difference is so small you wouldn't notice. So my point is, HeyGen's audio is going to get a hell of a lot better really fast. I would say that for now I probably wouldn't invite a HeyGen avatar into my conversations with people. But that's exactly where this stuff is going to go. Within the next few years, certainly, you're going to have all sorts of AI avatars joining meetings. You might even have an avatar of yourself joining a meeting on your behalf, which is maybe scary, but that's all going to be happening.

Speaker 2:

Certainly exciting stuff on the HeyGen front. In terms of creating the avatar, it was a bit like creating a custom GPT. That said, if you assign your avatar a name, so let's say Susan, for example, every time your avatar responds it might say, my name is Susan and here's everything about AI for associations. So it just seemed a bit clunky all the way around. Our avatar was named Wayne. I did not change the name, so we didn't encounter that issue. But across the board, I think the only thing that was surprising to me, and I don't know what your thoughts are on this, Amith, was just how intense all of the marketing around it was. Like, you're going to send this avatar to a Zoom meeting on your behalf, and it's going to make decisions in your decision-making style. I think we're a ways off from that.

Speaker 1:

Yeah, I agree. We're, you know, two or three leaps away from that, which might be two or three AI cycles, so maybe 18 months, 24 months, something like that. And then, even if the technology was there, let's just presuppose that in 24 months' time we have an AI avatar version that is as good as you could possibly want in terms of quality and accuracy of answers, all that stuff. The question then is, do you actually want that? Do you want to use it as an additional participant in the meeting? Do you want it to attend a meeting on your behalf if you can't make it? I don't know. I think those are really interesting questions to ask.

Speaker 2:

Well, that was actually my next question for you, Amith. In this ideal world, latency is reduced and quality of the avatar shoots up. Right in this moment, do you see that as a path that business leaders will go on? It seems kind of hard to grasp that I would send an avatar to a meeting in my place.

Speaker 1:

I could see myself having an avatar for an AI assistant, where there's some kind of visual representation of my AI assistant that knows me really well, but it's clearly a different entity than me. I don't personally think I'll ever feel comfortable with the idea of a version of me interacting with people. Maybe if it's known to be, like, hey, this avatar can answer questions or whatever, and it's clearly just an AI that's trained on content. But people very quickly lose track of the fact that they're dealing with AI, and if they see you and they hear you, they're going to quickly think it's you. So I'm not comfortable with that. No matter how good the AI is, I'd rather be in the meeting.

Speaker 1:

But I do think that what's going to happen is AI avatars of your assistants. You'll be able to shape and mold them to whatever you want. They could be some alien-looking creature or they could be a person or an animation or whatever, and you could include those in your meetings. And one of the things that may have to happen is, if you want to sell me something, instead of getting a hold of me, you'll always have to go through my gatekeeper, which is my AI assistant that knows me really well. The goal of that assistant isn't to block all potential opportunities, but to really intelligently filter things out, knowing me well enough to know what I actually want to take a look at.

Speaker 2:

I agree with that. I think if I ever were to meet with an Amith avatar at this point in time, I wouldn't take anything it said seriously, because I'd be like, well, what does Amith really think about what the Amith avatar has to say? So I agree with you. Or perhaps an interesting use case in the future would be some sort of avatar that takes the counterpoint to whatever you're talking about in your meetings, just to help you brainstorm and flesh out ideas. That could be interesting.

Speaker 1:

Totally, and then an avatar for a really great facilitator who can ensure that everyone in the conversation has had an active opportunity to participate, using things like personality profiles, for example, to employ certain techniques. Usually it's the people with high-extroversion character traits who tend to be the ones that talk the most in a meeting. That doesn't mean they have the best ideas, but a lot of times they just consume all the oxygen in the room as soon as they walk in, and so potentially there's help on the way for that. If you have an AI avatar whose job is to facilitate the meeting and try to pull ideas out of everyone, that could be super powerful and valuable too, just to remind everyone that there are more people in the room than those who jump into the speaking ring right away. I think it's an interesting technology also in the world of associations, in the context of learning. So think about Sidecar's AI Learning Hub and all the different courses that are there, and then think about if you had an avatar available throughout that experience that you could just have a video-to-video, voice-to-voice conversation with about the content, about any topic, and it was an avatar of the instructor for that particular course. That could be very interesting.

Speaker 1:

Now, it may be better to choose an avatar that's not the instructor, but some animated cartoony thing that's like hey, I'm the AI assistant to this instructor and I'm here, I know all the material, I'm here to help you, and that way it's clearly not that person. That's probably more comfortable, but still very useful. So you know, I think that's a way of adding a real time element to asynchronous learning. That could be very powerful. It could also be something where it's like hey, I'm having a hard time with this particular concept. I don't really understand how vectors work. Can you walk me through it?

Speaker 1:

And then that avatar is in the context of your learning experience, knows where you're at in your learning journey based on your LMS data, knows all the curriculum in the LMS and can give you really guided help. We talked about similar ideas in the past in the world of AI for education, I should say, where you think about what Khan Academy has done with Khanmigo. It's actually somewhat similar. That's just a text thing versus video to video.

Speaker 2:

Oh, that would be an excellent use case for us to roll out an avatar for each course. I'm going to add that to the docket.

Speaker 1:

I bet most LMSs that are investing in R&D anyway which is not every LMS, but most LMSs that are looking ahead are going to add features like this.

Speaker 2:

For all of our listeners and viewers, I would say at this point, unless you just want to try it out for fun, HeyGen Interactive Avatars is probably not something you're going to be using in your meetings regularly, but certainly something to keep an eye on in the next few months.

Speaker 1:

Yeah, I agree with that. I would just say I do think checking out HeyGen for video translation is definitely a use case I'd encourage people to go try. There are a few other tools like it, but HeyGen is the one that seems to be the market leader. If you haven't yet translated a video of yourself speaking in whatever your native language is into another language, go try that. Just do a 30-second video on your phone, upload it to HeyGen, ask it to translate it into some other language and see what happens. It should blow your mind. If you haven't gone through that yourself, it just opens up your brain to the idea that these tools exist. The AI avatar feature is their newest feature at HeyGen, and that one is what Mallory and I are saying is perhaps not quite ready for primetime from our viewpoint. But if you go check it out and you have a different point of view, shoot us a note. We'd love to talk to you about it.

Speaker 2:

Yep, we might bring you on a future pod episode. Just a warning. All right, next up we're talking about Claude computer use. Anthropic introduced a new feature called computer use for its AI model, Claude, which allows the AI to control a computer in a manner similar to human users. This feature is still in the public beta phase and is available through Anthropic's API, and it enables Claude to perform various tasks by interacting with computer interfaces, such as moving a cursor, typing and executing commands based on visual inputs from the screen. Claude can perceive the computer screen, identify elements and interact with them by moving the cursor and clicking based on pixel positions, which allows the model to use everyday software and tools without requiring task-specific programming. The AI can perform complex tasks autonomously, like filling out forms by gathering data from spreadsheets and CRM systems, or navigating web browsers to complete coding tasks.

Speaker 2:

Despite its capabilities, Claude's computer use feature is still experimental and can be error-prone, and its ability to use computers is not yet on par with human proficiency. On evaluations like OSWorld, it scores 14.9%, whereas typical human scores range from 70 to 75%, but that's nearly double the score of the next best AI model in the same category, which scored 7.7%. So, Amith, you shared this with me. I think it just dropped yesterday, so this is hot off the press for our association listeners. What do you see as some near-term use cases for computer use?

Speaker 1:

Well, you're right that at the moment, these models aren't as good as a human in terms of their use of a computer, but they are really good in certain narrower contexts, and they're also good at learning over time, as you tell them, "Well, no, not that, I meant this." One of the things I think is going to be really powerful for this is improving quality assurance. So think about your association's website and the number of times people tell you that there's a broken link on it or that a particular feature doesn't work. Typically, when you have a new system that you're putting in, like a new e-commerce system or a new LMS, you'll do a bunch of QA on it, hopefully before you go live with it. And then, once you go live, that kind of goes away and you just start incrementally changing it. So you add this button and that button and this image and that image and new pages, and all this stuff is happening, and over time it's a death by a thousand paper cuts scenario. Then your website no longer works well, it's slow. There have been automated quality assurance tools for years, and there are companies and a whole industry around this. But those tools have been the domain of QA engineers. These are very technical tools. They're super expensive. A good QA engineering tool set can cost many tens of thousands of dollars per person. But they have been able to create automation scripts, for years now, that are very, very valuable, and that's why companies pay for those. With this kind of advancement in AI, the capabilities are better, but it's also available to anyone. So you could, Mallory, go to Claude's new computer use and say, hey, this is our Sidecar AI Learning Hub LMS site. I want you to log in and I want you to test all the courses. I want you to go through and make sure that the course links work, that the videos size correctly.
I want you to do this on desktop and I also want you to simulate mobile use using this tool, and you can give Claude the tools. Now, if you do that right now, I suspect the current version of Claude would probably not be able to do the whole thing, but it would get very close, and it's going to keep improving at this pace, to where you could put in place that type of an assistant that every day is running that QA on a continuous basis and sending you feedback saying, hey, Mallory, there's this broken link on this page. So there's stuff like that.
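
The continuous QA idea described above, checking that the links on a page still work, can be sketched in a few lines of Python. This is only an illustration of the checking step, not how Claude's computer use works. The names extract_links and find_broken_links, the example.org URLs and the fake status table are all assumptions made up for the example; a real check would fetch each URL over HTTP instead of using the stand-in lookup.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for all links found in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) for href in collector.links]


def find_broken_links(urls: Iterable[str], get_status: Callable[[str], int]) -> list[str]:
    """Return the URLs whose HTTP status code is 400 or higher."""
    return [url for url in urls if get_status(url) >= 400]


# Hypothetical page and status table standing in for a real site.
page = '<a href="/courses/intro">Intro</a> <a href="/courses/old">Old</a>'
fake_statuses = {"https://example.org/courses/intro": 200}

links = extract_links(page, "https://example.org")
# Any URL missing from the fake table is treated as a 404.
broken = find_broken_links(links, lambda url: fake_statuses.get(url, 404))
print(broken)
```

Passing the status lookup in as a function keeps the sketch runnable without network access; swapping in a real HTTP check and running it on a daily schedule would give the "broken link on this page" report mentioned above.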

Speaker 1:

The broader way to think about this is that for a long time we have had to adapt to the interface of the technology. So think about going way back in time. Originally, with mainframe-type computers, you actually didn't even interact with a keyboard. You would load your program with punch cards, and these were literally perforated, thicker pieces of paper. That's how you would encode a program, using special machines. I never worked with punch cards, I'm not that old, but my dad told me a lot about it. He actually did that in his college work and loaded the punch cards for his PhD thesis and ran them on a computer. And that is a great example of an incredibly inefficient way of communicating with a machine, right? And then from there we went to little green screens and keyboards, and then from there we got this thing called a mouse, which was a pointing device where we could tell the computer what part of the screen we wanted to interact with, and graphical user interfaces blew up from there. And then, with mobile, we had touch capabilities, multi-touch capabilities, and now we're getting into audio and video. So our interfaces with technology keep getting better, and that's what made ChatGPT in 2022 such a revelatory experience for people, because for the first time for most people, they could interact with a computer in their choice of language, as opposed to having to use the computer's preferred way of interacting, whether that's punch cards, keyboards or, if you want to make it do something more complex, programming code. Now we're shifting to the point where the computers can understand us, and that's kind of cool.
But the other thing that's interesting, and this comes back to this topic, is that as part of understanding us and gaining these new capabilities, they understand our world too, including all of the devices we interact with, and the computer screen is one of the great examples. If the AI can see the computer screen, then this AI can control the computer.

Speaker 1:

It has access to every piece of software that's ever been written. It has access to everything, right? And it already has the world's knowledge. So it's a very powerful idea, because not all software is going to be available in a technical way, like via API, where the AI can interact with it, but there's lots of software that is used across so many businesses. Imagine if your association had a legacy AMS, and that AMS has been in place for God knows how long, but it basically works. It might be really kludgy, but it works, and you don't want to spend millions of dollars and two years or longer to replace it. But what if you could use AI, using computer use, to automate processes? Then you use the AI essentially as a way of extending the life of a legacy tool. There's a lot of examples like that that come to mind.

Speaker 1:

The broader concept again, though, is computers using our interfaces to connect with and extend use cases. It's another reason why humanoid robots are such a big thing. The whole world, we've crafted it in a way that basically interacts with us, so we have two hands and two feet and legs and all this other stuff. So if we can make robots that are similar to us, they just plug right in to everything else. So the idea behind this, in my mind, is that it's an accelerant. It's basically opening up a whole bunch of new doors for the AI to walk through.

Speaker 2:

If you check out the demo videos, some of which we will include in the show notes, you can see that Claude is actually taking screenshots of your screen over and over again, and that's how it's figuring out what to do next and what to click and how to fill out these forms, which seems a little bit clunky. So to your point, Amith, interfaces have gotten better for us throughout history, but then I'm wondering, do we now go backward with AI or not? Or backwards for us, in the sense that it doesn't seem like an AI needs a computer, right, to interact with all these things. You mentioned APIs being an option, but do you think that AI will eventually go away from using our systems and kind of create more of a direct path?

Speaker 1:

I think there's definitely a possibility. I mean, AI systems left to their own devices will actually create their own languages, you know, because English is pretty inefficient. But I don't know if we want that or not. I think there may be value in always forcing the AIs to go down a path that we can interpret and understand. Well, an API, calling an API.

Speaker 1:

For example, let's say I have a more modern AMS and I want to create a new member in the database. If I have an API to do that, that's a structured way of connecting to the AMS programmatically. I could connect that to Zapier, I could connect that to a CDP, I can connect that to whatever, and AI can, of course, connect to APIs as well.

Speaker 1:

That's going to be a higher resolution way of doing it, if you will, or a higher probability of success kind of way of doing it, but there are a lot of things in the world that don't have those kinds of capabilities. Or you have systems that may have APIs, but the API is very limited and the user interface can do a whole lot more. So there's a ton of really interesting business use cases around this. I also think that, independent of the computer doing the driving, think of it this way too. If you enable a tool like this to watch your screen and watch you work while you're perhaps talking to it continuously, speech-wise, right? Forget about video, but just audio. And I'm talking to Claude like I'm talking to you. I'm saying, hey, Claude, listen, I'm about to work on this Excel spreadsheet. I'm going to do this and I'm stuck here. What do you think's wrong? And then Claude says, oh well, that cell, it should be this, let me change it for you. And it changes, so you're kind of working in parallel. It's almost as if you imagine hey, I opened up a Google Sheet and you and I are on an really interesting.

Speaker 1:

There are limitations right now, partly the model's intelligence. Part of it, though, is the latency, with the way it's doing what you described, taking this sequence of screenshots, which becomes kind of like a children's flipbook of what's happening, animation-wise. It's very low res, but that's because there are some major limitations with vision models right now. All that's changing. We're on this accelerating exponential curve, and so with vision models today, the way you can efficiently interact with them is to pass them smaller images that they can process, make sense of and then use to drive the next thing that the model does.
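
The screenshot-by-screenshot loop described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic's implementation: `FakeScreen`, `decide_next_action` and `run_loop` are hypothetical names, the "screenshot" is a plain snapshot of form state rather than an image, and the decision function is a hard-coded stand-in for the model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: a form with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real system would capture pixels; here the "image" is a state snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: dict) -> None:
        # Carry out exactly one action chosen from the latest snapshot.
        if action["type"] == "type":
            self.fields[action["field"]] = action["text"]
        elif action["type"] == "click_submit":
            self.submitted = True


def decide_next_action(shot: dict, goal: dict):
    """Assumed placeholder for the model: look at one snapshot, pick one action."""
    for name, wanted in goal.items():
        if shot["fields"].get(name) != wanted:
            return {"type": "type", "field": name, "text": wanted}
    if not shot["submitted"]:
        return {"type": "click_submit"}
    return None  # the goal is already met


def run_loop(screen: FakeScreen, goal: dict, max_steps: int = 10) -> int:
    """Observe, decide, act, repeat: one snapshot per step, like a flipbook."""
    steps = 0
    while steps < max_steps:
        action = decide_next_action(screen.screenshot(), goal)
        if action is None:
            break
        screen.apply(action)
        steps += 1
    return steps


screen = FakeScreen()
steps_taken = run_loop(screen, {"name": "Ada", "email": "ada@example.org"})
print(steps_taken, screen.submitted)
```

Each pass through the loop costs one full look at the screen, which is why the latency adds up: the model cannot act on anything it has not just been shown.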

Speaker 1:

Over a period of time, you're going to have a real-time, high-resolution video feed going into the model. It's going to have complete access to that on a continuous basis and we're not going to even think about it, just like we're using Zoom today and we have HD video in both directions and we're not thinking about that as a remarkable thing, even though it really is. So we will soon have that as well, where the AI will be with us with continuous video. It'll be like 30 frames per second or higher, as opposed to like one frame every three seconds, which I think is what this thing's doing right now.

Speaker 2:

And think about how powerful that would be for onboarding new team members, to just have the AI kind of follow you throughout your work for maybe a few weeks. You would have the ultimate onboarding avatar, even potentially one that you could use through HeyGen.

Speaker 1:

Well, and now imagine a version of this type of technology that is your association's AI avatar. That is perhaps the video part, perhaps not, but it's capable of doing things like computer use in combination with knowing everything about your association. So let's say, for example, you are a CPA society and your members are accountants. What if your organization had an AI assistant that was capable, if the individual authorized it, of being part of their computer session, and your AI agent, of course, knew everything that the CPA society has training on, which is a vast amount of content? That becomes a more powerful agent to potentially help. And so I think it's almost like you think of the AIs as different collaborators. I might say, hey, I'm going to invite this CPA society's AI agent to hang out with me right now because I'm working on this report and I want that level of expertise. I don't want just the generic Claude hanging out with me because that's not quite at that level. Or maybe it's highly specialized software used for taxes or something.

Speaker 1:

I'm kind of making this up. But every domain has uniquenesses to it, and associations are living in those veins where they have that deep domain expertise. If they can essentially activate that expertise through new modalities and new channels like this, it gets more and more exciting. So I see this being super relevant to associations in terms of internal use, where you can think about things that you do repetitively or that you should be doing more, like the QA stuff I mentioned earlier, and use this type of technology, play with it, to try to automate some of that to save yourself time. And then the other side of it is to think about how this affects your members and think about ways you could potentially help those members use this technology. In some cases, it might just be training them on it, providing them an ability to learn this stuff through focused training that's relevant to them in their context. In some cases, it might be custom tools.

Speaker 2:

I think I'm in AI agent mode because of your keynote session, Amith. But I'm curious: here we have a model, Claude, that can take action through a feature called computer use. Are we talking about an agent here, or would you still call that something separate from an AI agent?

Speaker 1:

I think it's kind of semantics at some point. Computer use 100% is agentic in the sense that it has access to tools, it's able to interact with your computer, it's able to kind of semi-autonomously do things with your computer. But it's packaged within the model. Now, is Claude 3.5 Sonnet capable of doing that as a raw LLM? No. An LLM, LVM, you know, is basically a model. There's software engineering on top of that that's allowed Claude to see your screen, take the screenshots, pass them in. So there's definitely an agentic layer, even though it's being kind of packaged into the Claude offering.

Speaker 1:

But the reason I say it's semantics, and I don't know that we ultimately are going to care too much, is that these are just going to be called systems. They're AI systems. There are one or more models involved, and the agentic sauce that surrounds those models will a lot of times be the thing that actually drives the real-world value. But I think the main differentiation between the two is that models are passive and waiting for you to tell them that you want some specific thing, and that agents can be active and take action on your behalf, either autonomously or semi-autonomously, and that's the definitional break I think that most people use.

Speaker 1:

It's certainly how I think about it, but I think ultimately it's not going to matter. It's just going to be AI solutions or AI systems and you're going to have bits and pieces coming together. It's just like, you know, think about a system you're more familiar with, like CRM software. Within the CRM software, you have databases, you have web servers, you have user interfaces. There are network elements, there are all these different components that you pull together. The same thing is true in this world too.

Speaker 2:

That makes sense. So essentially, as we've talked about with Microsoft Copilot Wave 2 and with Claude's computer use, we really have the ability to create agents or agent-like systems at our fingertips. So I think really the big question here is what are people going to do with it? What are associations going to do with that?

Speaker 1:

But even our individual listeners, yeah. What I'd love to be able to do with the earlier example is say, hey, Claude, hang out with me, we're going to do a QA testing session on this website. And now we know, okay, the Sidecar AI Learning Hub is awesome, we tested it. Now, hey, Claude, listen, what I want you to do now is to create a version of yourself that can rerun that exact set of tests as frequently as I'd like you to run it. And then Claude says no problem. Claude probably has to go write some code to do that, which Claude can do, right? All these models can do that well. And then test it and say, does that test replicate what our session had? And if so, then it's like, yeah, it's ready to go. And so we'll call that QA Claude. And so then I'll say to QA Claude, hey, I want you to run QA Claude every day and send me a report, and tell me if there are any issues that were found and classify them as high priority, medium priority, low priority. And then, if you have these issues and you know what the problem is, recommend what the remediation is.
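
The daily report described here, with issues sorted into high, medium and low priority and a suggested remediation where one is known, might look something like the following Python sketch. Everything in it is an assumption made for illustration: the issue kinds, the priority mapping, the remediation text and the `build_report` function are hypothetical, and a real team would define its own rules.

```python
# Hypothetical mapping from issue kind to priority; not taken from the episode.
PRIORITY_BY_KIND = {
    "login_failure": "high",
    "broken_link": "medium",
    "typo": "low",
}

# Hypothetical remediation advice for the kinds where the fix is well understood.
REMEDIATION_BY_KIND = {
    "broken_link": "Update or remove the link.",
    "typo": "Correct the spelling.",
}


def build_report(findings):
    """Group QA findings by priority and attach a remediation where one is known."""
    report = {"high": [], "medium": [], "low": []}
    for finding in findings:
        # Unknown kinds default to medium so that nothing is silently dropped.
        priority = PRIORITY_BY_KIND.get(finding["kind"], "medium")
        report[priority].append({
            "page": finding["page"],
            "kind": finding["kind"],
            "remediation": REMEDIATION_BY_KIND.get(
                finding["kind"], "Needs human review."
            ),
        })
    return report


sample_findings = [
    {"page": "/courses", "kind": "broken_link"},
    {"page": "/login", "kind": "login_failure"},
    {"page": "/about", "kind": "typo"},
]

for level, issues in build_report(sample_findings).items():
    print(level, issues)
```

Keeping the rules in plain lookup tables means a person can read and adjust what counts as high priority without touching the rest of the code.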

Speaker 1:

And then the next step beyond that is to say, well, in certain categories, where it's like saying, oh, there's a broken link. Well, if we so authorize Claude or mini Claude or QA Claude or whatever we want to call them, we can say, well, here's our admin access to this particular account. You can go in there and change that if you find a broken link. And of course, at every level of this, you have to trust the technology more, and there are varying levels of comfort with that. But you can envision a thing where it's kind of like self-healing, where it goes from QA to actually making the repair. So maybe you still get a notification that something was corrected, but maybe you don't. You know, modern computer systems constantly have all sorts of issues, and there are many self-correcting mechanisms in place in databases and on networks. I mean, the entire design of the internet is a continuous self-healing, self-correcting architecture, and so that's how a lot of modern systems work. So I could see that being applied to association business processes.

Speaker 2:

I don't know if there are all that many people out there who like QA. I know that I do not. It's kind of tedious, but I'm just thinking how exciting it is to live in a time where all these things that are pain points, tasks we don't enjoy doing, tasks that are really tedious, AI is going to be able to do for us, and thinking about all that time it's going to free up for us to do the things we enjoy and the things that are more human.

Speaker 1:

Yeah, and another element of QA is to get the AI to be like a simulated student, to take your course end to end and actually consume the content, give you feedback, attempt to answer the questions. You'd have to do a lot of work to make sure the AI doesn't use its prior training and knowledge to answer questions in an exam, for example. Or do you cap it off and say, hey, you're capped off at a 12th grade education or whatever? There are all sorts of things you could theorize around that, but there's just an enormous number of things you could do with this technology and I think the feedback opportunity is tremendous.

Speaker 2:

All right, everyone. Thanks for tuning in to episode 53 of the Sidecar Sync podcast. We will see you all next week, post digitalNow.

Speaker 1:

Thanks for tuning in to Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing and keep disrupting.