Sidecar Sync

Exploring AI Audio – Part 1: Fundamentals & Essential AI Audio Tools | 54

Amith Nagarajan and Mallory Mejias Episode 54

In this special two-part episode, hosts Amith and Mallory delve into the transformative power of AI-driven audio for associations. From voice synthesis to real-time translation, this discussion sets the foundation for an AI audio strategy, highlighting the latest advancements in voice technologies that improve accessibility and engagement. With tools like Eleven Labs and ChatGPT's Advanced Voice mode, Amith and Mallory explore how audio allows associations to communicate more dynamically and how it stands to change member engagement. Stay tuned for part two, where the focus shifts to practical association-specific applications!

🔎 Check out the NEW Sidecar Learning Hub:
https://learn.sidecarglobal.com/home

📕 Download ‘Ascend 2nd Edition: Unlocking the Power of AI for Associations’ for FREE
https://sidecarglobal.com/ai

🎬 Sidecar Sync Ep. 47: Project Strawberry, Hugging Face Speech-to-Speech Model, & AI and Grid Infrastructure
https://youtu.be/lXcnx_HECes?si=7kVStK8JO09Qylfo

🎬 Sidecar Sync Ep. 47: Previewing digitalNow 2024, Google NotebookLM, and xRx Framework Explained
https://youtu.be/bpINBxVSM4s?si=vTx01nZyUWde22MT

🛠 AI Tools and Resources Mentioned in This Episode:
Eleven Labs ➡️ https://beta.elevenlabs.io/
ChatGPT Advanced Voice Mode ➡️ https://chat.openai.com/
Hugging Face Speech-to-Speech ➡️ https://huggingface.co/
Google Notebook LM ➡️ https://labs.withgoogle.com/notebooklm

Chapters:

00:00 - Introduction
02:45 - The Impact of AI on Communication in Associations
05:24 - Audio vs. Text: Conveying Richness with AI
08:48 - Key AI Audio Technologies Explained
11:40 - Barriers to AI Audio Adoption for Associations
16:30 - Real-Life Examples and Successes with AI Audio
21:36 - Notable Advancements in AI-Driven Voice Technology
28:16 - Demo of Betty, the AI Knowledge Agent with Voice
33:49 - The Future of AI Audio in Associations

🚀 Follow Sidecar on LinkedIn
https://linkedin.com/sidecar-global

👍 Please Like & Subscribe!
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecarglobal.com

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

📣 Follow Mallory on LinkedIn:
https://linkedin.com/mallorymejias

Speaker 1:

Associations tend to have infrequent but deep engagement, which is not bad. They just haven't created a value proposition that justifies their existence in the lives of their members on a daily or weekly basis. Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, Chairman of Blue Cypress, and I'm your host. Welcome to the Sidecar Sync, your source for all things artificial intelligence intersected with the world of associations. My name is Amith Nagarajan.

Speaker 2:

And my name is Mallory Mejias.

Speaker 1:

And we are your hosts. Now we have a real treat for you. In this episode and the one that will follow, we're going to be talking all about audio, a modality that has enormous potential with artificial intelligence, ways to communicate with people in just truly incredible new ways, and we've broken it up into two episodes. The first one, which you're listening to right now, is all about the foundation. It's about audio and the current tools as of late 2024 that we're really watching, that are foundational to thinking about your audio strategy with AI. And then the second half of this episode, part two, will air a week later, and that episode will go into specific use cases and specific tools. So that's what we have in store for you today, and before we get into the meat of the episode, let's take a moment to hear a word from our sponsor.

Speaker 3:

Introducing the newly revamped AI Learning Hub, your comprehensive library of self-paced courses designed specifically for association professionals. We've just updated all our content with fresh material covering everything from AI prompting and marketing to events, education, data strategy, AI agents and more. Through the Learning Hub, you can earn your Association AI Professional Certification, recognizing your expertise in applying AI specifically to association challenges and operations. Connect with AI experts during weekly office hours and join a growing community of association professionals who are transforming their organizations through AI. Sign up as an individual or get unlimited access for your entire team at one flat rate. Start your AI journey today at learn.sidecarglobal.com.

Speaker 2:

Amith, how's it going today?

Speaker 1:

It's going great, you know. I know we're recording this a little bit earlier than usual, since we will both be at Digital Now next week and we will not have time to record the usual pods. So we're recording this two-part series today, just ahead of Digital Now, and I'm looking forward to it. I'm flying tomorrow, and I don't ever really look forward to flying, but I'm looking forward to getting to DC. How about you?

Speaker 2:

Indeed. Yeah, we are filming this early, so right now it's the Friday before Digital Now. By the time you all hear this episode, Digital Now will be over, which, at this point in time, I can't really see the light at the end of the tunnel. But it's crazy to think that in just under a week this will all be wrapped up, and I'll miss it so deeply. But yes, I'm excited to get to DC. I leave tomorrow as well. I think I get there in the afternoon, and then we'll kind of kick things off Sunday morning with setup.

Speaker 1:

Mallory, I was asking you right before we started recording if you are an aisle seat or a window seat person, and you were starting to tell me.

Speaker 2:

I am. You know, even though I'm on the shorter side, I do think I'm an aisle seat person. I think you're also an aisle seat person, Amith. I think I just don't like having to crawl over people. So for me that makes the most sense, and I feel like I always have a ton of stuff in my hands as well, that might just be a me problem, but like a coffee cup and a purse and a backpack. So I feel like when I have the aisle seat, I can set all of that down and then put my luggage up and I'm comfortable. What about you?

Speaker 1:

I am normally an aisle seat person. However, if I'm on an overnight flight, I like the window, because then I can settle in and not be disturbed by anyone else. But, um, you know, I have a good friend who is actually a middle seat person.

Speaker 2:

No, yes, couldn't be. What does that mean? What does that mean?

Speaker 1:

So this guy is interested in people. He's a student of humanity, you could say, and he's retired and a brilliant guy, and he loves to talk to random people and hear their stories, and so he chooses middle seats so that he has two people to talk to.

Speaker 2:

Yeah, that sounds just like you, Amith. I feel like you must be the same way, right? On flights.

Speaker 1:

Oh yeah, you know me, I'm the social butterfly just talking to lots of random people all the time.

Speaker 2:

Oh my gosh. Well, after this recording, you're going to have to tell me who your friend is so I can make sure, no, I'm kidding, that I'm not next to him on a flight. But typically on flights I am putting on the headphones. I'm really not trying to have conversations. But, you know, every now and then you do meet someone interesting, so it's important to be open.

Speaker 1:

I appreciate that. You ought to enjoy this guy. He's a brilliant, brilliant guy. He has a PhD in mechanical engineering. He went on to have a tremendous career in the computing industry, and then beyond that, he actually went on to do stuff in healthcare, and on top of all that he's a standup comedian.

Speaker 2:

So, um, well then, maybe, maybe I will. Maybe we should bring him on the pod, Amith. Maybe that's what should happen. Yeah, maybe we should.

Speaker 1:

That would be fun.

Speaker 1:

Well, in any event, yeah, people come in lots of varieties, and a lot of our expressiveness and intelligence is conveyed through audio, so text loses out on a lot of what we are able to communicate. In fact, if you took the transcript of this conversation and simply read it, it wouldn't quite have the same effect. So I'm really excited about this episode, or these two episodes, I should say. Now, you and I have been talking about audio as a modality and how exciting it is to have democratized access at scale to unlimited audio consumption and audio creation. So can't wait to get into this with you.

Speaker 2:

Yep, as you said, in this part one episode we will be kind of laying the foundation to make sure we all understand what we mean when we're talking about AI and audio, and then we'll talk about some tools that we are keeping a close eye on, Amith, I know some that you're a big fan of, and then in part two, we will talk about specific use cases for associations.

Speaker 2:

So to kick off today's part one, talking about foundations: audio is fundamentally more information-dense than text, like Amith just mentioned. Think about how much meaning we derive from a single spoken sentence, not just the words, but the speaker's emotional state, their level of confidence, their geographical background, even their physical state. A simple okay can mean drastically different things depending on tone, pitch and timing. We instantly understand if someone is tired, excited, skeptical or sincere, all from subtle variations in their voice, and this richness of information makes audio both powerful and challenging for AI to process. I think back just to the top of this episode when I said, oh, Amith, that sounds just like you, right, talking about the middle seat thing. If an AI had read that conversation, text only, it probably wouldn't have picked up on my sarcasm.

Speaker 1:

Exactly.

Speaker 2:

At its core, AI audio processing involves three main capabilities. First, there's speech-to-text, where AI converts spoken words into written text, like transcription. Then there's text-to-speech, where AI generates spoken words from text, creating synthetic voices. Finally, there's speech-to-speech, where AI can transform audio directly, like changing a voice or translating between languages in real time. These capabilities combine in fascinating ways. Some AI voice interfaces convert your speech to text, process that text and then convert the response back to speech. The real breakthrough has been in speech-to-speech transformation, which can modify audio directly instead of going through text, preserving crucial layers of meaning that make human speech so rich. So, Amith, you have been talking about audio for a long time. I think this is something even I don't experiment with quite as often, or you have to remind me to try out Advanced Voice with ChatGPT, which we'll talk about. Do you recall any wow moments that you've had recently, or maybe all the way back to a few years ago, when you first started using this?
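
The loop Mallory describes, speech in, text in the middle, speech back out, maps onto a handful of API calls. Here is a minimal sketch assuming OpenAI's Python SDK; the model names, the voice, and the file paths are illustrative placeholders rather than anything specified in the episode.

```python
# Minimal sketch of the "speech -> text -> response -> speech" loop described
# above, using OpenAI's Python SDK. Model names, the voice, and file paths are
# illustrative assumptions, not details from the episode.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Speech-to-text: transcribe a recorded question.
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. Process the text with a language model.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = reply.choices[0].message.content

# 3. Text-to-speech: synthesize the answer as audio.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
with open("answer.mp3", "wb") as out:
    out.write(speech.read())
```

Native speech-to-speech models collapse those three steps into a single model call, which is what preserves tone and emotion rather than flattening everything to text in the middle.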

Speaker 1:

You know, going way back to when I was a kid, I remember the very first home computer we had that was a real computer was a Mac, and it was the old, old days of the Mac. It did not have a hard drive, it had one meg of RAM and it had a floppy drive. But on that computer there was a little program that could speak, very, very crudely, a sentence that you put in, and it sounded horrendous. But it was just amazing that a computer could actually speak based on any text that you put in. It was super slow and super terrible.

Speaker 1:

It'd be fun actually to dig up an old, early Mac text-to-speech demo. I bet there's something on YouTube we could splice in here to illustrate. But what really got my attention was, you know, really the natural way that speech sounds when you generate it with Eleven Labs specifically, which I know we'll talk about in the next section, or the next part of this two-part episode. But that particular product blew my mind when I first used it because it really sounded like a person. The quality, the tone, everything, it was just right on, and it's getting better and better. So that's really what got my attention, just that basic text-to-speech direction, and to me that opens up a whole bunch of opportunity, because, you know, so much of what we want to do is hear as opposed to read. Reading is great too, but being able to hear anything, that just opens up possibilities, so that got me excited.

Speaker 2:

For sure. And then you were also an avid user of Otter, maybe still are, where you would go speech to text, right?

Speaker 1:

That's right. Yeah, I mean, for both of the Ascend books that we've worked on together, you know, those books, I've contributed a lot of content through just walking around New Orleans like a crazy person, just talking to my phone with ideas, and Otter does a good job with transcription. I combine that with taking that audio, I'm sorry, taking the transcript of that audio, into something like a ChatGPT or a Claude and then using it to kind of distill the ideas down and generate better versions of them. But you know, I think there is a lot lost in that process compared to what we have now with ChatGPT's Advanced Voice mode and many other models that are natively multimodal, where there's no translation layer, where the model understands speech as well as video and images natively, and that is going to produce stunning breakthroughs.

Speaker 2:

Yep, I did a little test this morning with advanced voice, but I'll talk about that in the next section, so I don't jump the gun here. But I feel like people are still getting comfortable with conversing with AI. I'm curious why you think that is and if you feel like that's a mistake and that people should be really prioritizing that way of interacting with AI.

Speaker 1:

You know, the people that are using AI, we have to remember, are a tiny fraction of the world still, so we have a lot of people all over the planet who still don't have access to this type of technology, and that's going to change. And you think about the form factor of technology in most places: it's the mobile phone and, soon to be, probably wearables, whether it's watches, glasses, something else, and so audio can fit on any size device. You know, you can have audio on a phone or even on a watch and an earbud, and so the beautiful thing about that is the portability, the mobility that creates, as well as the accessibility, because there are a lot of people who don't read. And so, when you think about it in terms of access, you think about democratizing access to the power of AI, whether it's for consumer services like financial literacy or for healthcare delivery. So, on that end of it, it's, I think, critically important to get speech right and, of course, be truly multilingual. There's a lot of work happening, particularly in languages that don't have large bodies of existing text or audio or video on the internet, to be able to preserve those languages, but also to enable them from an AI perspective, even if there are only, say, 100,000 people in the world who speak that particular language. And a lot of times those are also the parts of the world where people do a lot more talking and listening than they do writing or reading, and so that's one of the reasons I think audio is so important. But also, to me, ultimately, there's just a richness in audio that is different than what you get from text. Like the way I think about it.

Speaker 1:

If I'm trying to learn a subject, I love audio. I listen to podcasts constantly, and I also love having phone calls with people when I'm on the move. I don't like talking to people on the phone or in Zoom calls, other than this one, this is great, but I don't like Zoom calls or phone calls when I'm stationary, normally, partly because I get distracted and I just move on to other things in my brain.

Speaker 1:

Whereas if I'm moving and I'm walking down the street talking to someone, I don't multitask, because I'm just focused on not getting run over by a New Orleans driver and talking to the person, so I find myself more focused. But there's just kind of a richness to that, right, in terms of that audio conversation with the person. But if I want to actually go and, you know, quote-unquote put pen to paper to actually get the real work done, I'm working in text, I'm working with the computer, and maybe it's text plus audio side by side. In our prior episode we talked about Claude and computer use, and that's text plus computer use. But if Claude also had a real-time equivalent like Advanced Voice mode, imagine talking to Claude while Claude's working on your computer and Claude's talking back while working in parallel with you, and there's also text going, so it's more like working with another person, essentially.

Speaker 2:

Okay, as a funny side note, if you have ever talked to Amith on the phone, normally you're right, Amith, it sounds like there's trains in the background sometimes when I talk to you, but it's because you're always on a walk, which I think is a good way to be.

Speaker 1:

If I'm talking to someone on a phone call, I'm almost always on a walk, because I like to stretch my legs and I find that I'm just more focused. In terms of comfort level, getting back to your question, you know, I don't know what to say. I think the people who are earlier adopters of AI will get comfortable with new things pretty quickly. I think that for a lot of people who haven't touched AI, maybe they've heard of it on the news or something, but they haven't themselves interacted with it, they're going to start getting interactions with AI just in their daily lives, because many organizations are going to put in place AI voice systems to talk to their customers and provide way better customer service than people have ever experienced, and in some cases they might not even disclose that it's an AI. I think it's important that you do that; anywhere you use AI, you should disclose it. But I think the experience consumers will have in that respect will go up, and then, all of a sudden, people will just assume it.

Speaker 1:

It's just like right now, going through life, if you didn't have your mobile phone with you for a day, think of the feeling of loss and the feeling of disconnectedness you would have compared to what you had before access to Google, access to the internet generally. If you have access to an intelligent assistant through voice, you're just talking to this thing all day long, and then if it goes away, you realize how dependent you are on it, which is both good and bad, I guess. I guess the point is that I think there will be a very natural adoption for voice, because that's just what we do as a species: we like to talk and listen.

Speaker 2:

Yeah, I don't know why, I just get the sense that there's a bit of discomfort around it, and I think it might be because we are so accustomed to text when we're looking for information, whether we're texting someone we know or we're on Google searching something or looking in our email or on Microsoft Teams, and so I think the idea of going to voice when you need information, it's just a little bit of a mental shift.

Speaker 1:

I think that's right and you know it is for me as well. Partly it's that you know this disbelief that a computer can really do such a good job. Even those voice assistants or voice products that have been out for a while, like Siri and Alexa or Google Assistant, have been so terrible historically that when you do use them you're like, yeah, I have really low expectations, like Siri is going to totally screw up this text message, but I'm driving and I don't want to crash my car, so let me see if Siri can send this text message for me. So there's that whole side to it. But I think that our perceptions and our biases, they're self-reinforcing and they're very powerful. Actually, just today I posted something on LinkedIn. I had this really interesting experience in the last two days where I was talking to a colleague about a technical problem.

Speaker 1:

And this other person asked me a question. They said, hey, does this database do this one thing? And I said, no, I wish it had that feature. And then I'm like, wait a second, I've been using this database product for like 30 years. I wonder if they've added that feature since I last bothered to check. Turns out, they had.

Speaker 1:

This is Microsoft SQL Server. I started working with Microsoft SQL Server right after they bought the database engine from Sybase, which is like the mid-90s, 1995, in version 6.0. And I've been working with it for a long time, right? I know it pretty well. Well, back in 6.0 and 6.5, they didn't have this feature. But in version 7, which came out in 1998, they added this feature, and I haven't known about it for, you know, 26 years. And I'm pretty good at that stuff, and I'm also pretty good at adapting, at least I think I am.

Speaker 1:

But we all get sucked into these loops where we're telling ourselves that, and I'm like, wait a second, is that true, right? So asking yourself that. Anyway, the reason I think that is relevant to your question is we have these self-reinforcing behavioral loops that cause us to feel a certain way. At the same time, because there are so many consumers and users of products that haven't come online yet at all, or haven't come online with AI, I don't know that it's going to be an issue at scale, because the number of people that are currently using this stuff is a tiny fraction of the people who will use this stuff.

Speaker 2:

That is a really good example. I feel like, not similar at all, but I feel like I have that experience with Teams even. I'll just be like, it would be so nice if... and then stumble upon this new app within, like, the Microsoft suite, and I'm like, oh, I guess I could have done that the whole time. So just having to kind of keep searching, always keep asking those critical questions of yourselves and of the technology we use.

Speaker 1:

Yeah, it's like, is this still true? What are my deeply held beliefs about what AI can and cannot do as of right now? And then you say, well, is that still true? And some people have those initial impressions. This is actually something I see regularly: I talk to executives who say, oh yeah, you know, we're really on top of AI.

Speaker 1:

We tried ChatGPT back in February of 2023. It was right after it came out. We're really proud of ourselves for checking it out. We were all over it, and it sucked. It was so bad, it was so full of shit. You know, they went on and on and on about how bad it was. I'm like, agreed, it was terrible. Have you tried it since then? Like, no, of course not, it was terrible. It's like, well, ordinarily, in like a year and a half, a software product wouldn't have been, you know, dramatically different, maybe a little bit different. And it's not an unreasonable assumption to say, oh, I tried Excel in early 2023 and I hated it. It was just terrible. And now it's late 2024 and I didn't try Excel again. I still feel like crap from having tried it back in early '23. But this is so different and it's changing so fast, right? So you have to readdress or reevaluate ideas. You can't hold these kinds of assumptions, like I was doing for 26 years, and expect to be effective in what you do.

Speaker 2:

That's so funny that it came out in version 6.5, you said, or version 7? Yeah, Wow, so you were just on the brink of having this feature that, I'm assuming, would have been pretty helpful in the past 26 years.

Speaker 1:

I would say so. Yeah, it would have saved me a lot of time and it's pretty cool. It's a minor thing, but minor things that you do thousands of times over a lot of years end up adding up.

Speaker 2:

Right, right. On the Intro to AI webinar, I always remind everyone of that same fact with Midjourney, because these AI-powered image generators were pretty rough, I feel like, back in 2022, in terms of the ridiculous images that they were creating, and they were so bad and everyone thought it was so funny. But they're pretty good now, and so I always tell people, even if you tried it last year, try it again, because these things are improving consistently.

Speaker 1:

And that is definitely true with audio, and that's part of what excites me so much: there's a lot of investment going into audio as a modality, not just with the specialized tools that are out there, which we'll talk about shortly, but also with the general AI models that are becoming truly multimodal, where Gemini, Claude and, of course, OpenAI are all multimodal models natively, and some of them haven't exposed that capability from a consumer perspective, where you can't talk to Gemini yet, but you will be able to soon because they're multimodal. The real-time side of it is just a really hard nut to crack, and OpenAI, I think, has done a good job with that, but everyone's going to have that within six to 12 months.

Speaker 2:

Well, now that we've got this common foundation and understanding of audio and AI, I do want to talk about some tools that we've been keeping a close eye on. One of those, Amith, has already been mentioned on this podcast, and that is Eleven Labs, which features advanced speech synthesis with natural-sounding, context-aware intonation and emotion. It also features voice cloning technology, which allows users to create custom voices from short audio samples, and AI dubbing capabilities that translate speech while maintaining original voice characteristics. I have actually not tried out Eleven Labs myself, which is terrible, but I've seen all of your demos, so I feel like I know what it's capable of. Next is ChatGPT's Advanced Voice mode, which offers real-time, natural conversations with the ability to handle interruptions and respond fluidly. It features emotional recognition and response, allowing the AI to detect and adapt to the user's tone and emotional cues, and there's native speech understanding, like Amith mentioned, using GPT-4o, eliminating the need for separate speech-to-text and text-to-speech models. Now, I did give this a brief test this morning because I had not tried it out for myself, and I tried to sound really angry and ask it, do I seem mad, or do I seem happy or angry right now? And it wasn't quite getting it. So I don't know if maybe I was pushing it a little too hard or if I didn't sound angry enough, but just worth sharing with all of you.
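
As a point of reference, a basic Eleven Labs text-to-speech request is a single HTTP call against its public REST API. The sketch below is a rough illustration; the voice ID, model ID, and output filename are placeholder assumptions, so check the current API documentation before relying on the exact fields.

```python
# Hedged sketch of an Eleven Labs text-to-speech call. The voice_id, model_id,
# and output filename are placeholder assumptions; verify the request fields
# against the current API docs.
import os

import requests

voice_id = "YOUR_VOICE_ID"  # placeholder: pick a voice from your Eleven Labs account
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

response = requests.post(
    url,
    headers={
        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "text": "Welcome to the Sidecar Sync podcast.",
        "model_id": "eleven_multilingual_v2",  # assumed model; others are available
    },
    timeout=60,
)
response.raise_for_status()

with open("welcome.mp3", "wb") as f:
    f.write(response.content)  # the endpoint returns audio bytes
```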

Speaker 2:

Next, I want to talk about Hugging Face speech-to-speech, which we have covered on a previous podcast episode that we will link in the show notes. Basically, speech-to-speech features an automatic language detection option for seamless multilingual communication, and it's real time, so it allows for immediate responses in audio format in various languages. And then, finally, Google Notebook LM, which we've also covered on a previous pod. It creates podcast-style discussions based on uploaded content, transforming written material into engaging conversations between two AI hosts. And a newly added feature since that original podcast episode is that users can now provide specific instructions to guide the AI hosts to focus on particular topics or adjust the expertise level for the intended audience. So, Amith, I know you're a big fan of Eleven Labs, but also Advanced Voice. Which of these do you find yourself going back to over and over?
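
For a rough sense of the building blocks behind the Hugging Face speech-to-speech project, here is a minimal sketch of just the transcription stage using the transformers pipeline API; the open-source stack chains a stage like this with voice activity detection, a language model, and a text-to-speech model. The Whisper checkpoint and the audio filename are illustrative assumptions.

```python
# Minimal sketch of one stage (speech-to-text) of an open-source
# speech-to-speech stack, using the Hugging Face transformers library.
# The Whisper checkpoint and the audio filename are illustrative assumptions.
# Requires ffmpeg to be installed for audio decoding.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # assumed checkpoint; larger variants exist
)

result = asr("member_question.wav")  # path to a local audio file
print(result["text"])
```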

Speaker 1:

I use them for different things. So with Advanced Voice mode in ChatGPT, I'm just talking to ChatGPT a lot, either if I'm driving or walking, you know, same thing. It's like I'm having these detailed conversations, and it's way more fluid, as you mentioned, than the voice mode they had for the last 12 months, where you could speak to the AI, and then what it would do is take that speech, turn it into text, submit the text prompt for you, get a response, and as the response was coming in, it would start synthesizing speech from that, which was actually quite useful. The regular, the prior speech mode they had, or voice mode, was quite good, but it was kind of like, you know, submitting your little speech and then waiting for a little speech to come back. It still had a lot of utility, but now it's way more conversational.

Speaker 1:

I have also found it to not be the best at picking up on tone. The demos we've seen talk about, like, picking up on sarcasm a little bit, but I don't know that it's going to be great at that. Demos tend to also be kind of cherry-picked, so I think there's some ways to go for that. But what it is good at doing is actually listening if you interrupt, and there's a lot of utility just in that one feature, because a lot of times it starts going down a path explaining something and I'm like, no, no, no, and I can cut it off while doing whatever kind of work I'm doing. Eleven Labs I have used a fair bit for dubbing, if I want to record a video and I want to put the video out there, but I don't necessarily want my own voice. Let's say I'm, you know, discussing some technical thing or whatever, and I want the brand to have a consistent voice, like you're mentioning and we'll talk about in the second part. You know, I can go and take a video of a screen share where I'm presenting on a topic, and then I can strip out my audio and save that as a WAV file and then upload that to Eleven Labs and do speech-to-speech dubbing with whatever voice I want. And other people on our team could also do the same thing with different videos that they've been assigned, and then afterwards, like, you can do this speech-to-speech dubbing and all the videos have the same professional voice from, you know, kind of a voice that's not a person, but there's something valuable about that potentially. So I think it's interesting. The Hugging Face speech-to-speech stuff I'm a big fan of because it's a toolkit that allows you to develop things, and there are many other ways to do what they've done; they just put together a full stack for this. With advanced voice mode from OpenAI, there is an API for it, but it's somewhat of a sealed API, meaning there's not much control you have over it, whereas the Hugging Face stuff allows you a lot more granularity as a developer to build your own solutions with their tool set. So I think that's interesting.
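
The dubbing workflow Amith walks through starts by pulling the narration track out of a screen-share recording. Below is a minimal sketch of that extraction step, driving ffmpeg from Python; the filenames are placeholders, and the resulting WAV is what you would then upload to Eleven Labs for speech-to-speech dubbing.

```python
# Sketch of the first step in the dubbing workflow described above: strip the
# narration audio out of a screen-share video as a WAV file. Requires ffmpeg
# on the PATH; the filenames are placeholder assumptions.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "screen_share.mp4",    # source video with your own narration
        "-vn",                       # drop the video stream
        "-acodec", "pcm_s16le",      # uncompressed 16-bit WAV audio
        "-ar", "44100",              # 44.1 kHz sample rate
        "screen_share_audio.wav",    # output to upload for speech-to-speech dubbing
    ],
    check=True,
)
```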

Speaker 1:

So I'm kind of all over the place on this stuff. And Google Notebook LM, my favorite use case for that is, once again, I like to walk around, so if you want to find me in New Orleans, just look around either in the early morning or in the evening and I'm probably out on the street walking around.

Speaker 1:

But Google's Notebook LM, I just used it last night. I got this long email from one of the companies that we made an investment in. One of the people in that company sent this long email that had a five-page PDF attached to it proposing this new project, and then another board member had replied to it. And so I took the PDF and the email and the reply and I put them all into Notebook LM, and I was doing this while I was getting ready to go for my walk, and I hit generate audio overview, and then I had an eight-minute podcast that got me up to speed. I came back and then I replied to the email really quickly and actually included the audio clip saying, hey, this is what I turned it into, because I didn't feel like reading your email.

Speaker 2:

No way, that is so smart. To me, that's a really good idea. Do you feel, did you read the email just in case, to confirm? No, you listened to it and felt like you got it?

Speaker 1:

Well, because this is the thing: it's going to turn into a conversation, so if I missed a little bit of detail, it's a low-risk kind of scenario. I wouldn't do that for, like, oh, here's a legal contract where I'm buying a house, let me run Notebook LM on this.

Speaker 2:

Although that would be interesting too.

Speaker 1:

Yeah, a contract made fun and interactive, the way Notebook LM does it, you know. One thing I would point out, and maybe we'll play a little clip of this: we have been doing a lot of experimenting around audio with one of our AI agents, which is Betty, and since this podcast is going to air after Digital Now, we can play a little clip of it, because we're going to introduce it at Digital Now on stage as part of my keynote.

Speaker 1:

But Betty is going to have a voice mode, and so Betty is known for being this accurate knowledge agent that works on top of the association's content corpus. So you have your association's vast array of prior publications and content, and Betty learns it, and today Betty has been interacting with members through text, and now, with the voice mode that is not yet available but will be in the early part of next year, your members will be able to have conversations with Betty through a really advanced voice interface, and so the clip that we'll play here now has just a little taste of what that might be like. Hey, Betty, what's going on?

Speaker 4:

Hey there, exciting stuff in the association world. Digital Now is coming up and it's the place to be for top association leaders, keynotes, workshops and networking galore.

Speaker 1:

That's awesome. When is the conference?

Speaker 4:

Digital Now is happening from October 27 to 30, 2024, at the Omni Shoreham Hotel in Washington, DC. Ready to dive into some game-changing content?

Speaker 1:

Yeah, I am. And you know what? Here's a surprise for you. I hope you're excited, because you are actually on stage right now during my opening keynote, so say hi to everyone.

Speaker 4:

Hello, everyone. It's a thrill to be here at Digital Now sharing the stage with all of you visionary association leaders. Let's dive into some cutting-edge ideas and make this conference unforgettable. Ready to shape the future?

Speaker 1:

Hey Betty, listen, some of the folks in the room are probably wondering the best way to get started with AI. They're a lot of times perhaps hesitant because their organizations don't have a ton of technical resources. So let me know what you think and can you speed it up a little bit, because I'm a little bit impatient.

Speaker 4:

Absolutely no need for hesitation. Here's a quick start: dedicate just 15 minutes a day to learning about AI, follow AI thought leaders on LinkedIn, subscribe to AI newsletters and listen to the Sidecar Sync podcast.

Speaker 1:

Well, that's a great idea. Mallory and I, in particular, are 100% on board with everyone listening to the Sidecar Sync podcast that we co-host every single week. Well, listen, Betty, we've got to go. This keynote isn't super long and I'm going to get pulled off stage pretty soon. Thanks so much for your help. We're really excited about your voice mode and your help for all associations as their knowledge agent.

Speaker 4:

Thanks for having me. I'm thrilled to be part of your AI journey. Enjoy the rest of Digital Now and remember the future of associations is bright with innovation.

Speaker 2:

It's a pretty impressive clip, Amith. I wanted to ask you about it, but I wasn't sure if we were going to talk about it on the pod. But you're right. Since it's after Digital Now, it's out there in the open. Can I ask what model you're using for this?

Speaker 1:

So under the hood, we use a few different things, and this is a prototype. It's an early version of it. It's more than a prototype, but it's an early version.

Speaker 1:

So the conversation I was having was with an underlying API from OpenAI, using their real-time API, but also using some additional components that went with it. So right now, the OpenAI real-time API has the lowest latency, and that latency is important because research suggests that between 200 and 400 milliseconds, or between two-tenths and four-tenths of a second, is the amount of time people feel is natural for the gap between when one speaker stops and the next speaker responds. So if it's more than about a quarter of a second, then it feels as though there's an artificial delay. It's like, you know, an old-school long-distance telephone call, where you waited a second or two for the response to start coming back in.

Speaker 1:

But you know, the thing is, I think, that even though they have the advantage there, and there are some really amazing things that OpenAI has done, it's very, very expensive and it also is extremely inflexible. So the real-time API from OpenAI will get better and will probably be both affordable and very flexible at some point. At the moment it's not, so we're really prototyping the use cases with that tool, and then we're planning on building, we are building, with another tool set that's directly integrated with Betty's actual knowledge base. So it's a combination of things that we're using. We're kind of being open-minded about, like, which model to actually plug in, but at the moment it's a mixture of GPT-4o and their real-time API on top of that.
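
To make that latency target concrete, here is a tiny sketch that checks a measured response gap against the 200 to 400 millisecond window Amith cites; the measurement points are placeholders for wherever a voice pipeline marks end-of-user-speech and first synthesized audio.

```python
# Back-of-envelope check of conversational latency against the 200-400 ms
# "natural gap" window mentioned above. The timing points are placeholders
# for wherever your voice pipeline marks those events.
import time

def feels_natural(gap_seconds: float, low: float = 0.2, high: float = 0.4) -> bool:
    """True if the speaker-to-speaker gap falls inside the natural range."""
    return low <= gap_seconds <= high

user_stopped = time.monotonic()
# ... run the speech-to-speech round trip here ...
first_audio = time.monotonic()

gap = first_audio - user_stopped
print(f"gap = {gap * 1000:.0f} ms ->", "natural" if feels_natural(gap) else "feels delayed")
```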

Speaker 2:

Nice. So it's the native GPT-4o speech-to-speech that's powering that?

Speaker 1:

Yes, we're using what, from the consumer perspective, is Advanced Voice mode in ChatGPT. We have essentially used the same underlying technology for what we're calling the Betty voice preview. This is not the final product. This is a preview of the idea behind the use case. The reason we're doing this is because sometimes you have to really illustrate a concept fully before people understand the utility of it.

Speaker 1:

And actually, with Betty, the whole journey for the last two years with that product has been very much like that, where you first had to kind of explain to people the reason why it would be valuable to unlock the knowledge of your association to begin with. Some people immediately got it, but a lot of people need to see it to understand the value creation, what that utility is. And then, with voice, it's again part of what you had said earlier. People use the text; what's the point of having voice? Is it just kind of like a neat party trick, or is there utility? I think there's tremendous utility. That's obviously our position on it, and so showing people that through the preview is our goal. It's going to get better and better over the course of the next several months as we work on the development and get something ready to go. I think people are going to be blown away with what they can do.

Speaker 1:

Here's the bottom line, though: whether it's Betty or something else, people in the world are interacting with AI through voice more and more. They're going to expect that from you. You are a customer service organization at the end of the day. People are going to expect immediate, real-time access, and not to some bot thing that only knows, like, rote answers, but to an actual, human-like, 24/7, 365 total expert on your association who knows not only all of your policy and product stuff and membership renewal rules and all that, but is an expert on your actual domain. That's what people are going to expect from the association, and that's certainly the opportunity. That may not be tomorrow; it may not even be by Digital Now 2025 that consumer expectations are fully aligned with that statement. But it's soon. It's very, very soon.

Speaker 2:

I think this conversation sets us up really well for part two. But I do have a question for you, Amith. Do you find that a lot of associations have mobile apps? Because I do think kind of the utility of something like that would be on your phone, so I'm just curious if you have any insight on that.

Speaker 1:

The only mobile app use case that I have seen successful over time in associations historically has been event apps, where people download the event app, like they will for Digital Now, when they come to the event, and it's a great experience to be able to see the sessions that you're going to, organize yourself, maybe connect with other people, and then after the event they delete it, and that's it, because there hasn't been a reason for people to want to interact with an association frequently enough to justify the existence of an app. Even if you say, oh, we're going to put all of our most commonly used resources in an app and make it really pretty and make it easy to download, the problem is that people don't engage with that. So most of the time associations tend to have infrequent but deep engagement, which is not bad. They just haven't created a value proposition that justifies their existence in the lives of their members on a daily or weekly basis. The newsletter would be one exception to that. A lot of times, if newsletters are done well, then that can be something people look forward to every day or every week.

Speaker 1:

But my belief, coming back to your point about apps, is: why would you download an app? Like, I know you're a user of Canva. I use another product called GitHub for software development a lot. Why would you download the app? It's because you use it a lot and you want a better experience on the mobile device. Otherwise, you just go to the mobile website version. There's no point. Why would you download an online banking app versus go to the website? The experience is probably better. Maybe it's more secure too in that case.

Speaker 1:

But going back to the apps for associations, I think a mobile app for an association would be awesome if there was a reason to use it frequently, and the reason to use it frequently is that it helps me do my job. It helps me in my life. It's not about helping the association; like, you know, no amount of gamification or anything else is going to draw me into engagement formats that are not value-accretive to me. So that's why the knowledge assistant concept, I think, is a dead-obvious one: if I can help my members on a daily basis in their day-to-day work, that's an enormous opportunity. It's an untapped opportunity because no one's doing this yet. So associations are in a perfect position to go and crush it in their respective domains. It's exciting in my mind.

Speaker 2:

Well, that is a perfect way to end part one of the AI and audio series. In part two, we will be exploring some association use cases, maybe with some mobile apps, so stay tuned.

Speaker 1:

Thanks for tuning into Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing and keep disrupting.