Sidecar Sync

Unlocking AI-Powered Insights from Unstructured Data | 48

β€’ Amith Nagarajan and Mallory Mejias β€’ Episode 48

Send us a text

In this special episode of Sidecar Sync, Amith and Mallory dive into the transformative power of AI in analyzing unstructured data, especially for associations. They explore how AI can help organizations unlock insights from sources like emails, community discussions, event recordings, and more, all of which have been traditionally challenging to analyze. Amith shares real-world examples of how unstructured data can reveal hidden trends and opportunities, offering valuable insights for future decision-making. Tune in to discover how your association can stay ahead of the curve with AI-driven data strategies.

πŸ›  AI Tools and Resources Mentioned in This Episode:
OpenAI Playground ➑ https://platform.openai.com
Sidecar AI Learning Hub ➑ https://sidecarglobal.com/hub
Free Book: Ascend - Unlocking the Power of AI for Associations ➑ https://sidecarglobal.com/ai

Chapters:
00:00 - Introduction
06:27 - Unstructured Data: An Overview
09:42 - The Power of Unstructured Data for Associations
11:46 - Pre-AI Approaches to Unstructured Data
14:30 - AI-Powered Analysis: A Game Changer
20:42 - Playground Demo: Structured Insights from Unstructured Data
24:21 - Real-World Applications for Associations
33:43 - Predictive Insights and AI’s Future
38:38 - Real-World Applications
43:23 - Exploring Advanced AI Models and Predictive Capabilities
51:12 - Tackling Key Pain Points
55:14 - Classical Machine Learning vs. Foundation Models
1:01:18 - Predicting Future Trends with AI


πŸš€ Follow Sidecar on LinkedIn
https://linkedin.com/sidecar-global

πŸ‘ Please Like & Subscribe!
https://twitter.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecarglobal.com

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress πŸ”— https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

πŸ“£ Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

πŸ“£ Follow Mallory on Linkedin:
https://linkedin.com/mallorymejias

Speaker 1:

So that kind of extraction of insight from other modalities is super interesting, but I'd suggest people, you know, kind of crawl before they walk and walk before they run, and text, I think, is kind of an easier thing to play with. Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, chairman of Blue Cypress, and I'm your host. Welcome back to the Sidecar Sync. We have another episode for you lined up today and it's going to be awesome. It's a special episode, all about unstructured data. My name is Amith Nagarajan.

Speaker 2:

And my name is Mallory Mejias.

Speaker 1:

And we are your hosts. Now, before we get into this special episode all about unstructured data, which is way more fun than it sounds, we're going to take a moment to give you a few thoughts from our sponsor.

Speaker 2:

Today's sponsor is Sidecar's AI Learning Hub. The Learning Hub is your go-to place to sharpen your AI skills, ensuring you're keeping up with the latest in the AI space. With the AI Learning Hub, you'll get access to a library of lessons designed around the unique challenges and opportunities within associations, weekly live office hours with AI experts, and a community of fellow AI enthusiasts who are just as excited about learning AI as you are. Are you ready to future-proof your career? You can purchase 12-month access to the AI Learning Hub for $399. For more information, go to sidecarglobal.com/hub. Amith, how are you today?

Speaker 1:

I'm doing great. I'm excited for this week, partly because of this episode. We're recording a little bit earlier this week than normal, but that's because a lot of us from across the Blue Cypress family are heading up to Utah for our annual leadership summit, where a lot of our senior managers come together to learn together and network, and it's always a lot of fun.

Speaker 2:

Absolutely, I will be attending as well. Right now I'm still in Atlanta, but I'll be heading to Park City tomorrow. Wow, tomorrow evening, and I'm so excited. This will be my third year attending the leadership summit, Amith. What inspired you to create and host the Leadership Summit all those years ago?

Speaker 1:

Well, the timeframe was 2021, three years ago, and we were looking for a way to get people together after a long period of time when there wasn't much face-to-face contact because of COVID. So it was the September timeframe of '21. And getting together in an offsite has always been something I've enjoyed doing with leadership teams and people from all levels of our organizations, because it's just a time to kind of disconnect from the day-to-day and think a little bit more broadly and deeply and just get to know people better, and that leads to all sorts of amazing things for the business. It's healthy for everyone to get away from their day-to-day work and I think there's tremendous power that comes from that. I've done this for years and years and years across different companies.

Speaker 1:

For Blue Cypress, it's just been a really formative event for a lot of people in their careers here, and I think it's one of the pillars of our calendar when we get people together. So we bring everyone from across the senior management teams from all of our companies, as well as the BCHQ team. So it's a ton of fun and I always learn a lot just hanging out with people, talking to them. We also bring in a speaker or two from outside to give us some additional perspective. So we've got some interesting stuff lined up this year as well, and it's just a great part of our rhythm.

Speaker 2:

So I had the opportunity to attend the past two years but I was helping more on the planning side. So I will say I'm really excited this year to get to attend no strings attached, just sit back, relax and enjoy. But one of the most interesting things I've found about the past two years is we put a lot of work into the structure of the event, into the sessions that we curate, and to me some of the most interesting conversations that are had are those that kind of fall outside of those structured sessions or it might be one specific topic that we do like a Q&A on that goes on for 30, 45 minutes. So I always think it's really interesting to see just how excited people are to connect and to talk and to troubleshoot things together.

Speaker 1:

It is fantastic and of course, this year, like the last few years, we will be talking a lot about AI. We'll be talking a lot about clients and the state of our community of associations and nonprofits, and thinking about how we can better serve the market. So there's always a lot of interesting things to discuss. But I agree with you, the pieces that are not part of the formal agenda tend to be the best parts. I mean, that's really what we do it for ultimately. The formal agenda, you could deliver a lot of that content virtually and save a lot of money and a lot of time, but it's the other stuff that you can't replicate, especially in a world where we are mostly remote. We have offices in various parts of the country where we have small clusters of people, and we don't mandate people go in a specific number of days or anything like that. I know a lot of organizations do that. We've kept it pretty much 100% remote, but just have the facility available, and that seems to work for us. But this type of event is fantastic.

Speaker 1:

We also do smaller events throughout the year, for example for our technical crews. We do hackathons at various times of the year, and that's a great way to get people together. And then we also use the rhythm of the event calendar for the association community to bring our teams together. So we'll often have people come together before or after an event that's not, you know, a Blue Cypress specific event. So we think that's a real important part of relationship development, and for us, since we are a family of over a dozen different brands, a lot of times people don't even know what the other companies do. So, you know, we learn an awful lot about what our colleagues are doing and how they're approaching different problems and opportunities they see, and so on.

Speaker 2:

Yeah, absolutely, you hit the nail on the head. That's one of the challenges of onboarding into the Blue Cypress family of companies: figuring out what the other companies in the portfolio do. That's a lot of companies to really learn their business models and learn who works there. It's a challenge. All right. Well, today, as Amith mentioned, we're talking about AI-powered analysis of unstructured data, specifically for associations, and we're going to kind of divide that into three sections. First we're going to talk about unstructured data, what is it, to set the foundation. Then we're going to talk more about unlocking the potential of that unstructured data. And then, finally, we're going to talk about why this matters and what exactly you, as a listener, can do next with this information.

Speaker 2:

So, first and foremost, what is unstructured data? It refers to all the information that doesn't fit neatly into organized formats like databases or spreadsheets. This includes things like emails, blog posts, social media content, audio and video recordings. It's the kind of data that you are generating every day, but it's often overlooked because it's not easy to analyze. In fact, around 80 to 90% of all the data we deal with is unstructured and, for associations specifically, this can be anything from member emails to community discussions, event recordings and even the transcripts of your board meetings.

Speaker 2:

The challenge with unstructured data is that, while it contains valuable insights, it's not organized in a way that allows for quick analysis or action, and, until recently, most organizations haven't had the tools to fully tap into this resource. AI is starting to change how we approach this kind of data, offering new ways to analyze, organize and extract meaningful information from it. So, Amith, you kind of joked at the beginning that this is going to be more fun than it sounds. I'm wondering why this has been a topic that you've been focused on recently. Is it something you've been thinking about for years, or is it something that's kind of been sparked by the AI landscape? Which is it?

Speaker 1:

I've been thinking about this for a long, long, long time. For me, a lot of it is that the amount of unstructured data has always eclipsed the structured data that we've had available, and the rate of growth in unstructured data is far faster than the rate of growth in structured data. So we're at 80, 90%. If the trend lines hold, probably nearly all of our data will be unstructured. Now, keep in mind, structured data is also growing. But structured data takes a lot of work. You have to structure it, you have to take the time to think about what that structure is, and then you have to build systems of both technology and systems of process to get that structured data into a structured database or, at a minimum, a spreadsheet. So that's why there's so much less of it, whereas, particularly with mobile devices and cameras and all the things we have, we generate so much unstructured data every day. It's staggering. So I've been thinking about it for a long, long time, because in the world of business application software, the insights that you glean from structured data are helpful, but they're very limited. It's like going to your front door and trying to look through that little peephole to see who's there. That's the kind of vision you get of the world when you're looking at just structured data. Unstructured data has the potential to give us so much more insight.

Speaker 1:

We've talked a lot about how there's information loss as you go from one modality to another. So, for example, video to audio: you lose some information because you don't have the video. Audio to text: you lose some information because all you have is the transcript.

Speaker 1:

Well, if you go to structured data, you lose even more because you know you're getting into this super, super narrow space. So it's been an opportunity area and in theory, you think about all the potential insights that are in there in images, in videos and obviously in text. If we could just ad hoc ask any question we had in a consistent way across all of that unstructured data and get back the results immediately and be able to use it, it would be quite interesting, and I know we're going to talk about that a lot. But to me that's the opportunity for this market, because associations have such a massive amount of unstructured data at their disposal.

Speaker 2:

Pre-AI, what is the process that you've seen in this industry and the association market for organizations to take advantage of their unstructured data? Or do you not see that?

Speaker 1:

Well, pre-AI, if there really was a pressing challenge or opportunity, you would put people against it. So you would say, hey, Mallory, we need to evaluate this data, this content we have, and we're going to have to get a team of people to read all the content and then give us a report. And you might do that on a consistent, continual basis, where you say, hey, we're going to do this kind of analysis on every document we put in our journal, we're going to extract certain information. Or we might want to look, for example, at the market that we serve.

Speaker 1:

So let's say that we're in the insurance sector. We might want to keep a close eye on all the companies in the insurance sector. How healthy are those companies? Are they growing, are they shrinking, are they profitable, are they having financial difficulties? Traditionally that would require proprietary surveys that you might run with your members, but you might also look at publicly available data on companies that are publicly traded in your sector. You would manually look at things like earnings calls or quarterly filings, things like that.

Speaker 1:

"You would throw people at it" is the short version of the answer to your question. And, of course, that is not a scalable solution, both in terms of cost but also in terms of time, because it takes people a long time to do things like this, and they probably have to be pretty well informed in order to participate in those kinds of tasks.

Speaker 2:

Now, there are obviously many types of unstructured data. We talked about a few just now: emails, community discussions, recordings, transcripts. It's kind of a tough question, Amith, but when you think of associations broadly, is there any specific type of unstructured data where you think, oh, if you could just do something with this, you would have the key?

Speaker 1:

I mean, 100%. I think the content that associations produce that's proprietary to them, whether it be academic journals or professional journals or even blog posts, but certainly proceedings from conferences, video recordings from events, all of that is unstructured data that, basically, all computers have been able to do is store and transmit for you, and even that has been a fairly recent phenomenon. For a long time you didn't even have a lot of that content digitized. So basically we're talking about a big opportunity, because the association world is full of content like that, and much of that content is proprietary, it's not available anywhere else, and usually it's pretty relevant and pretty good content for the field that the association is serving. So from my point of view, there's all sorts of potential in there to build new products, to certainly make existing processes faster and more accurate, but also to create new products and services that generate new streams of revenue for associations, ones that are also hard for organizations outside of your association to compete with.

Speaker 2:

And then last question here: when you are speaking to organizations or leaders about unstructured data, are there any common misconceptions that you hear over and over that you want to set right on this podcast?

Speaker 1:

Well, I think a lot of people just assume that unstructured data is still kind of in this domain of the unreachable. You know, computers have historically, honestly, been pretty dumb. They've just gotten better at doing dumb things, meaning that you've had to tell them everything that they needed to do, and then they do that one task over and over and over again. They've just been able to do it faster and cheaper. Now we have a fundamentally new capability, which is that the computers don't have to be taught every single little thing, and they're able to learn in many respects on their own, and that yields all of these unprecedented capabilities. So a lot of people don't realize that AI can, in fact, help you automatically extract insights from unstructured data of all modalities, and that's part of what we're obviously trying to address by sharing this type of content. But to me, the big issue is that there's a little bit of disbelief out there that you can indeed do this accurately, cost-effectively and at scale.

Speaker 2:

And yeah, that's a perfect segue for the next portion of our podcast, where we're actually going to show you a video of this in action so you can see just how easy it is.

Speaker 2:

As Amith mentioned, historically this has been a people job. You had to have people read and watch and listen to all your pieces of content to extract relevant information, and you can imagine that sounds incredibly time consuming; the process was probably incomplete and, of course, subject to human error as well. So we know AI right now is making it possible to not only analyze unstructured data more quickly, but to turn it into structured insights. It can categorize large volumes of content, summarize key insights and even assign specific values to qualitative data. For associations, this means extracting valuable information from your unstructured data sources, which makes it easier for you to interpret and act on those insights. So to show you what we mean here, we want to play a demo video so you can see this in action: essentially, AI helping to pull structured insights out of unstructured data. If you want the full effect, I recommend that you check out the video version of this podcast episode on YouTube. Otherwise, we do have a narration of the demo as well for our audio-only listeners, and we'll play that now.

Speaker 3:

In this demo, we're going to take a first crack at extracting structured insight from unstructured content. This example uses earnings calls. Publicly traded companies typically have quarterly earnings calls, during which there's a discussion of the company's performance in that time period as well as discussion about the future. Wall Street analysts tend to ask questions of the principals at the company, and the transcripts of these calls are recorded and generally available for download from the companies' websites, as well as from services that aggregate these transcripts. So there's a lot of information in there. So let's say that in our organization, we wanted to analyze all of the earnings calls for a particular company, or perhaps all of the earnings calls for a sector, and then extract some standardized, structured insight across all these earnings calls as part of a research effort or for whatever other purpose we may have. Well, let's go ahead and start with a single earnings call. We're going to be using Microsoft. So I'm going to paste in a system prompt that I already have created, and you'll see that in this rules block, I've provided some important information to the AI: that I only want it to use the knowledge that I provide, not its pre-training. That's important so that the AI is being directed to use only the information I provide, not its own training content. In addition to that, I'm saying your job is to answer the questions using only the information provided, and to provide JSON as the response, that is, JavaScript Object Notation, which is then parsable by any kind of programming language. The next thing I'm going to do is actually paste in some earnings call transcripts. Earnings call transcripts, again, are available as public data, and I'm going to cut and paste over here a Microsoft earnings call transcript. In fact, I'm going to add two. There's a lot of text in here, so I'm not going to read it in this demo, but there are two different quarters, the most recent two quarters from Microsoft. So that's now part of our prompt. You can see, if I scroll all the way up, you have my system prompt, and then I have this user message which has that big chunk of text. Now I'm going to go ahead and put in my first question. So I'm just going to say, for the most recent transcript, how optimistic, on a scale of one to five, is the CEO about the future of the company with respect to AI innovation? Since this is Microsoft, I suspect it might be pretty high.

Speaker 3:

Let's see what OpenAI's GPT-4o model tells us. It says optimism level 5. Well, that sounds about right. Now, what if I wanted a little bit more information than that? Let's delete this request and this response.

Speaker 3:

Let's go ahead and paste in a little bit more detailed prompt. This time I'm asking for a specific response format. I want the optimism level on a scale of 1 to 5, and I want the reasoning. So let's go ahead and run that. Here I get optimism level 5, and then the reasoning is information that is pretty interesting to look at. It's a quick summary of why we think, or why the model thinks, that the optimism level is 5. That's interesting. That's a bit of structured insight coming from unstructured data.

Speaker 3:

Now let's say I'm interested in getting a list of all the participants in this call so I can store them in a database, so I know exactly who from Microsoft was speaking. So I'm going to go ahead and paste in another prompt that I have saved. It says, for the most recent transcript, who are the participants? And I am also being very specific here about the format I want in the response. I want the first name, the last name, the job title, and I'm also asking for something called personality style, which I specifically instruct the AI to assess using the DISC framework, and I want an explanation of why that is the style that was chosen. So let's go ahead and run that, and we can see very quickly GPT-4o is giving me the feedback. You'll see that it actually stopped. This is the maximum token limit being reached. I just made a little mistake over here. I'm going to increase the maximum token limit, rerun this, and I'll get the full response this time. So here I have all these different individuals, and it's giving me the estimates of what these people's personality styles are, their names, their titles and so forth. Once again, extracting structured insight from unstructured content.

Speaker 3:

The last example I'm going to give you for earnings calls is, let's say we wanted to compare information from two different earnings call transcripts. So I'm going to paste in another prompt that I have saved, and this one says, over the periods provided, what are the changes in each of these categories: financial performance, AI optimism, competitive landscape and so forth? Let's go ahead and run that. So now I have, very quickly again, a response: financial performance improved, CEO optimism is flat in terms of AI, and then several different kinds of summaries: what changed, what are the risks, what are the low-performance areas and so forth. You can see very, very quickly here, with low effort, I've been able to construct a set of simple prompts that allow me to consistently extract structured insight from an unstructured data source.
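
For listeners who want to recreate this themselves, here is a rough sketch of what the demo's prompts might look like. The exact rule text, field names and JSON shapes below are illustrative assumptions, not the demo's actual wording.

```python
# Hypothetical reconstruction of the demo's prompts; wording, field names,
# and JSON shapes are illustrative assumptions, not the demo's exact text.

SYSTEM_PROMPT = """RULES:
- Use ONLY the knowledge provided in this conversation, not your pre-training.
- Answer questions strictly from the supplied transcripts.
- Respond with valid JSON only, so any programming language can parse it."""

OPTIMISM_PROMPT = """For the most recent transcript: on a scale of 1 to 5, how
optimistic is the CEO about the future of the company with respect to AI
innovation? Respond as:
{"optimism_level": <1-5>, "reasoning": "<short explanation>"}"""

PARTICIPANTS_PROMPT = """For the most recent transcript, who are the participants?
Respond as a JSON array of objects with the fields: first_name, last_name,
job_title, personality_style (use the DISC framework), and style_reasoning."""
```

The transcripts themselves are simply pasted in as a user message between the system prompt and the question, exactly as in the video.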

Speaker 2:

Amith, can you talk about the tool that you used in this demo and why you selected that tool?

Speaker 1:

So we used something very simple but often unknown to most users, even those who are familiar with a variety of AI tools. We used something called the Playground, which is an OpenAI tool that allows you to basically play around with models in a far more detailed way than the consumer ChatGPT product allows you to do. In the video, we really zoomed in on a very narrow part of what the Playground can do, but you can choose the specific model that you want to work with. You can set something called a system prompt, which has a profound impact on the way AI models behave, and then you can provide a whole series of different messages, so you can chain them together in interesting ways, and that also affects the way the model works. There are other settings, like temperature and things that we won't get into, but there are more controls, basically. So, instead of it being, you know, a very simplistic just type in a prompt and hit enter, there's, you know, a handful of controls. And the reason we used it for this demo is because it's a great place to show how a system could actually scale this kind of concept, because you wouldn't do this one document at a time. The demo showed a couple of earnings call transcripts for one company that were manually loaded up into the Playground, and then questions were manually asked. The idea behind it is that anything that can be done in the Playground can also be done programmatically, meaning that a software developer can write code to do those same steps through the OpenAI API, as well as with lots of other models.

Speaker 1:

We used OpenAI because their Playground is probably the best Playground-type tool in the market. There are others. Groq Cloud has a good playground. So does Anthropic's Claude environment. Most of the major developers have some kind of playground-type environment. OpenAI has just been at it a little bit longer, and they have a more robust playground tool. So that's why we used it for the demo. But you can use any language model, I should say any significant language model. The smallest language models are not necessarily the best at all the tasks we talk about, but even something like a Llama 3.1 mid-sized model, like the 70-billion-parameter model, would be perfectly fine at doing a lot of the things that we demonstrated. So that's the basic idea: the Playground is just a way of simulating what you might actually go and build.
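
To make that Playground-to-code equivalence concrete, here is a minimal sketch using OpenAI's Python SDK. Treat the model choice, prompt wording and file name as assumptions; the point is only that the same system prompt, transcript and question from the demo can be sent through the API.

```python
# Minimal sketch of the Playground demo done programmatically.
# Assumes OPENAI_API_KEY is set in the environment; the transcript file name,
# prompt wording, and model choice are illustrative, not the demo's settings.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = open("msft_earnings_call.txt").read()  # hypothetical local file

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,      # keep extraction as repeatable as possible
    max_tokens=500,     # raise this if the response gets cut off, as in the demo
    response_format={"type": "json_object"},  # ask for parseable JSON back
    messages=[
        {"role": "system", "content": (
            "Use only the knowledge provided by the user, not your "
            "pre-training. Answer questions about the transcript and "
            "respond in JSON only."
        )},
        {"role": "user", "content": transcript},
        {"role": "user", "content": (
            "On a scale of 1 to 5, how optimistic is the CEO about the "
            "future of the company with respect to AI innovation? Respond as "
            '{"optimism_level": <int>, "reasoning": "<string>"}'
        )},
    ],
)

insight = json.loads(response.choices[0].message.content)
print(insight["optimism_level"], "-", insight["reasoning"])
```

The same handful of lines works against other providers' APIs with minor changes, which is exactly why the Playground is a useful prototyping surface.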

Speaker 2:

In your regular day-to-day? Are you opting to use the Playground or just the normal version of ChatGPT?

Speaker 1:

I'm a normal consumer most of the time, so I'm using the ChatGPT app on my phone because I'm always moving around.

Speaker 1:

I use all the mainstream-type tools as a consumer. If I want to do something where I'm thinking through, hey, there's a new way we could do something with a new model...

Speaker 1:

Like OpenAI just released the O1 models, or the O1 Preview, I should say, and the O1 Mini. What we talked about recently on this podcast as Strawberry is now called O1 Preview and O1 Mini, which is an interesting topic we can touch on a little bit today. But that particular model series is available to some consumers via the API, and so if you want to be able to more rigidly test models with the same system prompt and sequence of user prompts, it's a great place to do that. I get into that from time to time. My typical workday is not really predictable, which I kind of like, but I do use that tool regularly to try to simulate, hey, this is what it would look like if we did this programmatically, and then I'll go talk to different development teams across different companies and say, hey, what do you think about this idea? So it's a good prototyping tool, I guess, is the best way I'd describe it.

Speaker 2:

That's helpful. Talking about the demo, you asked AI to evaluate the CEO's optimism on a scale of one to five. I'm sure we have some listeners who are hearing or viewing that and wondering: how exactly does AI determine something as subjective as optimism? So I wanted you to touch on that, and then maybe expand on what practical value finding optimism on an earnings call would have for associations.

Speaker 1:

Sure. Well, let me first start off with how it would handle something a little bit less subjective, which is that I asked it to extract the names and titles of the people that were in the call. That is a little bit less of a "hey, how optimistic was Satya Nadella versus his CFO" kind of question, and it was able to knock that out, and almost any model would do that very well. So that requires a little bit less of a leap of faith, in the sense that it's looking for something very specific. In the other example, though, we're asking about optimism level in general: how optimistic is this person about their company's future? How optimistic is this person about the economy broadly? How optimistic, in the case of the example, was the CEO about AI specifically for their company?

Speaker 1:

So we're asking for a specific person. So in the transcript of the earnings call, the AI has to narrow it down to just that person's commentary. We're asking for a particular type of optimism, around AI. So it has to be smart enough to know when that person's talking about AI. But fundamentally, your question is an excellent one, because when you think about optimism, how would you determine if Satya Nadella is optimistic about AI? What would you do? How would you go about doing that, Mallory, if I gave you the transcript and asked you to do that task?

Speaker 2:

Well, if you gave me a transcript, that's a whole other issue. I'm thinking audio. Easy, right? I would listen to tone, I would listen to all the other little details there. With a transcript it would be word choice, maybe how wordy someone was, if they were elaborating a lot or if they were short and concise and to the point, the kinds of questions they were asking.

Speaker 1:

Yeah, there's the substance of the thing, which is: what did they say? Did he actually say, "I am very optimistic about artificial intelligence"? Probably not, right? And so lesser AIs in the past would have been basically thrown off by that. You know, old-school classical natural language processing, or NLP, would have looked for things like keyword counts. So if Nadella never said "AI" but he's talking about intelligent computers, and it's not a simple synonym, those kinds of systems would have been thrown off.

Speaker 1:

So it's much, much more than that. It's essentially a facsimile of how a human might actually think about that task, where the AI is looking at the full corpus of content it's been given, which is the transcript of the earnings call. I agree with you completely, by the way: the loss of tone really takes away a lot of the information, right? It's a perfect example of what we were just talking about. So I'd much rather feed the actual audio files to the AI, since AI models tend to be multimodal now. Certainly with ChatGPT you can do that. You can feed in the audio file and ask it to listen to the recording, and then it's going to be much, much better, because there's more information there.

Speaker 1:

But ultimately it's making a judgment call. That's what it's doing, and the way AI makes judgment calls is through probability distributions. So in fact, you can even see what the token probabilities are for particular prompts. There's this thing called log probs that you can look at as a developer. Most users would never encounter this, but you can see, okay, well, the five was the highest probability, versus the four being a lesser probability. So it's picking essentially the most likely token, right? We've talked about that a lot. It's still next-token prediction.
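
For the technically curious, here is a hedged sketch of what peeking at those probabilities looks like through OpenAI's API. The prompt and model are just examples, and the numbers you get back will vary.

```python
# Sketch: inspecting token probabilities ("logprobs") behind a 1-5 rating.
# Assumes the same client and a transcript file as in the earlier sketch;
# details are illustrative rather than a definitive recipe.
import math
from openai import OpenAI

client = OpenAI()
transcript = open("msft_earnings_call.txt").read()  # hypothetical local file

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=1,            # we only want the single rating digit
    logprobs=True,
    top_logprobs=5,          # also return the five most likely alternatives
    messages=[{
        "role": "user",
        "content": "Rate the CEO's optimism about AI from 1 to 5. "
                   "Reply with a single digit only.\n\n" + transcript,
    }],
)

# Convert log probabilities into plain probabilities for the first token.
for alt in response.choices[0].logprobs.content[0].top_logprobs:
    print(f"token={alt.token!r}  probability={math.exp(alt.logprob):.3f}")
# You might see, say, '5' with a far higher probability than '4'.
```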

Speaker 1:

And so, ultimately, that training data, this massive corpus of content that these models are trained on, basically all of human knowledge, is teaching the model how to think about what optimism means and how to determine if this person's optimistic or not. So it sounds really kind of weird, but that's what's exciting about it, because normally it would take humans and their time and their judgment to read an earnings call transcript. Even if you're a speed reader, it's probably 30 minutes to an hour, and then there's time for every question I ask you about that particular transcript. I might ask you all the questions up front: say, hey, Mallory, can you read the last 20 quarters of Microsoft's earnings call transcripts and give me the answers to these 10 questions for all the last 20 transcripts? And you could go do that. It would just take you a really long time. And if, after you got done, I go, oh jeez, I'm so sorry, Mallory, I forgot to ask you to answer these other two questions, you kind of have to redo a lot of the work, right, because you didn't have those extra questions in your context window. Now, that's the technical answer. But you also asked, why does it matter, right?

Speaker 1:

What's the value of, in this particular example, earnings call transcripts? We like this example because it's public data, and most people have some general understanding that publicly traded companies report their results on a quarterly basis and that data is public. And associations are all in an industry or a sector. Some of them span multiple sectors, but being able to instantly, or close to instantly, ask any arbitrary question across the entire corpus of content for all the earnings calls in your sector could be interesting.

Speaker 1:

So let's go back to that insurance example. Let's say that I am an association somehow involved in the insurance world and I'm just curious how insurance companies feel about AI. Well, I could run every earnings call transcript for all of the insurance companies, and everyone that's kind of in a related field, through that type of tool and get a pretty immediate understanding of where people are at, and that can be valuable in terms of publishing research reports, for example, or possibly even calibrating the kinds of products and services that I might bring to market, depending on how interested my sector seems to be in certain topics. I might be able to use that to gain insight, to pick better topics for my annual conference and for my publication schedule, and so forth.

Speaker 2:

With the insurance company example you gave, I want to bring this back down to the ground a little bit. If you had to estimate, and you might not be able to give a good estimate of this: this technology is available right now. Our listeners could go out and run that exact experiment right now. What would you say the timeline and cost would roughly be if someone wanted to start that project today?

Speaker 1:

You're referring to doing one document at a time manually?

Speaker 2:

No, I mean automating earnings calls exactly what you just said. I want to do that right now. What does that look like?

Speaker 1:

Well, you could certainly go build something like that by hiring a programmer, or maybe asking an AI to write the program for you, and it would probably get pretty far with it, and then go and do this thing. It would take you probably a matter of a few weeks and probably not a massive amount of money, but that requires skills or dollars. There is a tool that we have available that we didn't talk about in that particular demo. It's called MemberJunction, which is an open-source common data platform that we publish, and it's a totally free tool for the nonprofit community. MemberJunction actually has functionality built into it which allows you to do exactly what we're talking about. It allows you to essentially point MemberJunction at one or more sources of content. You can say, hey, the source of content is a website, or it's a cloud storage folder, or whatever. There are lots of options for content sources. And then you can specify the questions that you want to ask of each of the documents that exist in that content source.

Speaker 1:

So it allows you to essentially structure all the prompts, like the ones we put in one by one in that demo, and put them into MemberJunction in what's called a content type. Then MemberJunction will automatically go and do this against all the documents that you have in whatever location you ask it to process, and it'll keep doing it forever.

Speaker 1:

So you can point it at a website and say, hey, do this for my website, and it'll automatically process all the content on your website. As soon as new posts appear on your website, it'll automatically pull those in and keep processing them for you. If you add new questions, it'll go back and reprocess the old documents so that you have a consistent set of structured insights across both new and old pieces of content. So MemberJunction does this quite nicely, and associations and third-party vendors, anybody, can download MemberJunction and start using it right away for that. But you can also build these solutions yourself. This isn't intended to be an ad for MJ, and of course MemberJunction is free anyway. The point is that there are ways to do this; that is one way that is super easy and inexpensive.
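
MemberJunction's own configuration is beyond the scope of this episode, but the underlying pattern is simple enough to sketch. The hypothetical loop below, which is not MemberJunction's actual API, applies one fixed set of questions to every document in a folder and drops the answers into a small database; the file layout, questions and schema are invented for illustration.

```python
# Hypothetical sketch of the general pattern (not MemberJunction's actual API):
# ask the same questions of every document in a folder, store answers in SQLite.
import json
import sqlite3
from pathlib import Path
from openai import OpenAI

client = OpenAI()

QUESTIONS = (
    'Answer in JSON with the fields: "topic" (short string), '
    '"optimism_level" (1-5 or null), and "summary" (two to three sentences).'
)

db = sqlite3.connect("insights.db")
db.execute("""CREATE TABLE IF NOT EXISTS insights
              (source TEXT, topic TEXT, optimism_level INTEGER, summary TEXT)""")

for doc in Path("content_source").glob("*.txt"):  # e.g., exported transcripts
    text = doc.read_text()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # a cheaper model is often fine for bulk extraction
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Answer only from the provided document, in JSON."},
            {"role": "user", "content": text + "\n\n" + QUESTIONS},
        ],
    )
    row = json.loads(response.choices[0].message.content)
    db.execute("INSERT INTO insights VALUES (?, ?, ?, ?)",
               (doc.name, row.get("topic"), row.get("optimism_level"),
                row.get("summary")))

db.commit()
# Once the answers land in a database, ordinary SQL reporting takes over,
# and adding a new question just means re-running the loop.
```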

Speaker 2:

Mm-hmm, using MemberJunction as the example, is this something that you set and forget, so you kind of do the initial setup and then just see the insights keep rolling in? Do you need to keep a close eye on the quality of those insights and then kind of work to fine-tune things? What does that look like?

Speaker 1:

You know, it's an iterative process. Let's say I wanted to do this with earnings calls. First of all, what's the content source? Am I going to subscribe to a paid source of content and then pull that data in, or do I want to point at specific company websites? Do I want to try to pull data from the SEC? There are lots of ways to pull that. But then the questions that you want to ask, from time to time you're going to change them, right?

Speaker 1:

This is where we're going from scarcity to abundance, as we like to say a lot. Up until now, the ability to ask and have answered these kinds of questions against vast arrays of unstructured data has been very expensive, and therefore it's been scarce. You haven't been able to do what I'm describing except for a very small number of questions asked against a small number of documents, and even that cost you a lot of money. But we're moving to a model of abundance, where you can ask any number of questions at any time against any number of documents and basically get the answers for near free. So that is going to cause people to think differently, but you're going to have to think more creatively, because you haven't had the ability to ask a whole lot of questions.

Speaker 1:

All of a sudden, you do. So it is going to be an iterative process. The actual mechanical process of MemberJunction doing what I described, you can set and forget, but what you want to do is go back and look at what it's extracting from the documents. Imagine it's essentially dropping the answers to all those questions into the database for you. Once it's in a database, you can write reports against it, you can feed that into all sorts of other things. So you do definitely want to inspect the outcome, but I think the most important thing is figuring out what you want to ask, and that by itself is something you have to learn how to do.

Speaker 2:

In terms of AI analysis of unstructured data, Amith, I've heard you talk about the example of how this can apply to medical research. Can you talk a little bit about that? It's much more complex, I would imagine, than what we're talking about here with call transcripts, but maybe not.

Speaker 1:

It's more complex in a way, in that the subject matter, I think, would tend to be, maybe not even more complex, but more domain-specific. But in a way a research paper is actually simpler, because it's more predictably structured, whereas an earnings call is just people talking, and they typically have some planned comments.

Speaker 1:

But then, what makes earnings calls interesting is analysts can get on this conference call and ask whatever questions they want, and typically the executives from the company will indulge quite a few of those questions and answer them. And these people tend to have a lot of PR training, so not a lot slips out. But every once in a while you get some interesting comments that come from people outside of their prepared remarks. So it's interesting, because that aspect of how unstructured it is makes it somewhat complex for the AI to analyze. Now, research papers, coming back to your question: across any domain, there's tons of research happening in the world in all sorts of fields, and you can get access to the world's academic publications a number of different ways. There's Google Scholar, there's arXiv, there are other sites where you can download these things for free and you can search. But what if you wanted to say, hey, listen, I want to monitor all of the research happening in my field and, let's say, a few adjacent fields, and I want to automatically pull in all the papers as they're published and be able to answer a number of questions about each one? So if I'm in cancer research, maybe I want to know what particular types of cancer this paper is dealing with. Maybe I want to know what type of research it is. Is it looking, for example, for early-detection types of techniques? Is it about curing the disease? What is the paper about substantively? And there might be several classifications for that, and I might want to do that across every single paper that's in my field and perhaps in some adjacent spaces, which is a massive task, right?

Speaker 1:

But associations that are in particularly narrow domains, I think, would do well to find a way to pay more attention to what's happening in their fields and then do things with that.

Speaker 1:

I mean, what can you do?

Speaker 1:

Well, first of all, you can inform your members about it more effectively if you have this newfound superpower to keep up, right?

Speaker 1:

Right, because the volume of research that's happening in a lot of fields is so overwhelming that even the most in-depth practitioners in those fields maybe consume five or 10 percent of the work that's out there, and that's probably a massive overestimate, because there's so much volume of content. So I think it allows an association to do a better job of curating, and then a better job of feeding their members, and perhaps others, with more timely insights. And then, you know, the other thing that you could potentially do, talking about putting layers on top of the basics: what if you had a really smart capability on your website where you allowed any of your members to ask questions of any type across the entire corpus of content, your own journals and all these other things, and to get research reports brought back, right? It's like a report being prepared by someone who's doing the work manually, but done on a completely automated basis. That would be a service I think a lot of associations could monetize in a pretty meaningful way.

Speaker 2:

And then, especially using that proprietary data, it's really a service that no other company or organization could provide to the world.

Speaker 1:

You know, the number of papers that actually make it into a journal and get published is very small. The number of pieces of content that are actually produced by people is massive, right, because anyone can say, hey, I'm publishing this paper; there's like no constraint on that. I can go publish a paper on cancer research tomorrow. I have no credentials, and I wouldn't recommend anyone read it, because I wouldn't know what I'm talking about. But nothing stops me from doing that, right? So you've got this massively wide array, and therefore you get a lot of stuff that's not really great science. Then take something like a journal like Nature, which is one of the most preeminent publications. To get a paper published there is a really big deal, and that takes a lot of time.

Speaker 1:

And credibility. The peer-review process is extremely rigorous, as it should be, because once something is published in one of those journals, it's considered to essentially be like, hey, we've shown that this experiment works, that this thing's really a thing. Whereas, you know, this massive volume of content that precedes that level of achievement, it doesn't mean it's not interesting, especially if it's from people who aren't well known. A lot of times it's like anything else in life when humans are involved in something. Take the next paper that Jennifer Doudna writes; she's the person who was one of the main contributors to creating CRISPR. If she publishes a paper, every journal is going to look at it immediately, because she's Jennifer Doudna. But if some person who's never been published anywhere has this unbelievably brilliant thing and they want to publish it, it may or may not ever get noticed by anyone, right? So all I'm saying is that this gives us the scale of resourcing where we can do things that previously would be really out of reach.

Speaker 2:

For the sake of these examples on this episode of the podcast, we focused mostly on text, with the earnings call transcripts and then even with the research that we just mentioned. Can you talk a little bit about what it would look like to use audio or video? Or do you recommend sticking with text for now and then maybe trying that out later?

Speaker 1:

I would start with text for now, for two reasons. One is it's easy for anybody to test out, and it's really inexpensive. Multimodality, particularly with video, is going to cost you a lot more, and I'd wait for the curve to keep working in your favor. I'm not saying wait five years; I'm saying wait six to 12 months, maybe. For example, just last week the Mistral folks in France released Pixtral, which is their video, or sorry, image model. They have a video version coming, and that's open source, and that's going to be able to do a lot of extraction of insights from images. So you can pass along a picture of your house and say, tell me what this is, and what part of the country might this house be in, and all sorts of things like that, and it will give you pretty good answers. So that kind of extraction of insight from other modalities is super interesting. But I'd suggest people kind of crawl before they walk and walk before they run, and text, I think, is kind of an easier thing to play with. The models are also getting a lot better, too. So as these models get smarter and smarter, you're more likely to get interesting insights.

Speaker 1:

One of the things that could happen in an experiment with this is you just do something really simplistic: you pass in a chunk of data, you ask a question, you get something bad back, and you go, oh well, clearly the model is not smart enough to do this, I'm not going to bother with it. And that could be true. It could be that your use case is above the capabilities of the model, or it could be that you didn't approach the prompting strategy in a way that is correct for your use case, or maybe you're using a model that isn't quite at the forefront of the type of thing you're trying to do. I would recommend trying several different kinds of prompts in this experiment. I would also recommend trying two or three different models: certainly try OpenAI, maybe try out the Llama 3.1 models on Groq Cloud, or try Anthropic's Claude, which is an amazing model as well. So I would try multiple different models.
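
To make that multi-model advice concrete, here is a small hypothetical harness that runs one extraction prompt against a couple of models and prints the answers side by side. The model names are OpenAI examples for simplicity; providers like Groq and Anthropic expose similar chat APIs, so the same pattern carries over.

```python
# Sketch: run the same prompt against several models and compare the answers.
# Model names are examples; swap in whatever your provider offers.
from openai import OpenAI

client = OpenAI()
transcript = open("msft_earnings_call.txt").read()  # hypothetical local file

prompt = (
    "From the transcript below, rate the CEO's optimism about AI from 1 to 5 "
    "and explain why in one sentence.\n\n" + transcript
)

for model in ["gpt-4o", "gpt-4o-mini"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```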

Speaker 1:

The other thing you can do is go to a frontier model. You could even go to O1 Preview and say, hey, this is what I'm trying to do, this is the type of content I'm working with, I want you to help me create the best prompting strategy to use with a lesser model. And so a super high-end model like O1 Preview might be able to do a really good job at doing the prompt engineering for you. So I wouldn't give up too easily. But it's easy to give up, right, because you go, especially people coming in, going, I don't think AI is smart enough to really pick up on all the nuance in my content. And that may be true, but it's also possible that you just need to do some iteration and try a bunch of different things. That's oftentimes how a lot of stuff happens in AI engineering: you're literally throwing stuff against the wall to see what sticks.

Speaker 2:

Tiny tangent, since you mentioned it: have you tested out O1, and what do you think?

Speaker 1:

I have. I haven't tested it extensively. So there are two new models that were released by OpenAI last week: O1 Preview and O1 Mini. O1 Mini is not a preview; it's just a much, much smaller version of the latest cut of O1. O1 Preview is labeled that way because it is not intended to be a long-term model people are going to use. There will be an O1 at some point, probably in the next month or two.

Speaker 1:

This is just a preview, and what I found is that the chain-of-thought reasoning that the model does itself while it's, you know, quote-unquote thinking is really helpful for more complex questions, and it's interesting to watch how it's taking that additional time, as we talked about on the last pod. It's using that time, it's using this process, which is like chain-of-thought prompting; it's just been baked into the model, and so it is coming up with smarter answers to more complex problems. For example, with code generation, O1 is amazing. It's way better than GPT-4o. O1 has been shown to be able to solve a lot of problems at the PhD level in domains across a wide range of different disciplines, which is really exciting. GPT-4o is not at that level. OpenAI used a process where they actually hired a bunch of PhDs and spent a ton of money doing reinforcement learning with PhD input, and that's why they're able to claim, and there are quantitative reasons they're able to show, that it's actually better than a lot of PhDs in a lot of these fields. So it does represent an interesting breakthrough.

Speaker 1:

I haven't gone super deep in it, but there's a lot of good information out there about it at a technical level. If you're interested, there's a great podcast called Latent Space, which is very technical; it's for AI engineers. They had a great episode that just dropped, I think, two days ago, with an OpenAI person as well as the hosts of that podcast. They talk about O1 a fair bit, and there are also a lot of blogs you could read on it that do quite a good job. But I think on our earlier podcast, actually, Mallory, we did a pretty good job explaining how then-Strawberry, now O1, worked, and it's pretty much as was advertised.

Speaker 2:

Yeah, and we really structured that whole episode, or at least that topic, on reports and leaks, but it seems like most of those were pretty accurate. I will say, from a non-technical perspective, I like the name Strawberry better. I wish they had gone with that instead of O1. I'm sure they have their reasoning, but I myself have tested it out, just like one to two prompts, and it is really interesting to see the model quote-unquote think. I asked it about event planning marketing, and it was mapping out the plan, thinking through the timeline, and it kind of takes maybe 10 to 15 seconds before it generates a response. So I recommend that you all check it out. I'm excited to see what the big release looks like.

Speaker 1:

I suspect that in the coming weeks, certainly months, we will see similar types of things announced by Anthropic, certainly, and probably all the other major players in the space, and it'll be interesting to see. Will there be a GPT-5? What they've said is that O1 represents a reset of the counter, because they're now in kind of a new era of models.

Speaker 1:

So it doesn't sound like GPT-5 will ever be a thing, but there might be an O1 and then an O2 and an O3, and the O stands for OpenAI. GPT was, you know... I think a lot of people have become accustomed to GPT, ChatGPT, but it's a very technical term and doesn't really mean anything to anyone, other than the fact that it's an acronym that they know means AI stuff. But over time, GPT, as the fundamental technology, is also going to shift.

Speaker 1:

So it makes sense that they got rid of that, because there will be other model architectures, post-transformer, that will supersede GPTs. Anyway, I digress, but I think it's very much worthwhile, as you put it, for you to go experiment with it and try the things that you haven't been able to make work in a pre-O1 world.

Speaker 2:

I saw Ethan Mollick had a great post about that on LinkedIn: you should have this list of challenges ready to go, run them through that model, and if you don't have that list of challenges, what are you doing? I thought that was a really great post.

Speaker 1:

Yep.

Speaker 2:

Well, we have been kind of dancing around this topic all episode in terms of why it matters, why this whole topic of the episode matters, and what it can do for associations specifically. So I think we both believe, and I think it's the case, that analyzing unstructured data can fundamentally change how you operate and create value, and that allows you to tap into new opportunities. We've mentioned this as well, but by analyzing public unstructured data, for example, research papers or industry trends, along with a mix of your proprietary unstructured data, like member communications or even event feedback, you can uncover patterns and insights that were previously hidden. And sure, this allows you to stay ahead of trends, to be more responsive to member needs, but also to identify gaps in the market that could turn into new revenue streams.

Speaker 2:

AI might reveal trends in member behavior that suggest demand for a new type of service or product that your association can offer, and by understanding these trends faster than before, yes, your members will be happy, but you'll also be positioning your association as a leader in your field. So the big question here is: how exactly do you take advantage of this? My immediate thought, Amith, with the demo example, is that it's amazing. Extracting these insights is great, but the natural follow-up is: what can you do with them? Let's say you scale this, you automate it, and you have all the insights you could ever want. Well then, what do you do with them?

Speaker 1:

Sure. Well, with any new capability, it takes time to figure out how to use the thing, right? Electricity took a long time to make its way into all the different applications that are out there, and I think the same thing is true for AI broadly, and for this particular capability of AI. The way I'd put it is this: I would look for some pain first, before I looked for opportunity. It depends on the organization. So let me give you a very specific example.

Speaker 1:

Many organizations run events, and for those events they will typically issue some kind of call for speakers or call for papers, that type of thing. Depending on the style of event, they call it different things, but the basic idea is they're opening up for people to submit proposals to speak at their event. Some events are very large scale, where they might have many hundreds of sessions, and in order to fill those hundreds of sessions they might receive thousands of proposals for talks, different people submitting these things, typically over a period of a few months. This is usually called abstract submission, and the abstract submission process is one that people also have for journals and a lot of other things; it's essentially like an application process. Well, it turns out that the way people deal with this right now is a combination of staff as well as volunteers reading the proposals, right? Makes sense. If you want to speak at Digital Now, you submit a proposal, and ultimately Mallory and others will have to read that proposal and determine: do you qualify on certain basic criteria? Then maybe, after that first pass, you'll look at it and say, okay, let's get some input from different people on the speaker, the particular topic and so on, and ultimately you make decisions based on not only the quality of the submission but also the mix of topics that you want to include in an event, as well as other factors.

Speaker 1:

So where AI can help in this particular case is, at a minimum, doing that first pass. Let's just say you had a storage location in Azure or AWS or Dropbox or wherever, where you just dropped off these files. You get all these submissions coming in; they're Word documents, they're PDFs, whatever, right? So you're getting hundreds or maybe thousands of these documents, and rather than going through and reading all of them for that first pass of, does the submission check off the basic checkboxes? Does it have the information we asked for? Is it substantial enough? You know, sometimes people submit a two-sentence description when you asked them for 500 words, or whatever the case may be. Is it on a topic related to the program you are building for the event? That's a little bit more nuanced, right?

Speaker 1:

So what if we could automatically go through every one of those documents and answer those kinds of key questions about each one? There's a rubric, essentially, right, where you say, hey, these are the types of things we're looking for, and we put those into questions. We ask the AI, and the AI answers those questions automatically for every single abstract that's being submitted, in close to real time. The abstract drops in, and you can set it to run every few minutes. Then you can just look at a dashboard or a screen that shows you all the submissions and the extracted structured values. You might have a rating scale of completeness, but you might also have a bunch of checkboxes: did the author include contact information? Is it at least, you know, 500 words in length? Did the author provide citations to other publications, if that was required? And on and on, right?
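
To make that concrete, here's a minimal sketch of that first pass in Python, assuming the official openai package and an API key in your environment; the model name, rubric questions, and file layout are illustrative, not a specific product's workflow:

```python
# Minimal sketch of an AI "first pass" over abstract submissions.
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
# The model name, rubric questions, and file layout are illustrative.
import json
from pathlib import Path

from openai import OpenAI

client = OpenAI()

RUBRIC = """Answer these questions about the conference abstract below.
Respond with JSON only, using exactly these keys:
- "has_contact_info": true/false
- "word_count_ok": true/false (is the description at least 500 words?)
- "on_topic": true/false (is it related to our event program?)
- "completeness_score": integer from 1 to 5
- "feedback": one short sentence of feedback for the submitter
"""

def score_abstract(text: str) -> dict:
    """Ask the model the rubric questions and parse its JSON answer."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Score every submission dropped into a folder; print a mini dashboard.
for path in Path("submissions").glob("*.txt"):
    print(path.name, score_abstract(path.read_text()))
```

In practice you'd convert Word and PDF submissions to plain text first and write the answers into a spreadsheet or dashboard, but the core pattern is just that: rubric questions in, structured answers out.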

Speaker 1:

In some cases you might have more specific scenarios. For Digital Now, for example, we allow non-association staff to submit, but they have to have an association staff person co-present with them. We do that for a variety of reasons, but we could ask the AI to determine whether or not there was an association staff co-presenter along with the non-association person, and it'll be able to figure that kind of stuff out. So if you're getting this massive volume of documents coming in, and you can narrow it down to the documents that actually fulfill your criteria, you've saved a lot of time, and then you can go on to the more substantive work of actually evaluating the program. Now you can, of course, get the AI to help you with those steps too, but just take that most basic, simple step. You'll probably save a lot of time, and I think that's a great place to start. There are lots of pain points like that.

Speaker 1:

You might have a similar scenario with volunteer applications, where people are saying, hey, I want to volunteer for this committee, and you require a submission, maybe a video, saying why you want to be part of this committee, and you want an AI to quickly look at the video and extract several pieces of information to make sure the video is worth your time to watch. Now, if you put a system around this, right, where you have some kind of tooling and you have this process, maybe you go back to people who've submitted something incomplete, or something that doesn't quite match, and give them feedback, because as an organization that operates off of volunteer-provided content, you don't want to choke off that pipeline. You want to encourage people to submit. So wouldn't it be great if you could give very rapid feedback to the people who are submitting and say, hey, your proposal didn't quite meet the mark, here's where it needs to improve? That's really useful, because most of the time the experience for someone submitting to a conference is you hear nothing, nothing, nothing, nothing, and then you either get an email saying we're pleased to accept, or thanks for your submission.
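
One plausible way to wire that up, sketched here with a transcript standing in for the full video; the file name, model choices, and questions are assumptions, not a prescribed stack:

```python
# Minimal sketch: transcribe a volunteer's video submission, then reuse
# the same question-answering pattern on the transcript. Assumes the
# official openai package; file name and questions are illustrative.
from openai import OpenAI

client = OpenAI()

# Whisper accepts common audio/video formats (mp3, mp4, m4a, wav, ...).
with open("volunteer_video.mp4", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

QUESTIONS = (
    "From this transcript of a committee volunteer video, answer briefly: "
    "1) Which committee do they want to join? "
    "2) What relevant experience do they mention? "
    "3) Is the submission complete enough to be worth a full review? "
    "4) If not, what one piece of feedback would help them resubmit?"
)

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": QUESTIONS},
        {"role": "user", "content": transcript.text},
    ],
)
print(answer.choices[0].message.content)
```

That fourth question is what powers the rapid-feedback loop: the same pass that screens a submission can draft the note telling the submitter how to improve it.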

Speaker 1:

You're not in, but usually there's no detail behind it, right? So there's a lot you can do. That's just one example. The other examples we talked about, with cancer research or research papers in general, and earnings calls, are more opportunistic: what could you do if you had this tooling? You could build new services and new products, and that gets me more excited than anything else. But if I were an association staff person, I would start with my pain. I would look for where I was dealing with a lot of unstructured data in my day-to-day job, and then look to solve a problem or two that's fairly simple with this kind of tool.

Speaker 2:

It seems like up until this point we have talked about real-time insights, or nearly real-time insights, or historical insights, in the sense of looking at, say, the past six months of calls from our call center. Can you talk a little bit about maybe phase two of this? I know AI is also capable of creating predictive insights, but is that a separate process, or is that something that goes hand in hand with what you're talking about now?

Speaker 1:

Well, one of the things to think about is that we have a lot of tooling for structured data that is not capable of operating against unstructured data. You have report-writing tools like Tableau or Microsoft Power BI, and you have a lot of people, whether they're vendors or your own staff, who know how to use those tools really well. Once you have structured data, you can do a lot with your classical tooling, which can be super interesting. That can be descriptive analytics, where you're saying, hey, tell me about the situation, give me charts and graphs and tables that summarize the structured data I've extracted from the unstructured. That's useful. You can also look at prescriptive analytics, where you're looking to solve specific problems, and there are different approaches to that. And you can do predictive work. Say we take a data set, let's say earnings call transcripts, and let's say we had 30 or 40 questions that we're asking across all the earnings call transcripts for the last 10 years, AI optimism being one of them. We go and get that for an entire industry, say insurance or consumer packaged goods or financial services, or maybe even broader than that, so we have a database with all these additional structured features. Then we also pull in some interesting public data, which is already structured: things like the trading volume of that security by day, by week, by month, the price, the market cap of the company, and other factors that would be interesting to look at. And then we train a classical machine learning model to predict where the stock price might go, based upon the variables in the unstructured data.
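
A minimal sketch of that last step, assuming the transcript-derived scores have already been extracted into a CSV alongside public market data; the column names are illustrative, and none of this is investment advice:

```python
# Minimal sketch: train a classical ML model on features extracted from
# earnings-call transcripts (e.g., an "AI optimism" score per call) joined
# with public market data. Assumes pandas and scikit-learn; the CSV layout
# and column names are illustrative.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# One row per earnings call: LLM-extracted scores plus market data.
df = pd.read_csv("calls_with_features.csv")

features = ["ai_optimism", "guidance_confidence", "layoff_mentions",
            "avg_daily_volume", "market_cap"]
X = df[features]
y = df["next_quarter_return"]  # the value we're trying to predict

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = GradientBoostingRegressor()
model.fit(X_train, y_train)
print("R^2 on held-out calls:", model.score(X_test, y_test))
```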

Speaker 1:

Now, I'm not suggesting that is a good investment idea, just to be clear, but I would not be surprised in the least if hedge funds were formed around this type of concept, looking for additional signals to train quantitative strategies against. Now, in the association world, most people aren't going to do what I just said. The reason I use that example is, again, I think a lot of people are familiar with the stock market at a very basic level at least, and these concepts can apply to predicting things that matter a lot in your sector. Maybe in your sector the price of a certain commodity matters a lot to everyone in your space, or maybe being able to predict employment levels is really important. There are a lot of factors that associations can do a much better job with from a predictive viewpoint, using the structured insights that come from this process.

Speaker 2:

That's really helpful. You mentioned in your example a classical machine learning model. Can you clarify how that's different from everything else we've been talking about?

Speaker 1:

Yeah. So we've spent all of our time talking about what a lot of people call foundation models, which are pre-trained, large-scale models. Even what we call small models, like small language models, are fairly large in classical terms. Classical machine learning is basically where you don't have this pre-training, this foundational aspect, or multi-purpose use. It's not for that; it's for solving one specific problem. A good example would be classical image classifiers, where you would say, hey, I'm going to train on 10,000, 100,000, a million images, and I want to be able to say, oh, this image contains a cat, this image contains a horse. Although actually, even that type of model is fairly general purpose.

Speaker 1:

You might have another model that says, I want a prediction on how likely my members are to renew. So what I do is create a machine learning model trained on all my historical data, on a variety of different variables about my members and their renewal outcomes. For example, I might have things like how long the member has been with us; I might have their name, their gender, their location, the total aggregate amount they've spent with us life to date, and a variety of other structured variables. So I have all that, and I can train a machine learning model to predict, with different confidence levels, how likely it is for Mallory or Amith or somebody else to renew or not renew, and there's a lot of value in that. The problem is that the structured data is this narrow, you know, pinhole of visibility into the true world of data. So what if you could also use every email they've sent you and every comment they've ever made on your online community, and ask questions across all of that unstructured communication at the member level? Then populate your AMS or your other system with additional attributes that are telling you, in real time, something like member happiness level. Self-reported NPS is better than nothing, but people are pretty bad at self-reporting how they feel about a lot of things.

Speaker 1:

So what if you could pick up on this digital exhaust, as we like to call it, which is all this other stuff in your ecosystem, and say, hey, I'm going to add enrichment to my structured database? Now I have two, three, four, five additional fields in my member record, and then I train a machine learning model on that and start using their happiness level, right? There are a lot of other signals like that you could use, and then you train an ML model to predict renew or not renew. So that's what classical ML is, in my definition of it: a purpose-built model that does only one prediction type. Generative AI, by the way, is also predictive; it's just predicting the next word or the next pixel. All machine learning and all AI is about prediction. It's really about the scale of it and how wide the use cases are.
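
As a concrete sketch of that enrichment-plus-classical-ML loop, with the member table and field names here being illustrative rather than any particular AMS schema:

```python
# Minimal sketch: a classical renewal model trained on structured member
# fields plus LLM-derived enrichment fields such as a happiness score.
# Assumes pandas and scikit-learn; the table layout is illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("members.csv")

# Classical structured fields, plus fields an LLM extracted from each
# member's emails and community posts (e.g., happiness on a 1-5 scale).
features = ["tenure_years", "lifetime_spend", "events_attended",
            "happiness_score", "community_engagement_score"]
X = df[features]
y = df["renewed"]  # 1 = renewed, 0 = lapsed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Per-member renewal probabilities you could write back into the AMS.
renewal_probability = model.predict_proba(X_test)[:, 1]
print("Accuracy on held-out members:", model.score(X_test, y_test))
```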

Speaker 2:

That makes sense. Okay, so for our listeners who are still with us, who have decided they want to go out and use AI to pull structured insights out of their unstructured data, this is what I would recommend, and you tell me if I'm wrong, Amith: I would recommend that they select one unstructured data type based on a pain point they're experiencing, not just any random data type, but a pain they're actually feeling. Select a data type, decide on the fields or attributes they want to pull out of it, or, essentially, decide on the structured insights they want to extract, and then you fill in the rest.

Speaker 1:

Yeah, I mean, it's the questions that they'd love to know the answer to, right? It's like, what are you curious about? Think about it: in all the emails and all the online community posts and all the social listening we can do, could we answer how happy Mallory is with her membership at XYZ Association? Maybe not at the individual level, but there's actually a good chance you could, and you can definitely do it at the population level or at a segment level, and say, hey, across all the people who fit into the category of early career professionals, or late career professionals, or whatever, take all of the emails from people in those categories and then analyze across all of those, right?
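
A minimal sketch of that segment-level version, assuming emails have been exported into per-segment folders; the folder layout, segment label, and question are illustrative, and for very large volumes you would summarize in batches rather than putting everything into one prompt:

```python
# Minimal sketch: ask one question across all emails from a member segment.
# Assumes the official openai package; the folder layout, segment label,
# and question are illustrative.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def ask_segment(segment: str, question: str) -> str:
    """Concatenate every email under emails/<segment>/ and ask one question."""
    emails = "\n---\n".join(
        p.read_text() for p in Path("emails", segment).glob("*.txt")
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": question},
            {"role": "user", "content": emails},
        ],
    )
    return response.choices[0].message.content

print(ask_segment(
    "early_career",
    "Across these member emails, how satisfied does this segment seem "
    "with their membership, and what are the top three recurring concerns?",
))
```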

Speaker 1:

It's the questions you would love to have answered as a marketer, as a customer service person, or as a CEO. There are tons of questions you'd love to ask. If you had this magical machine that you could ask any kind of question like that and get reasonably good data back, you'd probably be pretty into that. So everything you just said, I agree with 100%.

Speaker 2:

And the variables that you refer to are just the questions that you'd love to know the answer to. I love it, and I love the analogy you gave at the top of the episode about looking through the little peephole in the door. So if you feel like that's you, listening to or viewing this episode, and you want to expand that view just a little bit more, hopefully you've left this episode with a few tips and tricks.

Speaker 1:

We will see you all next week after Utah. Thanks for tuning into Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to bootcamps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing, and keep disrupting.