From Hackathons to AI Mastery: Understanding Vectors and Embeddings | 35

Sidecar Sync

Welcome to Sidecar Sync: Your Weekly Dose of Innovation for Associations. Hosted by Amith Nagarajan and Mallory Mejias, this podcast is your definitive source for the latest news, insights, and trends in the association world with a special emphasis on Artificial Intelligence (AI) and its pivotal role in shaping the future. Each week, we delve into the most pressing topics, spotlighting the transformative role of emerging technologies and their profound impact on associations. With a commitment to cutting through the noise, Sidecar Sync offers listeners clear, informed discussions, expert perspectives, and a deep dive into the challenges and opportunities facing associations today. Whether you're an association professional, tech enthusiast, or just keen on staying updated, Sidecar Sync ensures you're always ahead of the curve. Join us for enlightening conversations and a fresh take on the ever-evolving world of associations.

June 20, 2024 • Amith Nagarajan and Mallory Mejias • Episode 35

Runtime: 48:31

In this episode of the Sidecar Sync, hosts Amith and Mallory delve into the fascinating world of vectors and embeddings in AI technology. They break down complex concepts into understandable terms, exploring how vectors represent data in multi-dimensional space and the transformative potential of AI, comparing it to the early days of the internet. Amith shares insights from a recent AI Hackathon, and they discuss practical applications for associations, such as professional networking and content personalization. Tune in to learn how these technologies can revolutionize data handling and provide personalized experiences for members.

🔎 To learn more, check out this ‘OpenAI Embeddings and Vector Databases Crash Course’:
https://youtu.be/ySus5ZS0b94?si=sZTt_42WprgsiGvt

🚀 Follow Sidecar on LinkedIn
https://linkedin.com/sidecar-global

Please like & subscribe!
🇽 https://twitter.com/sidecarglobal
🌐 https://sidecarglobal.com

More about Your Hosts:

Amith Nagarajan is the Chairman of Blue Cypress 🔗 https://BlueCypress.io, a family of purpose-driven companies and proud practitioners of Conscious Capitalism. The Blue Cypress companies focus on helping associations, non-profits, and other purpose-driven organizations achieve long-term success. Amith is also an active early-stage investor in B2B SaaS companies. He’s had the good fortune of nearly three decades of success as an entrepreneur and enjoys helping others in their journey.

📣 Follow Amith on LinkedIn:
https://linkedin.com/amithnagarajan

Mallory Mejias is the Manager at Sidecar, and she's passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space.

🎀 Use code AIPOD50 for $50 off your Association AI Professional (AAiP) certification at https://sidecar.ai/aaip

🚀 Help Us Grow Our Reach - Share With a Friend or Colleague!

👍 Like & Subscribe!
https://x.com/sidecarglobal
https://www.youtube.com/@SidecarSync
https://sidecar.ai/
https://www.linkedin.com/company/sidecar-global/

Mallory Mejias is passionate about creating opportunities for association professionals to learn, grow, and better serve their members using artificial intelligence. She enjoys blending creativity and innovation to produce fresh, meaningful content for the association space. Mallory co-hosts and produces the Sidecar Sync podcast, where she delves into the latest trends in AI and technology, translating them into actionable insights.

https://linkedin.com/mallorymejias

Exploring Vectors in AI Technology

Speaker 1 0:00

And that's the power of these vectors: in high-dimensional vector space, meaning thousands of numbers, we're representing concepts we cannot put words to. Welcome to Sidecar Sync, your weekly dose of innovation. If you're looking for the latest news, insights and developments in the association world, especially those driven by artificial intelligence, you're in the right place. We cut through the noise to bring you the most relevant updates, with a keen focus on how AI and other emerging technologies are shaping the future. No fluff, just facts and informed discussions. I'm Amith Nagarajan, chairman of Blue Cypress, and I'm your host. A warm welcome back to everybody to the Sidecar Sync. We have another exciting episode here for everyone. Today we are going to dive into a very technical-sounding subject, but I promise you this: if you stick with us for this episode, you will definitely learn something new, and it is actually not really a technical episode. We'll talk about why that is in a few moments. My name is Amith Nagarajan and I'm one of your hosts.

Speaker 2 1:05

And my name is Mallory Mejias. I'm one of your co-hosts and I also run Sidecar.

Speaker 1 1:09

And before we jump into the wild world of vectors, which is our topic for today, we're going to take a moment to hear from our sponsor.

Speaker 2 1:17

Today's sponsor is Sidecar's AI Learning Hub. The AI Learning Hub is your go-to place to sharpen your AI skills and ensure you're keeping up with the latest in the AI space. When you purchase access to the AI Learning Hub, you get a library of on-demand AI lessons that are regularly updated to reflect what's new and the latest in the AI space. You also get access to live weekly office hours with AI experts. And, finally, you get to join a community of fellow AI enthusiasts who are just as excited about learning about this emerging technology as you are. You can purchase 12-month access to the AI Learning Hub for $399. And if you want to get more information on that, you can go to sidecarglobal.com/hub. Amith, how are you doing this week?

Speaker 1 2:04

I'm doing great. How are you today?

Speaker 2 2:06

I'm doing pretty fine myself. I know you had a hackathon at your house last week. Do you want to talk a little bit about what you all did and how fun and exciting that was?

Understanding Vectors in AI Technology

Speaker 1 2:16

Yeah, for those who tuned in to last week's episode, you might have noticed some audio quality problems on my end. I do have a pretty good internet connection up at my house in the mountains, and I hosted seven developers for about a week up in the mountains of Utah, and we had an awesome time. We were working on AI software development. For those of you that don't know, I've been a software developer myself for probably over 35 years. At this point I can't even remember when I started, but it's been my whole life.

Speaker 1 2:44

Basically, I still love it very, very much, and every once in a while I'll get together with a group of developers from across our family of companies and we'll build stuff, and that's what we did last week up in Utah.

Speaker 1 2:55

And, of course, when you get seven developers in a house cranking on code, they're all very much fueled by caffeine, but also by internet bandwidth, so when Mallory and I were recording the episode last week, I had a couple of minor issues along the way with audio quality and connections. But we had a great time. We worked on multi-agent systems frameworks, which is a lot of words; basically it means way smarter AI. We also worked on getting all of our systems and technology ready for GPT-5 and GPT-6, which, of course, are not with us yet. But we always like to be a model or two ahead in terms of what we're planning for, essentially plumbing our technology systems so that we're ready when those models arrive (not those models specifically, by the way). We had some really, really cool stuff that got built and planned out last week. Plus, we're in the mountains of Utah, so it's hard not to have a good time up there.

Speaker 2 3:53

Sounds like it was a blast. Do you think that it's because of the hackathon in part that you were inspired to talk about vectors on today's episode?

Speaker 1 4:01

You know, maybe. I spend quite a bit of time talking to people about vectors because it's a topic that seems really out there and mathy and technical, but it's such a critical concept to understand. With the CEO mastermind group that I run with our good friend and colleague Mary Byers, we meet once a month with about 30 leaders from across the association market to talk about strategy plus AI. In that group, a couple of months ago, I led a presentation all about vectors. I did that because I felt like, even though that group is decidedly non-technical for the most part, they really need to understand this technology. I think everyone needs to have at least a basic grasp of this technology and how they can take advantage of it.

Speaker 1 4:50

It's kind of like this: think of someone walking around in the 90s or early 2000s who didn't understand what the internet was, even though it had this transformative potential to really change the way the world worked. AI is like that. AI is even bigger than the internet, and understanding vectors is a very important part of understanding AI. So I was excited to present that to the CEO group a couple of months ago, and I think our listeners are going to get a lot out of this episode if they just stick with us. There's going to be a little bit of math to get started with, but I promise it won't be too painful, and it's going to be very, very interesting.

Speaker 2 5:25

Yep. I will say, don't click away just yet. The topic sounds a little bit technical, a little bit intermediate. Even if you do have an intermediate understanding of vectors, I think you'll walk away from this episode learning something new. But if you're a total beginner, stay with us. We definitely made this episode easy to understand. So, first and foremost, we've got to talk about vectors at their core: what are they? They are numerical arrays that represent data in a multi-dimensional space. To visualize this, imagine mapping the characteristics of an object, like size, shape and color, onto a graph. Each characteristic corresponds to a dimension in the vector. For example, a two-dimensional vector can represent an object's length and width, while a multi-dimensional vector can capture a more complex set of attributes. In data science and AI, vectors transform diverse data types, like text, images, audio and video, into numerical formats that machines can process and analyze. So it sounds easy enough, I guess. But, Amith, in your words, what are vectors?

Speaker 1 6:28

I think you did a great job explaining it. From my viewpoint, each vector is just a bunch of numbers. But the concept is essentially this: how do we create a common format, a common way of representing things? A thing might be a piece of content, like an article from your website; it might be a profile of a member, a product you offer, or a session at an upcoming conference.

Speaker 1 6:57

These items or objects or things are all complex. They all have many characteristics, and these characteristics are essentially attributes, or what we call dimensions in vector space. So it's a way of creating a common mathematical representation of literally anything, which can then be compared against other vectors, the other representations that are out there. The reason it's so important is that AI models don't actually understand the words we speak and the images we see. They have to translate them into some kind of representation that a computer can understand.
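To make Mallory's length-and-width example concrete, here is a minimal Python sketch. The objects and their attribute values are made up for illustration; the point is only that once every object is a fixed-length list of numbers, a machine can measure how far apart two objects are. Plain Euclidean distance is used as the comparison here, one of several possible choices. In practice, attributes measured in different units would usually be rescaled before comparing.

```python
import math

# Hypothetical objects described by two attributes: (length_cm, width_cm).
book = [24.0, 16.0]
tablet = [25.0, 17.0]
poster = [90.0, 60.0]


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# More dimensions work the same way: (length_cm, width_cm, weight_kg, is_red).
book_4d = [24.0, 16.0, 0.5, 0.0]
tablet_4d = [25.0, 17.0, 0.6, 0.0]

print(euclidean(book, tablet))       # small distance: similar shapes
print(euclidean(book, poster))       # large distance: very different shapes
print(euclidean(book_4d, tablet_4d))
```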

Speaker 2 7:34

And vectors are the basic concept that all these AI systems work on top of. So can a vector represent a concept, a piece of text like a single word, a single image, or pieces of an image? How does that work?

Speaker 1 7:51

...taste, texture, smell. These are attributes we can put words to. If I wanted to categorize something like an animal, I might ask: is it furry or not, is it big or small, is it aggressive or not, is it a carnivore or not? Those are attributes I can probably put words to in human language, whatever human language I use. However, there are many other qualities or characteristics that can apply to anything. Probably very soon during this podcast, my dog's going to start barking.

Speaker 1 8:34

That's an attribute of his personality that I could probably put into words if I wanted to, but it's going to be hard to figure out exactly what it is. Is the attribute noisy versus not noisy, barker versus non-barker? So what ends up happening is there's a lot more information in our brains than we can represent in words, in human language, but vectors are capable of encapsulating all this other information. It's almost like saying that intuition we have, that soft stuff we can't put into words, is what vectors are able to encapsulate beyond the known attributes.

Speaker 2 9:12

That makes sense. So on one hand, we're talking about attributes that we can put words to, because that's how we, as humans, communicate with one another. That's how we interpret the world: through language. But right before we started recording, Amith, you said that language is kind of limiting; it's the only thing we have to express ourselves. But there's a lot more on Earth, in the universe, that we can't necessarily put words to, but that still exists and still has relationships to other objects.

Speaker 1 9:36

Right, and the classic example is this: if you take the video version of this podcast and translate it to audio only, and then translate the audio down to just the words in the transcript, you lose information with each of those translations. Video to audio, you lose something: you can't see us, and there might be expressions on our faces or something in our backgrounds that would be information. Then you go to text only, and it's even less information. But there are also other shifts in modality that are interesting to think about. Take the same text: you could speak it, you could sing it, you could turn it into a more poetic interpretation of literally the exact same words. You can capture that if you upscale it into audio or certainly video. So that's the information loss.

Speaker 1 10:30

We can express ideas in more than just words because we can speak, we can see, we can act things out. Those are all part of the human experience. And our memories take into account elements of what we recall in terms of words and images and audio, but ultimately we mash that up somehow in a way that forms what we would probably call intuition over time. So what I'm really referring to is that with AI and with vector space, we don't exactly know how all of it works, which is part of what's interesting, but vectors are able to have literally thousands and thousands of attributes that neither the AI systems nor we are able to label. We can't go to these vectors and say, at position 500, that's what this particular dimension means, but we know it has some meaning.

Speaker 2 11:23

Is there potential for an infinite number of attributes in a vector?

Speaker 1 11:29

Yes. The models currently produce many thousands of attributes, but there could be far more, and the question is: is there value to that? Beyond thousands, if you go to hundreds of thousands or millions of attributes, does that produce any advantages at scale? I suspect it will, because there are more and more subtleties to everything in the world. So I suspect that as we get more computational power, and these embeddings models (which we'll talk about in a little bit) get better and better and can produce even bigger vectors, there will be upside to it.

Speaker 1 12:00

The question is: is there a point of diminishing marginal returns if you go from 5,000 numbers to 500,000 numbers to 50 million numbers in a vector? Theoretically it can be whatever you want, if you have enough compute, enough storage and so on. Will there be value from that? I think there probably will be. With model sizes, we see these so-called emergent properties. When you scale from GPT-3 class models to GPT-4 class models, new capabilities quote-unquote emerge, where the designers didn't necessarily know the model would have these new capabilities. That's a somewhat different thing, but there's a parallel between model size, in terms of the parameter counts we talk about a lot, and vector size. Again, it's a different concept, but I think the scaling of vectors over time could potentially bear more fruit.
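To put the 5,000 vs. 500,000 vs. 50 million question in perspective, here is a back-of-the-envelope storage calculation. It assumes each number is stored as an uncompressed 32-bit float (4 bytes), which is common but not universal, and uses one million vectors as an illustrative count; real systems may compress or quantize vectors.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each number stored as an uncompressed 32-bit float


def vector_bytes(dimensions: int) -> int:
    """Bytes needed for one vector with the given number of dimensions."""
    return dimensions * BYTES_PER_FLOAT32


def collection_gb(dimensions: int, count: int) -> float:
    """Gigabytes (10**9 bytes) needed to store `count` such vectors."""
    return vector_bytes(dimensions) * count / 1e9


for dims in (5_000, 500_000, 50_000_000):
    per_vector_mb = vector_bytes(dims) / 1e6
    total_gb = collection_gb(dims, 1_000_000)
    print(f"{dims:>12,} numbers: {per_vector_mb:>8,.2f} MB per vector, "
          f"{total_gb:>10,.0f} GB per million vectors")
```

Under those assumptions, 5,000 numbers cost about 20 KB per vector (roughly 20 GB for a million vectors), while 50 million numbers cost about 200 MB per vector: a 10,000x jump in storage, and in the arithmetic needed for every comparison.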

Speaker 2 12:51

Okay, this last question just about vectors is more for my beginners. You might think it's a silly question, but I don't think so. Where do vectors live? Are they in the cloud? Where are these numerical representations living?

Speaker 1 13:06

I think it's an excellent question. When you create a vector, which we'll talk about in a little bit, you run a specialized AI model called an embeddings model, and that embeddings model basically creates a vector from a given piece of content: text, audio, etc.

Speaker 1 13:21

Once you have that vector, you can store it anywhere. It's just a set of numbers, so you can put it in a file, or you could put it in a specialized vector database. There are a lot of vector databases out there. You can use a tool like Pinecone, which is the one we like a lot. It's a very popular, very cost-effective database. It's not like a traditional database, though; it's specifically built to store these vectors. In a traditional database, where you have tables and columns and things like that, you could stuff a vector into one of those tables, in a field or in multiple fields.
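Because a vector is just a list of numbers, even a plain file works as storage. The sketch below uses JSON from Python's standard library; the IDs and vector values are hypothetical. This is fine for small experiments but is not a substitute for a vector database at scale.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical IDs and (very short) vectors; real embeddings have far more numbers.
vectors = {
    "article-101": [0.12, -0.40, 0.88],
    "session-7": [0.10, -0.35, 0.90],
}


def save_vectors(path: Path, data: dict[str, list[float]]) -> None:
    """Write an id -> vector mapping to a JSON file."""
    path.write_text(json.dumps(data), encoding="utf-8")


def load_vectors(path: Path) -> dict[str, list[float]]:
    """Read an id -> vector mapping back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "vectors.json"
    save_vectors(store, vectors)
    restored = load_vectors(store)

print(restored == vectors)
```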

Speaker 1 13:56

But it's not going to be very efficient, because relational databases, and even other types of databases, are not designed for the kind of math that you want to do at scale. Let's say I have a vector for every piece of content that my association has ever created, and I have another vector for each member, representing everything I know about that member, and I want to be able to compare all of my member vectors against all of my content vectors. There are a lot of reasons I might want to do that. Let's use a big number: say I have a million members and a million pieces of content. So you're doing a pairwise comparison of every piece of content relative to every member.

Speaker 1 14:37

If you have it in an optimized environment, aka a vector database, that is a nearly instant comparison. It's measured in microseconds. Microseconds are millionths of a second. In traditional computing, you measure in milliseconds, which are thousandths of a second. These algorithms are so highly optimized for vectors in these specialized vector databases that you definitely want to use a vector database, and that's what we do across all of the products I was talking about earlier. Part of what's happened is that vector databases used to be kind of esoteric: harder to deal with, expensive, hard to deploy. Now they've become really easy to use.
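At its core, the comparison a vector database accelerates looks like the brute-force scan below: score every stored vector against a query and keep the best few. This standard-library sketch is for intuition only; the data is random, and production vector databases typically rely on approximate nearest-neighbor indexes precisely to avoid scanning every vector.

```python
import heapq
import math
import random


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], items: dict[str, list[float]], k: int = 3) -> list[tuple[float, str]]:
    """Score every stored vector against `query` (a full linear scan) and keep the k best."""
    scored = ((cosine(query, vec), item_id) for item_id, vec in items.items())
    return heapq.nlargest(k, scored)


random.seed(0)
DIMS = 8  # tiny for readability; real embeddings have hundreds or thousands of dimensions
content = {f"content-{i}": [random.gauss(0, 1) for _ in range(DIMS)] for i in range(1_000)}
member = [random.gauss(0, 1) for _ in range(DIMS)]

for score, item_id in top_k(member, content):
    print(f"{item_id}: {score:.3f}")
```

With a million members and a million content items, a full pairwise scan like this would be 10^12 similarity computations, which is why the indexing inside vector databases matters.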

Speaker 2 15:21

So, in your example, with the 1 million members and 1 million pieces of content, if we looked at one member, that member would have a set of attributes represented by a vector, essentially a list of numbers. And then you would have a piece of content with its attributes represented by numbers in the same way, and there would be a comparison between those two, essentially how close in proximity they are to each other, and that is how personalization would work.
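Mallory's description maps onto a few lines of code. In this sketch the member and content vectors are hypothetical, and the three dimensions are given human-readable labels (technology, governance, events) only for illustration; as Amith noted, real embedding dimensions can't be labeled like that. Cosine similarity is used as the "closeness" measure, a common choice but not the only one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 = pointing the same way, 0.0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors, labeled (technology, governance, events) for readability.
member = [0.9, 0.1, 0.4]
content = {
    "AI for membership teams": [0.95, 0.05, 0.2],
    "Board governance basics": [0.05, 0.95, 0.1],
    "Annual conference recap": [0.2, 0.1, 0.9],
}

# Personalization: order content by how close it sits to the member's vector.
ranked = sorted(content, key=lambda title: cosine_similarity(member, content[title]), reverse=True)
print(ranked)
```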

Speaker 1 15:49

That's how personalization would work, that's how duplicate detection might work, that's how you might be able to do professional networking recommendations by comparing people vectors against other people vectors, and the list goes on and on, especially as new types of entities come into this space. I have two categories of vectors in this example: one is a vector for every piece of content, and another is a vector for every member. Well, what about a vector for every session at my upcoming annual conference? What about vectors for every product in my e-commerce system? What about vectors for every vendor that wants to sell products to our members, so I can do better matching of vendors at a trade show to the people attending the trade show? And on and on.
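Duplicate detection is the same comparison turned inward: compare every item against every other item and flag pairs that are nearly identical. The vectors and the 0.99 threshold below are hypothetical; a real threshold would have to be tuned on real data.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the two vectors point in exactly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings for four pieces of content.
docs = {
    "webinar-2023": [0.70, 0.20, 0.10],
    "webinar-2023-copy": [0.69, 0.21, 0.11],
    "salary-survey": [0.10, 0.90, 0.20],
    "ethics-guide": [0.20, 0.10, 0.95],
}

THRESHOLD = 0.99  # assumption: would need tuning on real data

# Check every unordered pair once and keep the near-identical ones.
likely_duplicates = [
    (a, b)
    for a, b in combinations(docs, 2)
    if cosine_similarity(docs[a], docs[b]) >= THRESHOLD
]
print(likely_duplicates)
```

The same pairwise loop over member vectors could surface networking candidates, though for that use a top-few ranking per person is probably more useful than a near-identical threshold.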

Assigning Numeric Values to Semantics

Speaker 2 16:37

Okay, quite exciting. I want to move along to embeddings. Vectors can encapsulate various types of data, ranging from text and images to audio and video. AI models create embeddings, which are long vectors that encapsulate the meaning of the content and the context within the data, allowing for sophisticated comparisons and analyses. I feel like we just touched on this, but can you set the tone for how vectors and embeddings are related to one another?

Speaker 1 17:11

Yeah, for purposes of this episode, they're really the same thing. A vector is a general-purpose mathematical construct; you can use vectors to represent anything. An embeddings model is a special type of AI model that hasn't gotten a lot of press, but it's actually the workhorse underneath a lot of the language models people get excited about. Language models use embeddings to do comparisons at scale, and the embeddings model is a separate model that basically converts a piece of content, whatever it is, into a vector. So when people say, hey, I got the embeddings for this object, or I have the embeddings for this document, it's basically the exact same thing as a vector. Vector is just the more general term, whereas embedding refers to the process of running content through this specialized AI model.
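Real embeddings models are neural networks, but their input/output contract is simple: content in, fixed-length vector out. The sketch below imitates only that contract, using feature hashing over words, a much cruder classical technique swapped in purely for illustration. `toy_embed` is a hypothetical name, not any provider's API, and unlike a real embeddings model it captures shared words, not meaning.

```python
import hashlib
import math
import re

DIMS = 16  # real embeddings models output hundreds to thousands of dimensions


def toy_embed(text: str, dims: int = DIMS) -> list[float]:
    """Hypothetical stand-in for an embeddings model: hashed word counts, scaled to unit length."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dims  # deterministic bucket per word
        vec[index] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


a = toy_embed("Annual conference registration is open")
b = toy_embed("Registration for the annual conference is now open")
print(len(a), len(b))  # same length regardless of how long the text is
```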

Speaker 2 18:05

Okay, so today's topic was mostly informed by a new chapter in Ascend 2nd Edition, and it was a great chapter. I've read it a few times now. There's an example you give in that chapter, or several examples, where you talk about one word plus another word, minus this word, giving you this. I'm hoping you can share one or a few of those examples to help contextualize what it means to assign a numerical value to semantics, if that makes sense.

Speaker 1 18:35

Sure. Yeah, the semantic meaning of a piece of content: you try to break it down into its attributes, but that only goes so far, again, because we're using language. We'll do our best, and we'll try to keep it really simple. Let's work in the world of animals for a minute and say we want to classify different kinds of animals. We take a base concept like a kitten, and then we add an attribute called adultness, and we say this kitten has a high adultness; let's just say it's either on or off. So we took kitten and added adult. What do you get? What's the outcome of that equation? Kitten plus adult equals... cat. Exactly. Theoretically, cat could refer to kittens and adults, but generally speaking, people use cat to refer to an adult cat. Similarly, puppy plus adult equals dog. So that's a very simple example where you've captured some semantics by combining words, and those are essentially attributes.
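Here is the kitten-plus-adult idea as a minimal Python sketch. The two dimensions (species, adultness) and their values are hand-assigned assumptions; real embeddings learn thousands of unlabeled dimensions, and analogy arithmetic on them only works approximately. After the arithmetic, the sketch looks up whichever known word sits closest.

```python
import math

# Hypothetical 2-D vectors: (species, adultness). species 0 = cat family, 1 = dog family.
words = {
    "kitten": [0.0, 0.0],
    "cat": [0.0, 1.0],
    "puppy": [1.0, 0.0],
    "dog": [1.0, 1.0],
}
ADULT = [0.0, 1.0]  # the "adultness" attribute on its own


def add(a: list[float], b: list[float]) -> list[float]:
    return [x + y for x, y in zip(a, b)]


def subtract(a: list[float], b: list[float]) -> list[float]:
    return [x - y for x, y in zip(a, b)]


def nearest(vec: list[float], vocab: dict[str, list[float]]) -> str:
    """Name of the vocabulary entry closest to `vec` by Euclidean distance."""
    return min(vocab, key=lambda w: math.dist(vec, vocab[w]))


print(nearest(add(words["kitten"], ADULT), words))  # kitten + adult
print(nearest(add(words["puppy"], ADULT), words))   # puppy + adult
# "One word plus another word minus a third": dog - puppy + kitten
print(nearest(add(subtract(words["dog"], words["puppy"]), words["kitten"]), words))
```

Euclidean distance is used instead of cosine similarity here because "kitten" is the zero vector in this toy setup, and cosine similarity is undefined for a zero vector.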

Speaker 1 19:35

And then you can keep going. Those are two dimensions: what is the species, and what is the maturity level, or the adultness? And of course these variables don't necessarily have binary values. They're not just zero or one; they can have any value between zero and one. So you might say, okay, I've got a cat and that cat is nine months old. Is that still a kitten, or is it an adult? I think one year is when people say cats are adults; I think that's definitely true for dogs. So how adult is my cat at nine months? There's a bit of a difference between that and a newborn kitten. So there are subtleties to this; it's not just on or off values. But then you might add a third attribute, like: how domesticated is this particular animal? So we take kitten and add adult. But what if you added wild to that?

Speaker 1 20:34

So you get kitten plus adult plus wild equals... lion? Could be lion, could be bobcat, could be tiger, could be something scary, right?

Speaker 2 20:45

Yeah, it's like I'm taking a quiz.

Speaker 1 20:47

Yeah, scariness, right? A scariness attribute. And it's funny, because I forget whether I was talking to you about this, but the other day I was talking to somebody about cats versus dogs. My cat is six pounds, eight pounds, whatever it is, and it's my daughter's cat. I'm a dog person, but my daughter really, really wanted a cat, so we succumbed to that pressure a few years ago, and the cat's great for the most part.

Speaker 1 21:13

But just watching the cat move is kind of interesting. It's so efficient and almost elegant in its ferocity, even though over the course of millions of years it's evolved into this relatively harmless house cat. If that cat were even two or three times the size, 20 or 30 pounds, it would be a lot scarier. Even at seven pounds or whatever, it's scarier than most of the dogs I've ever interacted with, because it's just built to kill. It's an unbelievably efficient killing machine, one of nature's finest examples of evolution as a carnivore. And I look at that thing and I'm like, yeah, I really don't trust that cat not to eat me if I was passed out.

Speaker 2 21:53

So it's a good thing it's low on the wild scale, at least.
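The "wild scale" and the nine-month-old cat both fit the same picture: each attribute is a position on a spectrum, not an on/off switch. In this sketch (all numbers are illustrative guesses, restricted to the cat family so species drops out, with hand-labeled dimensions that real embeddings would not have), kitten + adult + wild lands nearest the big wild cats rather than the house cat. Because, as Amith says, it "could be lion, could be bobcat, could be tiger", the sketch returns the closest few rather than a single answer.

```python
import heapq
import math

# Hypothetical cat-family vectors: (adultness, wildness), each on a 0-to-1 spectrum.
animals = {
    "kitten": [0.05, 0.10],
    "young cat (9 months)": [0.75, 0.10],
    "house cat": [1.00, 0.10],
    "bobcat": [0.95, 0.90],
    "lion": [1.00, 0.95],
    "tiger": [1.00, 1.00],
}
ADULT = [0.95, 0.00]  # pushes adultness up
WILD = [0.00, 0.85]   # pushes wildness up


def closest(vec: list[float], vocab: dict[str, list[float]], k: int = 3) -> list[str]:
    """The k vocabulary entries nearest to `vec` by Euclidean distance, closest first."""
    return heapq.nsmallest(k, vocab, key=lambda name: math.dist(vec, vocab[name]))


query = [base + adult + wild for base, adult, wild in zip(animals["kitten"], ADULT, WILD)]
print(closest(query, animals))
```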

Speaker 1 21:57

Yeah, exactly. And at least on the size scale, it's small. But if you added even a couple of decimal points there, I'd be worried. But that's the basic idea: you try to break down concepts into their attributes, and when you do that at scale, we run out of words. After 10, 20, 30 different words, we're probably kind of out. How do you compare my cat to the next-door neighbor's cat, even if they're both the same breed, the same age, the same size? They do similar things, but there are definitely differences. What are the words beyond color, beyond shape, beyond eye color, all these other attributes we can put words to? We run out of words after a while.

Speaker 1 24:00

And that's the power of these vectors: in high-dimensional vector space, meaning thousands of numbers, we're representing concepts we cannot put words to. I don't know if this is truly what a neuroscientist or a philosopher would say, but I think this is where we're starting to border on what we normally classify as human intuition: the things that we just know. We know that we know something. We have a feeling that this makes sense. We have a feeling that these two people should connect.

Speaker 1 24:27

We have a feeling that this upcoming session at this event is going to be a great fit for Mallory, but I don't really know why. Well, I kind of do. I might think, hmm, I really should tell Mallory about this upcoming session at the conference because I think she'll really enjoy it and get a lot out of it, and I can tell you two, three, four, five reasons why. But there's something else, right? There's just that feeling you have. Where is that feeling coming from? The AI's equivalent of that is high-dimensional vector space, and that's where some of these unbelievably amazing recommendations come from. Recommendation systems, whether for commerce or content or professional networking on a platform like LinkedIn, are using exactly that.

Discovering Bias and Vector Databases

Speaker 2 25:09

So, for the sake of the kitten-plus-adult-equals-cat example, we're talking about these attributes as zeros and ones: on and off, yeses and noes. But in an actual vector, I'm assuming each of these attributes is a spectrum, because they're represented by numbers. So we don't have to decide whether a kitten is a kitten at nine months or 12 months or whatever; each attribute has a spectrum. It's not so black and white, is what I'm saying.

Speaker 1 25:37

Yeah, it's not zeros and ones. Those would be binaries. These vectors are far richer than that. The values are represented between zero and one, but with an incredibly high number of digits past the decimal point, so each one can represent quite a bit of meaning. So how adult is my cat at nine months? Is it 0.75 on that scale, versus a five-week-old cat at, you know, 0.01 or something like that?

Speaker 2 26:11

And how does an AI assign... "meaning" might be the wrong word, because that's how we understand it, but how does it assign a number to an attribute in the first place?

Speaker 1 26:17

Yeah, the way these embeddings models are trained is based on a large corpus of content, just like all other AI training. It's based on the idea that if we feed a lot of information to a neural network, the neural network will tease out these types of attributes and knowledge and learn from it. This type of model is specifically designed to come up with vector embeddings that match content elements. So it's based on model training, very similar to how an LLM or a specialized machine learning model might be trained; it all works on some of the same principles of how you train a neural network. These embeddings models are designed to be very fast and very efficient. They're much, much smaller than large language models, but they're a critical component of what makes it all work.
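The key point is that the numbers come from data rather than being hand-assigned. A much older and simpler relative of that idea, describing each word by the words that appear near it, fits in a few lines. This is explicitly not how modern neural embeddings models are trained; it is a classical co-occurrence technique swapped in only to show that "words used in similar contexts end up with similar vectors" can be derived from a corpus. The four-sentence corpus is made up.

```python
import math
from collections import Counter, defaultdict

# A tiny made-up corpus; real embeddings models train on enormous amounts of text.
corpus = [
    "the kitten chased the mouse",
    "the cat chased the mouse",
    "the puppy fetched the ball",
    "the dog fetched the ball",
]


def cooccurrence_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Describe each word by counts of the words that appear within `window` positions of it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(corpus)
print(round(cosine(vectors["kitten"], vectors["cat"]), 3))  # used in the same contexts
print(round(cosine(vectors["kitten"], vectors["dog"]), 3))  # used in different contexts
```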

Speaker 2 27:04

And then, one of my last questions for this section: is this an area where bias becomes apparent and harmful? Assigning a numerical value to catness or adultness is simple enough, but I could see that being quite harmful when looking at things like gender, race and orientation. So is this the part of AI where we're seeing bias appear?

Speaker 1 27:33

There's absolutely potential. It's based on the quality of the training data, and that's an area of deep conversation happening in the AI field, as it should be. If I have a corpus of content that I use to train a model, and that corpus says, for example, that certain types of people have a higher level of credibility (professionalness might be an attribute), and let's say there are biases in the training data based on race, gender, age, whatever that may be, the model is going to carry those perspectives when it produces these embeddings. Now, on their own, these embeddings don't decide anything about the object in question, but they are in turn used by other models, so they can compound potential biases. So it's definitely a factor.

Speaker 1 28:15

Biases are issues in all machine learning models of all kinds. Data bias is one of the fundamental things that data scientists have to look for. The basic thing is that all data is biased. All of us are biased. The question is, how much transparency can we add to our systems and processes to look for bias?

Speaker 1 28:35

And one of the greatest opportunities is that, since there's such a rich ecosystem of AI products and models to choose from, we could potentially have models check each other's work, in a sense. So you can say, hey, I'm going to take an embeddings model, and not just the embeddings model but the full stack of models from OpenAI, let's say, and I'm also going to use Claude from Anthropic or Gemini from Google or Llama 3 from Meta, right? And I'm going to have these models form an adversarial relationship in a technical sense, where they're checking and testing each other and looking for biases at scale. So, on the one hand, AI brings out the worst of our species, where we have these deep biases (all of us do), but it also provides us a tool to discover these biases better than ever before. But you've hit on something really important, Mallory: if the embeddings model's training set has biases, the embeddings will reflect those biases.

Speaker 1 29:33

By themselves, the embeddings don't really mean anything, but as other models use these embeddings, or vectors, to make decisions or recommendations, those biases can definitely be a factor. They might, you know, affect how I recommend content or how I personalize professional networking suggestions at my conference. So it's definitely something to be thoughtful about.

Speaker 2 29:54

Okay, so we touched a little bit on vector databases, which of course house vectors. Can you help contextualize for listeners how vector databases can allow associations specifically to combine and manage all their types of data: text, image, audio and video?

Speaker 1 30:10

So the vector database serves a hyper-specialized role. It doesn't replace your relational database, which is probably what you have underneath your AMS or membership system. It doesn't replace a SaaS application like Salesforce or HubSpot. It sits side by side with these kinds of systems as an additional data store. Vector databases do not do a good job of saying, hey, I want to search for a member by phone number, or I want to look up a product by name. They're terrible at that.

Speaker 1 30:43

Theoretically, you could stuff that type of information into a vector database, because for every vector you put in, you can tag it with attributes called metadata and then you can search on the metadata. But it's not designed for that. It's designed for vector math. And then, once you get the vectors back, you can then look at these tags to find out other information. That's how you connect the vector database back to your other databases. But vector databases do not replace traditional SQL or relational databases. They don't replace content stores like websites or SharePoint or things like that. They're designed to accompany those systems.
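To show what "vectors plus metadata tags" means in practice, here is a minimal in-memory sketch. `TinyVectorStore`, its methods and the `source`/`record_id` tags are hypothetical names, not any vector database product's API, and the search is brute-force for clarity; production vector databases typically use approximate nearest-neighbor indexes instead. The store holds no article text or member details, only vectors and the tags needed to look the records up in the CMS or AMS.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    vector: list[float]
    # Metadata tags link a vector back to its source system and record.
    metadata: dict[str, str] = field(default_factory=dict)


class TinyVectorStore:
    """Hypothetical in-memory store that holds only vectors and their metadata tags."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def add(self, vector: list[float], **metadata: str) -> None:
        self.entries.append(Entry(vector, metadata))

    def nearest(self, query: list[float], k: int = 2) -> list[Entry]:
        """Brute-force k-nearest search by Euclidean distance."""
        return sorted(self.entries, key=lambda e: math.dist(query, e.vector))[:k]


store = TinyVectorStore()
store.add([0.9, 0.1], source="cms", record_id="article-17")
store.add([0.8, 0.2], source="cms", record_id="article-42")
store.add([0.1, 0.9], source="ams", record_id="member-1001")

for hit in store.nearest([0.88, 0.12]):
    # The store returns tags, not content; use them to fetch the record from the CMS or AMS.
    print(hit.metadata["source"], hit.metadata["record_id"])
# cms article-17
# cms article-42
```

Exact lookups, like finding a member by phone number, still belong in the relational database; the tags here are only the bridge back to it.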

Speaker 2 31:19

And they are better at housing unstructured data in comparison to other databases that an association might have?

Speaker 1 31:26

Well, vector databases can actually be kind of the intersection of all types of content, in a sense. Let me give you an example. Let's say I have an AMS, and my AMS is highly structured. I've got member data, committee data and information about financial transactions. That's highly structured data, right? And then I have, let's say, a content management system with 100,000 historical articles from my association, and that's very much unstructured data, right? It's just text, images, maybe audio, on my website and in my CMS. So in this example, my CMS has all my unstructured content and my AMS has my structured content.

Speaker 1 32:10

Now, vector databases don't really care; they're just storing numbers. So can I convert my unstructured content into vectors? I can, right? I can run each of those 100,000 pieces of content systematically through an embeddings model, which will produce a set of numbers, the vector, and I can stuff that vector into the vector database. So I can take my unstructured data, convert it to vectors and put it in a vector database. I still need my unstructured content, because the vector database doesn't actually have the content in it; it just has the vector representation of that content. Then, on the structured data side, my AMS, I can convert that structured data into a format I can feed to the embeddings model too. So I can say, hey, here's Mallory, here's everything I know about Mallory: here's her professional background, here's her email address, here's where she works, here are all the people she knows, whatever I've got, right? Here are all the articles Mallory has given a thumbs up in our online community, or whatever posts Mallory has made. I can take the aggregate of what I know about Mallory, feed that to an embeddings model and get back a vector that represents Mallory, and that's taking structured data from the AMS or the online community and converting it into a vector as well. So then I have a vector for Mallory and a vector for each of my 100,000 pieces of content, and I can compare them in the vector database.
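The flow described here (flatten a structured record into text, embed it, embed the content, then compare) might look roughly like the sketch below. The `embed` function is a deliberate stand-in that counts words from a small hypothetical `VOCAB` list; a real embeddings model captures meaning far beyond shared keywords. The member record and article titles are made-up examples.

```python
import math
import re
from collections import Counter

# Toy vocabulary of topic words; a real embeddings model needs no such list.
VOCAB = ["ai", "automation", "budget", "finance", "marketing", "podcast"]


def to_synthetic_document(record: dict[str, str]) -> str:
    """Flatten a structured record (e.g. an AMS row) into plain text."""
    return ". ".join(f"{name}: {value}" for name, value in record.items())


def embed(text: str) -> list[float]:
    """Stand-in for an embeddings model: counts of each vocabulary word."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


member = {
    "name": "Mallory",
    "role": "marketing manager",
    "interests": "AI, podcast production, marketing automation",
}
articles = {
    "article-1": "Getting started with marketing automation and AI",
    "article-2": "Annual budget planning for finance committees",
    "article-3": "How to launch an association podcast",
}

# Structured data and unstructured content end up in the same vector space.
member_vector = embed(to_synthetic_document(member))
ranked = sorted(articles, key=lambda a: cosine(member_vector, embed(articles[a])), reverse=True)
print(ranked)  # ['article-1', 'article-3', 'article-2']
```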

Speaker 1 33:40

But to come back to your question: is the vector unstructured or structured content? Vectors can represent everything. That's what's so powerful: they're the intersection of the world of unstructured content and the world of structured content. They're kind of an intermediary between your structured and unstructured content, and they can also help you connect different sources of both. So let's say I've got a website, maybe five websites, and I have not only an AMS but also a marketing automation system and an event system. I can get vectors out of all of those systems, or, I should say, I can get the data out of those systems, convert it to vectors and put it all into a single vector database, and now I have this connective tissue, if you will, that links these things back. Does that make sense at a high level?

Speaker 2 34:35

It does. I think this is finally starting to click, even for me. So you can take your unstructured data and your structured data, run them through an embeddings model and get vector representations for basically every piece of data that you have. From there, I think where we really need to spend the end of this podcast is talking about the why. So let's say an association did exactly that, and they have vector representations of every piece of data they own. Then what?

Speaker 1 35:02

Well, so this is the enabling technology. Vectors, as part of AI, are a general-purpose technology. It's an enabling technology, not an application. So, to your point, it's like, so what? Who cares? Why should I pay attention? If you've been with us this far and you understand the concept, the next thing is exactly what Mallory said: how can you apply it?

Speaker 1 35:24

So the idea of a recommendations engine or personalization engine has been really exciting, yet elusive, for associations for a long time. You know, going back to when Netflix went from the mail-order business, with those red envelopes showing up with DVDs at your house, to online streaming: even back in the mail-order days they had a form of personalization, a very rudimentary machine learning approach, where they recommended on their website which DVDs you should put in your queue, based on crowdsourced data. And then the online version was their first version of streaming. But back then, only companies with the capital and tech caliber of a Netflix or an Amazon could do this kind of stuff. And when Netflix or Amazon did it, they invested tens or even hundreds of millions of dollars and years of time, with world-class data scientists, just to do basic stuff, and they ended up with a very narrow model, meaning they could only recommend DVDs to people. That's the comparison they could make, right? Highly specialized, ridiculously expensive and basically inaccessible technology. So the concept was starting to form back then, and well before that too, but its practicality was very limited back then, 10-plus years ago.

Implementing Vector Databases in AI

Speaker 1 36:44

The thing that's happened is we've had these six-month doublings of AI, continually going along with Moore's law, compounding the power, lowering the cost and improving the capabilities. So now we have general-purpose capabilities in these embeddings models and vectors. We can now say, look, we can do recommendations from anything to anything. We can compare and contrast any entity or object, whether it's content or database information, against each other. That's where the applications come into play. Remember that this might sound technical, but there are ways of doing it even for a decidedly non-technical association. You might need a little bit of help, but it's not millions of dollars; it's probably not even hundreds of thousands of dollars to do this now. It's accessible to a lot of associations today, and it'll just keep getting easier and cheaper. So, coming back to your question, here are some applications that I think are really exciting.

Speaker 1 37:39

My favorite one, probably, is professional networking. How can we do a better job of creating, fostering and nurturing meaningful connections within our community? How can we enrich the lives of our members by connecting them with each other in a deeply meaningful way? We've talked about this in the past: in your life, have you ever been connected to someone who made a big difference, professionally or personally or both? Most people have stories to tell where they say, yeah, a mentor of mine connected me with so-and-so, or I was connected with a friend or future spouse, or whatever that situation may have been. Can we do that at scale as an association?

Speaker 1 38:21

Associations have always been in the business of professional networking, but it's always been a combination of brute force and luck, where you put a lot of people with similar interests in a room and hopefully get a good mix out of it, and the extroverted types often do better in those settings, the introverted types less so. But how can we replicate that intuition at scale? I think professional networking is one of the highest-priority uses I see for this technology, so to me, that's application number one. Obviously, content personalization is another, where you can feed better content to people with lower friction, and there are many others.

Speaker 2 38:59

I'm wondering, is there any sort of reinforcement feedback loop within your vector database? I'm assuming the vector database is only as effective as the embeddings model. So what happens when two vectors are close in proximity to one another? Maybe it's a piece of content and a member, so you recommend that content to that member, but maybe they don't love it; maybe it actually wasn't a good fit. Is there any way to feed that back in?

Speaker 1 39:25

So, on top of the AI's kind of intuition of similarity comparison, you have to layer in some additional intelligence into your systems.

Speaker 1 39:33

What you're referring to is a feedback loop, like reinforcement learning or other techniques, where you can say, hey, let's take thumbs-up/thumbs-down feedback (whether it's literally a thumbs up or thumbs down in an app or something else), let's store that data, and that way we can train the next iteration of our models.

Speaker 1 39:50

On top of the embeddings, we can do a better job of feeding content. The decision to recommend or not recommend is based on the embeddings and their proximity in vector space, which we've been talking about this whole episode, but there's more to it than that. If I simply take the closest vector to Mallory and recommend it to her, without recognizing that it's someone she already works with and talks to all the time, what's the point? You and I do a podcast together every week, and we typically talk several additional times per week, so why would that recommendation make any sense? We obviously already know each other, so it would be a useless recommendation. So you have to layer in what we'd call deterministic concepts on top of the AI.

Speaker 1 40:32

The AI might not know that you and I work together, but our database system should know that, or might know that. Or, if the database systems didn't know that and you gave me a thumbs down, then we'd know, and we could infer something else from it: oh, actually, they're too close in vector space; they're too similar in some ways, right? So there's definitely more to do than simply taking the closest possible vectors, throwing them out there and seeing what happens. There's more work involved, and that's beyond the scope of what we can talk about here. But implementing these ideas requires quite a bit of planning.
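Here is a sketch of layering deterministic rules and feedback on top of vector similarity. The member vectors, the `ALREADY_CONNECTED` set and the `THUMBS_DOWN` log are all hypothetical; in a real system they would come from an embeddings model, the AMS and the app. This sketch only uses thumbs-down feedback as a filter; it does not retrain any model, which is the longer-term feedback loop described earlier.

```python
import math

# Toy member vectors; a real system would get these from an embeddings model.
MEMBERS = {
    "alex": [0.9, 0.8, 0.1],
    "blair": [0.88, 0.82, 0.12],
    "casey": [0.7, 0.9, 0.2],
    "drew": [0.1, 0.2, 0.9],
}
# Deterministic fact from the database (hypothetical): these two already work together.
ALREADY_CONNECTED = {frozenset({"alex", "blair"})}
# Feedback log (hypothetical): (member, rejected suggestion) pairs from thumbs-down clicks.
THUMBS_DOWN: set[tuple[str, str]] = set()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def record_thumbs_down(member: str, suggestion: str) -> None:
    THUMBS_DOWN.add((member, suggestion))


def recommend(member: str, k: int = 1, apply_rules: bool = True) -> list[str]:
    """Rank other members by vector similarity, skipping rule-based exclusions."""
    query = MEMBERS[member]
    candidates = []
    for other, vector in MEMBERS.items():
        if other == member:
            continue
        if apply_rules and (frozenset({member, other}) in ALREADY_CONNECTED
                            or (member, other) in THUMBS_DOWN):
            continue
        candidates.append((cosine(query, vector), other))
    candidates.sort(reverse=True)  # highest similarity first
    return [name for _, name in candidates[:k]]


print(recommend("alex", apply_rules=False))  # ['blair']: nearest, but they already know each other
print(recommend("alex"))                     # ['casey']
record_thumbs_down("alex", "casey")
print(recommend("alex"))                     # ['drew']
```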

Speaker 1 41:07

The reason I think this is such an exciting thing to think about is not that it's a press-a-button-and-done kind of application. There's more implementation work, there's definitely planning, and there are some dollars involved in getting it right as well, but it is very approachable compared to what it was even five years ago. Five years ago, this would not really have been a conversation for any association; it was still in the realm of the Fortune 100. Now it's available, though there's still work involved.

Speaker 2 41:34

So we talked about some use cases, the really exciting ones like professional networking and content personalization. I also know that in the chapter you wrote in Ascend, we talked about duplicate detection and eliminating inefficiencies in member records, for example. Can you talk a little bit about that? Maybe it's a less glamorous use, but it might be exciting for some listeners.

Speaker 1 41:53

I think it's a fantastic use. If I'm able to create vectors for, let's say, all of my member records, I should be able to use those to find similar member records, and from there I might be able to identify the duplicates on a nearly automated basis. That has been a pain in the side of every association since the beginning of time, and not just associations: every company on Earth has duplicate data problems. It's data quality problems in general, but duplicate data is one of the core issues people struggle with mightily. The problem is, how do you identify these things without a person looking at these two records, or these five records, side by side? If I have two people who are similar but a little bit different, the traditional database programs we've had don't pick up on that. They might look for absolute matches on first and last name, title, employer, phone number and email, those kinds of things, and say these are absolute duplicates. They might have some so-called fuzzy logic to say, okay, let's look for the first five characters of the first name matching and an absolute match on the last name. People come up with all sorts of ideas for potential duplicate logic. That's been around since the '60s in terms of duplicate data management, but it's still really weak.

Speaker 1 43:06

It gets thrown off super easily, and so what ends up happening is associations and other orgs kind of give up. If they notice a duplicate, they might try to merge it, if their system allows. And, by the way, a lot of database systems are not good at merging duplicate data together; that's another problem. But just identifying the dupes is something that vectors can be really good at, because, again, since they potentially encapsulate thousands of attributes of each record, they're able to look beyond just the data a traditional database might look at.

Speaker 1 43:39

An example would be if I go to your typical member services associate at an association and say, hey, take a look at these two records for Mallory. Is this the same Mallory, or are they different Mallorys? A typical, even entry-level membership associate would be able to figure that out with a reasonably high degree of accuracy. They'd probably look at it and say, well, maybe they're the same person, but this one lives in Atlanta and this other one is in New Orleans, so maybe not. But then they'd look a little further and say, oh wait, hold on, both of them say they're the manager of Sidecar. Sidecar isn't Coca-Cola or General Motors; it's a smaller company, so it's probably the same person. So it's partly a little bit of knowledge encapsulated there, but it's also partly intuition: knowing to look at the employment history or the educational background.
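A rough sketch of vector-based duplicate flagging follows. The records are hypothetical, word-count vectors stand in for a real embeddings model (which could also weigh looser signals such as employer size or education), and the 0.6 threshold is an arbitrary assumption you would tune. Flagged pairs go to a person for review rather than being merged automatically.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical member records; rec-1 and rec-2 echo the Atlanta/New Orleans example.
RECORDS = {
    "rec-1": {"name": "Mallory Mejias", "title": "Manager", "employer": "Sidecar", "city": "Atlanta"},
    "rec-2": {"name": "Mallory Mejias", "title": "Manager", "employer": "Sidecar", "city": "New Orleans"},
    "rec-3": {"name": "Jordan Lee", "title": "Director", "employer": "Acme Corp", "city": "Atlanta"},
    "rec-4": {"name": "M. Mejias", "title": "Manager", "employer": "Sidecar", "city": "Atlanta"},
}


def vectorize(record: dict[str, str]) -> Counter:
    """Toy stand-in for an embeddings model: word counts over all field values."""
    text = " ".join(record.values()).lower()
    return Counter(re.findall(r"[a-z]+", text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def likely_duplicates(records, threshold=0.6):
    """Return (id_a, id_b, score) for every pair at or above the threshold, highest first."""
    vectors = {rid: vectorize(rec) for rid, rec in records.items()}
    pairs = [(a, b, cosine(vectors[a], vectors[b])) for a, b in combinations(vectors, 2)]
    flagged = [p for p in pairs if p[2] >= threshold]
    return sorted(flagged, key=lambda p: p[2], reverse=True)


for a, b, score in likely_duplicates(RECORDS):
    print(f"{a} <-> {b}: {score:.2f}  (send to a person for review)")
# rec-1 <-> rec-4: 0.80  (send to a person for review)
# rec-1 <-> rec-2: 0.73  (send to a person for review)
```

With these inputs, rec-2 and rec-4 score about 0.55 and are not flagged as a pair, even though both match rec-1; grouping flagged pairs into clusters before review is one way to catch that.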

Speaker 1 44:35

It's unlikely that two Mallorys have the same degree from the same university yet are different people, right? And we have a lot of this data in association databases, or we can buy it from third parties; we can enrich our data, which makes it much more likely we'll find these dupes. So it's a fantastic use case. Again, it's going to require some thought and some work to integrate this technology into your systems. You can use something like a common data platform, like the MemberJunction open-source CDP we've talked about a lot, or you can try to build this on top of an AMS if you want. I probably wouldn't recommend the second option, because AMSs are kind of slow to evolve and are usually a little harder to work with. But doing this with your data becomes much easier once you have it in a CDP.

Speaker 2 45:19

Well, as you were talking, I started thinking of the CDP, and I was like, well, wait, we can deduplicate records out of the common data platform as well. And then I realized I don't quite have a grasp on this: do these sit side by side, the CDP and a vector database? Do you take the data in the CDP and convert it into vectors? Is that more of the process?

Speaker 1 45:39

Well, that's certainly what MemberJunction does specifically.

Speaker 1 45:46

Not all CDPs do the same things, but MemberJunction was built as an AI-native CDP from day one.

Speaker 1 45:51

So MemberJunction specifically has a facility where you literally just click a few buttons in the admin console to turn vectorization on or off for any part of the database.

Speaker 1 46:03

So you can say, hey, I have all my member data flowing in from my AMS directly into the MemberJunction CDP, and I can set up that particular area of the CDP to be what we call auto-vectorized. With auto-vectorization, the software automatically takes the structured data, converts it to what we call a synthetic document (because it needs to be in a format an embeddings model can understand), feeds it to an embeddings model, gets the vector back and puts that vector into a vector database, and that's all done automatically through the open-source, freely available MemberJunction software. So we're pretty excited about that part. As we like to say when we talk about MJ, you can auto-vectorize the world, right? You can easily vectorize all sorts of other content and then bring it together through what we just talked about: the vector database, essentially the connective tissue between the unstructured and the structured world.

Speaker 2 46:58

That makes perfect sense. So, Amith, for people who are still with us and are no longer scared by the idea of vector databases and embeddings, what advice do you have for them on implementing something like this?

Speaker 1 47:13

Zero people have left this podcast midstream, Mallory, I'm sure. In fact, they've been so excited about what they've been hearing that they've brought other people nearby to listen in as the podcast has gone along.

Speaker 1 47:28

I think what people need to do as a next step is dig slightly deeper than this, if you're interested. You can only get so much from one source and one format. So maybe do a YouTube search on vector databases, and we'll post links to a couple of really good tutorials. Maybe read the upcoming edition of Ascend; we plan to have it out before August 1st, and it has a whole chapter on this. We'll also be posting more articles to the Sidecar website. But learn a little bit more. Reach out on the Sidecar community, which is just community.sidecarglobal.com, and have conversations.

Speaker 1 48:01

We have a lot of our AI experts from across the Blue Cypress family, as well as many community members who are deep in this stuff, involved there and able to answer questions.

Speaker 1 48:10

But do something with it: play with it, learn more, and then think about one very small use case where you can run an experiment. That's what I always try to reduce these things down to. The concept is powerful, but it's only powerful if you do something with it to really learn it. I understood the math and the theory behind this for quite a while, but it wasn't until probably the last 18 months, when I actually dug in and worked with examples, that I really understood the potential of vectors in this space, and that's why I've been out there advocating for everyone to learn a little bit about it. So you have to get in there and experiment a bit. That's what I recommend: get started with a little more education, then pick one simple, easy use case and go play with it.

Speaker 2 48:55

I think that's great advice. I hope you all have enjoyed this episode; I certainly did. I had the opportunity to attend the CEO mastermind session that Amith mentioned earlier in this episode, and I've heard about this stuff a few times, but this episode was a really big click moment for me. So thank you, Amith. I hope it was the same for all of you listeners, and we will see you next week.

Speaker 1 49:18

Thanks for tuning in to Sidecar Sync this week. Looking to dive deeper? Download your free copy of our new book, Ascend: Unlocking the Power of AI for Associations, at ascendbook.org. It's packed with insights to power your association's journey with AI. And remember, Sidecar is here with more resources, from webinars to boot camps, to help you stay ahead in the association world. We'll catch you in the next episode. Until then, keep learning, keep growing and keep disrupting.