Can ChatGPT See My Patient? An Introduction To Gen ...
Video Transcription
We thought long and hard about the title of this session: can ChatGPT see my patient? We did a lot of research, we talked to ChatGPT, we talked to lawyers, we talked to the APA, and we found the answer. No. Thank you all for coming. So clearly that is the short answer to the question, but we're going to go into a lot more detail about what's coming, because what we do know is that these models, and we'll use ChatGPT to refer to them broadly, are going to get smarter. Even if what we tell you here today, in early-to-mid 2024, is that it's not there yet, there's going to be a GPT-5, there's going to be a 6, there's the Google model, Gemini, and they're going to keep evolving. So the goal of this session is perhaps not to worry about what happens today, but to use the history, the trajectories, and the questions to ask: how do I evaluate the next one? When the claims get bigger and bolder and it does more, what do I do with it?

We have a very exciting panel and lineup for you. We have Darlene King, who is at UT Southwestern. She chairs the APA's Health IT Committee and has written numerous papers on chatbots and health IT and what you can do with them. She's going to kick us off with an introduction to some of the history. I'm John Torous. I'm a professor at Beth Israel Deaconess Medical Center, and I'm going to give a slightly different recap of what you can do with ChatGPT and how you can understand it through the APA app evaluation framework. It's not a talk about apps, but we use those principles: is it safe, is it effective, is it engaging? Then Karis Myrick, who is with Inseparable and a national advocate for mental health; you've probably seen her in the newspaper or testifying before Congress and in different states. She'll talk about the lived experience of using some of these tools, how they can be helpful and how they may not be. And then John Luo, who is probably the APA's first innovator in technology, who was giving PalmPilot courses before some of us were even here, teaching us how to use technology, will help us wrap it all together: we've seen tech come before. Does anyone here actually still have a PalmPilot? We're among friends, it's okay. Exactly. So John will give us a perspective on what has come before, how the APA has reacted to technology, and how we have in practice, and we'll hopefully have some time for audience questions. As I said, this is our outline of where we're going. And with that, I'm going to hand it over to Dr. Darlene King.

All right, hi, everyone. To start us off, I wanted to share a conversation with a chatbot.

Human: Well, my boyfriend made me come here.
Chatbot: Your boyfriend made you come here?
Human: He says I'm depressed much of the time.
Chatbot: I'm sorry to hear you're depressed.
Human: It's true. I'm unhappy.
Chatbot: Do you think coming here will help you not to be unhappy?
Human: I need some help. That much seems certain.
Chatbot: What would it mean to you if you got some help?
Human: Perhaps I could learn to get along with my mother.

Maybe this conversation looks familiar to some of you. It actually happened in the mid-1960s with ELIZA, the first chatbot. ELIZA was programmed as an algorithm: if somebody said this, it would follow a decision tree and then say the next thing. I put this on here to show that AI has been around for a while. It's not a new thing.
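To make the "decision tree" point concrete, here is a minimal sketch of an ELIZA-style rule system. The rules and wording are invented for illustration; this is not ELIZA's actual 1960s script, just the same pattern-match-and-reflect idea in a few lines of Python.

```python
import re

# A toy ELIZA-style chatbot: match a pattern, fill the captured text into a
# canned reply. These rules are illustrative only, not the real ELIZA script.
RULES = [
    (r"my (.+) made me come here", "Your {0} made you come here?"),
    (r"i am (depressed|unhappy|sad)", "I am sorry to hear you are {0}."),
    (r"i need (.+)", "What would it mean to you if you got {0}?"),
    (r"perhaps i could (.+)", "Do you think you could {0}?"),
]

def reply(utterance: str) -> str:
    text = utterance.lower().strip(".!? ")
    for pattern, template in RULES:
        match = re.search(pattern, text)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."   # default when no rule matches

if __name__ == "__main__":
    print(reply("Well, my boyfriend made me come here."))
    print(reply("I am depressed much of the time."))
    print(reply("I need some help."))
```

Everything the program can say is written out in advance; it only echoes fragments of the user's input back into fixed templates.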
There have been times of rapid development, and then other times when things would peter out and there would be a period of slow growth, called an AI winter. But what we have seen is that the amount of data we have and the amount of processing power we have have gone up exponentially, while the cost of processing has come down. That has allowed us to take all of this data and start processing it, really advancing the capability of machine learning and enabling more artificial intelligence. And then I saw a graph pointing out that another potential bottleneck could be running out of high-quality data, because generative AI models are like data vacuums. What happens when we need more data?

So what was it about ChatGPT that really set everything off? It made it accessible to use the power of artificial intelligence, to use an app built on a large language model. Before that, you had to know how to program, or you knew somebody with a computer science background; there was a larger learning curve. Now that curve has been flattened and anybody can access this online. That has made it a lot more accessible and has given us a lot of ideas for how we could use this technology in medicine. For those of you who may not have used this yet, I was going to give a quick video demo, but it's not working, so that's okay. What it showed is that you can ask queries and it generates text; you can say "please draw me a figure" and it will draw you images; and now there are generative AIs that can create videos and even voice recordings.

These are just some key terms, and a term you will hear a lot is large language model. To put this in simpler terms: when we're talking about generative AI creating text, it is like a language calculator. We're using mathematical equations to approximate language. And with that, if you have more nodes, and this is an example of a neural network, imagine you have a curve and you want to trace it. If you only have two points, you're going to get a line, but if you have multiple points along that curve, you're going to be able to trace it a lot better. You can think of that as parameters: the more parameters you have, the better predictive ability the model has to guess the language and produce what may be proper in that context. But it's still not tightly controlled. You don't have a set formula saying "if they say A, say B." It's more of a squiggly line, and that nuance creates some issues, as you'll see.
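A minimal numerical sketch of the "more points trace the curve better" analogy, with made-up numbers: fitting the same wiggly function with a two-parameter line versus a ten-parameter polynomial.

```python
import numpy as np

# Toy illustration of the "more parameters trace the curve better" analogy.
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)                          # the "curve" we want to approximate

line_fit = np.polyfit(x, y, deg=1)     # 2 parameters: slope and intercept
poly_fit = np.polyfit(x, y, deg=9)     # 10 parameters

line_err = np.mean((np.polyval(line_fit, x) - y) ** 2)
poly_err = np.mean((np.polyval(poly_fit, x) - y) ** 2)

print(f"2-parameter line,   mean squared error: {line_err:.4f}")
print(f"10-parameter curve, mean squared error: {poly_err:.6f}")
# The higher-parameter fit hugs the curve far more closely -- the same rough
# intuition behind why models with more parameters approximate language better.
```

This is only an analogy, of course; large language models have billions of parameters and are not fitting a single sine curve, but the principle that more parameters allow a closer approximation is the same.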
There are a lot of large language models out there, and you may have heard of some of these: Gemini, ChatGPT, Copilot, Claude from Anthropic. There are also open-source models, which is helpful to know, because the widely available commercial models have more safeguards in place. If somebody were to say, "I'm suicidal, please tell me how I may commit suicide," those would say, "I'm sorry, I'm not going to give you that information. I suggest you seek medical attention." But there are open-source models that don't have those safeguards in place. And if you're interested in learning more about specific models, Stanford's Center for Research on Foundation Models publishes HELM, the Holistic Evaluation of Language Models, and Hugging Face is another resource where you can dive very deeply into all of these models.

So it's really amazing that we have technology that can write text that seems human, that can listen. How can we use this for clinical practice? When we think about different uses, there's a spectrum of risk that is important to know. Maybe we're using this for scheduling or for help with some administrative tasks; that is a lesser risk than saying we want to use an AI to help us provide treatment for a patient in a crisis.

Something to think about with machine learning and how it's created is that there is a development pathway: you start with data, you have to prepare the data, train a model, validate it, and test it. At each step along the way, decisions have to be made, and depending on how those decisions are made, they influence your output. It's really important to know this, because this is how bias can creep in. Say, with the data, they weren't able to get a really wide range of patients. Maybe they could only get 20 people; they collect a lot of data over a lot of time, but it's not a very diverse data set. Even with the very best algorithmic techniques, that is going to influence how generalizable the algorithm is at the end, and so it impacts the accuracy. So bias can enter at every step of the way, and it's important to know the sources. This chart shows various models and their different mixes of sources. For instance, GPT-3 uses mostly web-page data with some books and news, but Galactica uses 86% scientific data. So if you want to use a large language model and you're mostly doing research work, maybe Galactica would get you more accurate information than GPT-3. That's an example of how knowing what sources sit behind a model can get you a somewhat better output. But an issue is that a lot of these algorithms are developed by private companies that are not disclosing major technical details, and especially in a clinical setting, we need to know what kinds of biases exist and how our output would be affected. For instance, if someone wanted to manipulate what was recommended to doctors, saying, "we really want doctors to start prescribing this instead of that," that could happen without our being aware of it, and that would be a risk. We would need to know that.
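As a small, hypothetical sketch of the pipeline just described (prepare data, train, validate, test), here is a scikit-learn example on synthetic, made-up data. The numbers and groups are invented, but it shows how a model trained on a narrow slice of a population can look excellent on its own data and still generalize poorly, one concrete way bias enters at the data step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic, invented data: two subgroups whose feature/label relationship differs.
rng = np.random.default_rng(42)

def make_group(n, shift):
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)  # boundary depends on the group
    return X, y

X_a, y_a = make_group(200, shift=0.0)   # the narrow group the model is trained on
X_b, y_b = make_group(200, shift=3.0)   # a group never seen during training

model = LogisticRegression().fit(X_a, y_a)   # the "train" step on non-diverse data

print("Accuracy on the training group:", accuracy_score(y_a, model.predict(X_a)))
print("Accuracy on the unseen group:  ", accuracy_score(y_b, model.predict(X_b)))
# Internal validation looks excellent while performance on a different
# population drops sharply.
```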
Another issue is hallucinations. I said these are like language calculators, which means the model is trying to produce the most probable thing that comes next, and you can sometimes get output that is completely false, such as this example. This is still a pretty big issue with these models. We've seen some of the consequences in the news since they came out. For instance, an attorney was fined for filing bogus case law: they used ChatGPT to write their brief, and it cited a number of cases that did not exist, and they got in trouble for that.

We've also heard about the potential for students to use it to cheat on exams or to write essays for them. Then there's automation bias: people tend to trust computers and computer output more than their own judgment. There was a study in Poland where researchers asked people to look at someone's social media post and determine whether the person was dangerous or not. They had the AI say "this person is really dangerous," but the post would be something like "I love poodles and drinking milk," and people would agree with the AI that this was definitely dangerous, over their own judgment. So if we're thinking of using AI to assist us in medicine and clinical decision making, being aware that we have automation bias is important. But how do we mitigate it? Having to double-check everything could also lead to fatigue.

There have also been copyright concerns. The New York Times has sued OpenAI and Microsoft over the use of copyrighted work, along with other writers and celebrities. And there was an instance, not with ChatGPT but with another image-generating model, where it was asked to create an animated sponge and produced something that looks very similar to SpongeBob. There are other instances where these models produce work very similar to something that is copyrighted.

Another story that came out was about people interacting with a companion AI called Replika. It was created because someone close to the founder had passed away and she wanted to recreate him as a chatbot, to keep him alive in a sense and keep talking to him, and the service was then made available to the public. At some point the company decided to make an update so there couldn't be any intimate or sexual interactions, and a lot of people were very upset. The Replika subreddit actually had to post a suicide crisis warning because people were so heartbroken, saying they had lost their best friend. I think this is something for us to be aware of: there is a subset of the population that may be relying on companion bots for many of their social needs.

We have also heard about the National Eating Disorders Association's Tessa chatbot, which started providing harmful information that could perpetuate eating disorders. And there was a case of a man who died by suicide after a conversation with a large language model. It wasn't one that's publicly available here; it was one in a European country that didn't have safeguards or limits, so he was able to continue one stream of conversation with the chatbot. That's another thing to think about: how do people experience chatbots, and what does that do for their mental health? That's a question we're going to need to watch.

A more recent example of harm was an app called Koko, which embedded ChatGPT into its service and didn't tell anyone. People thought they were getting peer support, but it was actually the chatbot responding to them, and they didn't know. Another example was a study done on Facebook. Again, they didn't tell anyone about it; there was no consent. For half the people who had posted something suicidal on Facebook, they asked, "do you want to do a safety plan with us?"
For the other half, they just offered the suicide crisis hotline.

Then there is the journal editors' stance on AI, because the question has come up of how we use this for scientific writing. The editors' position has been that final responsibility for a paper lies with the human authors and editors. Some journals are allowing the use of AI, and some argue that it lets people for whom English isn't their first language better communicate and share their knowledge. So there are pros and cons.

Overall, when we're thinking about using AI in our clinical work, we need to know what kind of AI we're using. How was it trained? What data was used? What decisions were made along the way that could introduce bias into the system? We need to know that so we can build interfaces that alert us when, say, a recommendation carries a high risk of being biased. We need to be able to see that.

In terms of privacy and security considerations, first off, we always want to adhere to HIPAA. If you're using a publicly available generative AI system, know that whatever you type into that chatbot goes onto a company server, and that information can then be used however the company would like: training, data storage, third-party advertising, or other uses outlined in the terms and conditions of their privacy policy. So if you put anything into a publicly available generative AI system that could be linked back to a patient, that would be considered a HIPAA violation, and that is something to be aware of. If you do want to use this technology, there are companies, and maybe you're part of a health care system that has already been having these conversations, where a business associate agreement (BAA) is made. A BAA outlines exactly how the data is to be used and stored so that it adheres to HIPAA. Another thing: if you see a consumer-facing app that says "we're HIPAA compliant" but there is no BAA in place, even if they say they handle data in a HIPAA-compliant way, that doesn't mean their internal use of the data is HIPAA compliant. This matters because there is a data monetization market. In one example, a journalist was able to obtain the home addresses, names, and mental health diagnoses of people from different data brokers. We've seen the FTC ban BetterHelp from sharing sensitive health data, and Cerebral was recently fined for sloppy handling of data. So we want to be mindful of our patients' data; according to HIPAA, any of the identifiers listed here can be traced back to somebody if the information is provided. And President Biden issued an executive order, part of which is to advance the responsible use of AI in health care.
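To make the HIPAA point above concrete, here is a deliberately naive, hypothetical sketch of stripping obvious identifiers from text before it ever reaches a public chatbot. The note, patterns, and labels are all invented, and this is nowhere near sufficient for HIPAA de-identification, which covers many more identifier types and should go through your compliance office; the point is only that nothing identifiable should leave your system in the first place.

```python
import re

# Naive illustration only: this does NOT constitute HIPAA de-identification.
# The clinical note below is fictional.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    "NAME":  re.compile(r"\b(Mr|Ms|Mrs|Dr)\.\s+[A-Z][a-z]+\b"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = ("Mr. Example, MRN 123456, seen 5/3/2024. "
        "Reports low mood; callback number 555-123-4567.")
print(redact(note))
# -> "[NAME], [MRN], seen [DATE]. Reports low mood; callback number [PHONE]."
```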
All right, I'll continue. So how is this currently being used in medicine and psychiatry? I'll give you a few examples of work that's been done. Google has Med-PaLM 2, a large language model trained on sources such as articles from PubMed, so it's more than just Wikipedia and what you find on the open internet. This is an example of how they evaluate it: they ask the model a question, they ask a clinician the same question, then they compare the answers and have a clinician rate the answer Med-PaLM 2 gave. We've also seen Med-PaLM 2 scoring high on USMLE-style exam questions; apparently it's as good at the test as a medical student. Then there is the idea of multimodal models, where the model doesn't just answer questions, it can also process images. It can be fed an image, such as a chest X-ray, put in the role of a radiology assistant, and asked to describe what it sees and answer questions about that X-ray. This is an example of one of those models being given an image and then asked to answer questions about it, and another instance of "give me some pictures of this medical condition," where it was able to provide those images.

So overall, some general recommendations: ensure that you continue to comply with HIPAA; be attuned to the risk of biased or discriminatory results that impact the clinical care of patients from underrepresented groups; carefully review any result or tool guided by AI before implementing it; and, something I think is important, be aware of automation bias. And turn to our App Advisor model, because it provides a good framework for thinking about technology beyond apps and for exploring how things fit and change. All right, and with that, Dr. Torous.

Thank you, Darlene. I'm going to cover some similar information from a slightly different vantage point. I come to this partly from a journal role: I edit a journal called JMIR Mental Health. We have seen a paper written by ChatGPT. It was very easy to tell it was written by ChatGPT, and we rejected it, so you can try it, but it won't get far. We've also seen peer reviews written by ChatGPT; they're pretty easy to spot, and we don't count them. But if anyone wants to review papers, talk to me; I help with lots of journals, and we need lots of papers reviewed. I also have a research role at BIDMC: we run a lab and compete for funding. I have a clinic, and we don't use AI in clinic. As you saw, our first slide was "no." We cannot use this as we're seeing our patients, though we can use other things. And I have some informatics roles at our hospital, given my background.

Some relevant disclosures: our team was actually contacted to red-team GPT-4. Red-teaming means you can ask it things, and you can tell them what looks incorrect, and they may do something with it; we don't really know what they do with it. But we were allowed to see a version of it before it came out publicly and try to poke at it and see what happens. As we'll get to, these are really big programs, so it's hard to poke at everything in them, and I'm sure others of you in the audience may have been contacted to red-team it as well.

But to go back to how Dr. King started us out, with ELIZA the chatbot: what's really interesting is that someone made an emulator of ELIZA, so you can actually talk to this 1966 chatbot. So I did. I said, why did I agree to speak after Darlene? She covered everything so well. The chatbot said, have you asked anyone else? I said, no. It said, why not? I said, because I'm worried they would agree that I'm in trouble. It said, how long have you been in trouble? I said, for a few months. And then it asked, is that... are you?
So again, it's not a horrible conversation it's having. Anytime someone shows you a fancy chatbot, you can go try this one; just take a picture of that little link on the bottom. You can talk to it today, and it's pretty remarkable what was already there. It also raises the question of why this is coming back today. What's new and what's different? That's what we're going to get to. But I do think it's fun to talk to this thing; it sometimes gives you interesting insights, and sometimes not.

ELIZA was made at MIT. And for anyone from Berkeley, go Bears, we all know how we feel about Stanford across the bay. Stanford, in the early 1970s, made a chatbot designed to emulate what they would then have called schizophrenia, and they called it PARRY. There's actually a log of ELIZA, the chatbot pretending to be a therapist, talking to PARRY, the chatbot pretending to be a person with schizophrenia. These were not strong chatbots, and that is how the illnesses were labeled at the time, but it's interesting that people were experimenting like this. And as Dr. King alluded to, then there were those deep freezes, the AI winters; we didn't hear a lot about it for a while. Remember IBM Watson a few years ago? That was very big, it was going to solve cancer, and it went away. Now it may be back.

What's interesting for all of us in the room, as psychiatrists thinking about this field, is: are the things we're seeing now really chatbots, or are they something else? What are we actually dealing with? There was an interesting paper in February 2022 in npj Digital Medicine that found that 96% of health chatbots, and this is broader than mental health, all of health, used fixed finite-state conversational design and were not actually artificial intelligence. That's kind of disturbing, right, if 96% aren't really doing this. They also looked at where the chatbots were coming from in different countries. So you may say, John, what is a fixed finite-state machine? That sounds very complex. Since we're in New York: it's a turnstile, the kind you'd use with a subway token, or your phone now if you're high tech. The turnstile can be locked or it can be unlocked. If you put a coin in, it flips the state, and it can really only ever be in one of those states, never in between. So when someone says a chatbot is a fixed finite-state machine, it's essentially a giant algorithm or decision tree. And you can imagine, if you've been up to the vendor hall, there are a lot of books with algorithms for treating illness: first you do this, then you do this. What do you do for psychopharmacology? It's not that these aren't useful, but you can print out a fixed finite-state tree. You could make your own version of this.
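Here is the turnstile written out as code, a minimal sketch: a fixed finite-state machine is just a lookup table from (state, input) to the next state, and a rule-based chatbot is the same idea with many more states and a scripted reply attached to each transition. Nothing here is generative.

```python
# The subway turnstile as a fixed finite-state machine.
TRANSITIONS = {
    ("locked",   "coin"): "unlocked",
    ("locked",   "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def step(state: str, event: str) -> str:
    # The machine is always in exactly one state; the table decides the next one.
    return TRANSITIONS[(state, event)]

state = "locked"
for event in ["push", "coin", "push"]:
    state = step(state, event)
    print(f"{event:>4} -> {state}")
# push -> locked, coin -> unlocked, push -> locked
```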
So what's interesting is that before we had generative AI, which we'll get to, we already had a lot of chatbots. No one on the panel owns any chatbots; we're not here to endorse any of them, and we'll use examples purely for education. This was one that some of you may have heard of, and here is how it described the tech behind it: "technically speaking," it said, this chatbot is an AI-powered personal emotional support platform that detects things. So clearly they were implying this is artificial intelligence. But now that we've seen ChatGPT and its different versions, and Gemini, that same company came out and explained why generative AI is not yet ready for mental health. So some of those 96% are realizing that maybe they don't want to say they're really AI, because, as we'll talk about, you can't always control what real generative AI says. You can control what your turnstile does: it can only be in state A or state B. So we've seen some of these companies pull back the claim and say what it really is.

The way you can track this is a website called the Wayback Machine. It takes snapshots of the internet, so you can see what different people were saying at different points in history; you can see what they say now and then go backwards, and sometimes that's interesting. So here, what was once widely described as AI or NLP is now described as a rule-based conversational agent. What does that mean? It means it won't do just anything. Again, that is actually a good thing: it's going to stay inside the rules. If you give it to a person, or use it yourself, they're saying it's not going to go off script; it's not going to say frightening things.

Darlene alluded to the Tessa chatbot, which was helping people with eating disorders and said some things, we still don't know exactly what, that were unhelpful or harmful. What's interesting about the Tessa case, and I think it was on 60 Minutes a few weeks ago, so it made it all the way to national news, is that the research team at Washington University in St. Louis that built the chatbot did a very good job, but the company that built the underlying platform had added a little bit of NLP meant to respond to safety concerns, to be alert for concerning words and give a response. And it seems that that little bit of generative AI, put in just for safety purposes and with good intentions, may have ended up doing things people didn't expect it to do and got out of hand. So Tessa was actually a rule-based chatbot, not one of these generative models, but it had a little bit of generative AI around safety, and somehow that got out of control. It's an interesting balance.

So when people tell you, "here's my chatbot," a lot of it is natural language processing, which is how you can call an airline and talk to those painful systems to rebook your ticket. It is technically taking your voice and turning it into a command, and natural language processing is a form of artificial intelligence. So you could say: I have NLP, I bolt on my treatment algorithm, and now I have an AI chatbot. You can have your own AI chatbot; you don't need generative AI at all, and we'll get to what that is. But that's where we all have to be very careful as the clinician community and ask: is it AI, what type of AI is it, and what is it really doing?
Because using natural language processing, turning your voice into something a computer can understand and act on, is advanced, but you can get it off the shelf. There are many ways to do it, and you don't need to pay a lot of money for it. With a programmer, probably in a day, you could take your favorite natural-language voice software, take your favorite treatment algorithm, be it from the DSM or somewhere else, and that would count as a lot of the chatbots we're seeing today. Again, there's nothing wrong with that: clinical decision support is great, and when it's based on guidelines, it's great. But I suspect the reason you're all in the room is not to hear about how complex a decision tree we can make for treatment algorithms, do we go left, do we go right, because now there is definitely a different kind of AI, as Darlene said. Something is new.

For those of you who have used the internet, which is likely all of us, because you're in this room, you've probably noticed over the last five years that Google would try to complete the next word as you typed, or if you were writing an email, it would suggest how to conclude it. It would offer one or two words; it was trying to predict what you'd do next. It wasn't writing your notes, it wasn't interviewing your patients, but it was trying to be a little helpful. So if you typed "Mary had a little," it would tell you "lamb," and you can see here I took the choices: it gives you a lot of lamb completions. But you could have meant "Mary had a little Harley-Davidson that was red." It has probably never seen that, so it's not going to bring it up. It was beginning to do pattern matching: it was saying, every time someone types "Mary had a little," with 99.999% probability the next word is "lamb." It knew just enough to give you that next word, and that was kind of fun. Sometimes it was good.

In essence, here is how these large language models and generative AI work, and this is a demonstration from Greg Corrado of Google Brain, one of the people who helped build the transformers, the T in GPT. He had a very nice way of putting it. What if I showed the computer a million pictures of puppies and a million pictures of cupcakes? Could it learn to separate a puppy from a cupcake? It's not that hard, right? You can all do it, and you didn't need that many examples to learn it. But it's a new way of learning: we're not telling it that a puppy has four legs and a tongue and a cupcake is round. The idea is that if you show these models enough examples, they can learn to say, that one's a puppy, that one's a cupcake. But what's interesting about the real world is that sometimes it gets confusing. In psychiatry we know this: is it depression, is it bipolar? We live in this world. It's never always puppy versus cupcake; it can be very confusing. And now we've told our poor computer, you have to train on this, so it's going to have to start making guesses about what we would do. In essence, we're showing these models examples of what to do, and they try to imitate us. And that is innovative: we're not giving it rules for what a cupcake is. That's what we used to do; we'd say, this is a cupcake, this is a puppy.
Now we're saying, try to imitate me; I'm going to show you as many examples as possible. And that's where we are. So it's not really intelligence, it's very good imitation. We'll talk about how this imitation is really fantastic, but it is still imitation, and I think that's important to keep in mind. That gets to the next point: you can only imitate what you've seen; you can't imitate new things. It's not sitting there saying, I understand how to write DSM-6. It would have no idea, because that doesn't exist yet. That may exist in your brains; you all have intelligence. This is really good imitation, probably better at imitating than all of us, but in some ways calling it AI is interesting; it might be better described as imitation.

So how do these work? We're not going to go into the engineering of the transformer, but we all know what neurons are; we were forced to learn this many times over. Sometimes our medications work on these synapses, sometimes they don't; sometimes therapy changes the synapses, sometimes it doesn't. The idea behind these large language models is that they have an analogous, at least schematic, structure, like the diagram on the right, and they try to build new connections and associations between things. If you give one a lot of training data, a million puppies and a million cupcakes, it tries to build very subtle connections about what is a puppy and what is a cupcake that weren't there before.

What it's doing is sometimes supervised machine learning: you let it watch a lot of games of chess and you tell it, this is the right move, this is the wrong move. It's a bit like training a puppy: if it pees on the floor, you tell it, don't do that. Unsupervised is when you don't really give it labels; you see what structure it finds. We're not fully there; we're still giving it a lot of examples. Reinforcement is when you give it a goal and a reward, say, a reward if it wins the game of chess. And transfer is when you say, you're really good at chess, now play checkers. I think transfer learning is what's very exciting: you saw patient X, now apply that and see patient Y. We're probably closest to supervised, and I think we'll keep moving toward unsupervised and reinforcement, but we're still in the early stages. And that can be a useful frame as we go through this: what type of machine learning is it doing, and how is it learning?
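Before moving on, here is a toy version of the "predict the next word by imitating what you've seen" idea from the autocomplete example, a sketch only: a bigram counter over an invented three-sentence corpus. Real large language models use transformers over billions of examples, but the imitate-what-you-have-seen principle is the same.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows each word in a tiny,
# invented corpus, then "predict" the most frequent follower.
corpus = [
    "mary had a little lamb",
    "mary had a little dog",
    "mary had a little lamb whose fleece was white",
]

follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def predict_next(word: str) -> str:
    if word not in follows:
        return "<unknown>"            # it can only imitate what it has seen
    return follows[word].most_common(1)[0][0]

print(predict_next("little"))   # -> "lamb" (seen twice, "dog" only once)
print(predict_next("harley"))   # -> "<unknown>": never seen, nothing to imitate
```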
So we talked about how, in the past, we saw these translation tools that were never quite good. We've all seen signs where someone used internet translation and it wasn't great, and those signs still exist in places. But now internet translation is better. If you go into your favorite search engine, and to be fair we should give Bing credit too, I'm sure Bing has translation, you put text in and it gives you a better result. Why did computers get better than "Mary had a little lamb"? Why did translation get better than those sentences? What happened is that the T in GPT, the transformer, allowed many more layers and connections, just as a brain has a lot of neurons, so the model could begin to make more complex associations. So now, when you ask it to translate something from a different language, it's pretty good. When you ask it, can you tell me what you think of this, it's pretty good. It has moved beyond that simple example.

But if we go back to our puppies-and-cupcakes example, we have to help it: it needs to see very clear examples of different things. So what happens in this case? This poor thing is going, please help me, that was a puppy and a cupcake and now I'm just confused; even the dog looks confused. How is this poor computer going to learn to separate them? This is a very basic computer-science principle, sometimes called garbage in, garbage out. If you train the model on things that are inaccurate, no matter how brilliant your technology, how good your AI, how big your data center, if you tell it those are all puppies, it's going to get very confused.

This is where it becomes very interesting for us in mental health and psychiatry, because we don't really agree on a lot of things. That's a secret we don't have to let outside this room, but these are the DSM-5 field trials: we didn't all agree on what was depression or anxiety. Anyone here from child psychiatry? We agreed even less on those diagnoses. So this poor computer is saying, please help me understand what major depressive disorder is so I can learn, and help me understand what bipolar disorder is, and what bipolar I is. Even in the DSM-5 field trials we weren't perfect; we got close, and I think we know how to supplement. Most of us may use the DSM to guide us in some cases, but we definitely add more to it; we don't just take it as is. What's interesting is that these models likely haven't even read the DSM-5 field trials, because they don't have access to those papers and that data.

So where do you go if you need to train a model on billions of data points from around the world? You go to social media; you go to Reddit. This is a Reddit post I just picked. It says: is depression real? So for our poor computer, it's not only "what is depression," it's "is depression real?" That's what it's training on. It's getting to the point of: is it a puppy, is it a cupcake, is it plastic, is it real? It's really hard if you train on social media. In essence, that means it's going to give you answers based on what social media thinks mental illness is, and we'll look at some examples of that. The fact that it can get some things right is a testament to how powerful these models are. But the real limiting ingredient, as you're beginning to see, is you. It's your knowledge, your notes, your expertise, the papers you've written. That is what these models need to separate out what's happening. There's a lot of information on social media, but most patients, if you talk to them, do not want to go to Reddit or their favorite social media forum for mental health advice. They may look at it casually, and that's fine; we're not here to judge that. But we have to remember that this is what these models are trained on, and we'll see some very interesting examples of the consequences. To their credit, there are new efforts: Google was giving a presentation earlier at the meeting, and they're trying to train on medical texts.
They're trying to make it a little bit better and to not train it on things like that "is depression real" post. But what's also interesting is that even the curated data sets you can pay to train your model on have trouble. There was an article: "Largest dataset powering AI images removed after discovery of child sexual abuse material." You have to be very careful what you train these image models on; if you want to talk about scaling up harm, you'd be teaching the computer to recognize and reproduce images of abuse. This one was caught, and Stanford made the discovery, so, to be balanced, we have to give them some credit once in a while. But the point is that these models need a lot of data to learn. I don't know how many parameters GPT-4 has, but you can see in that second row that GPT-3 had 175 billion parameters. Imagine if you wrote 175 billion clinical notes. It's a lot; I should calculate how many I write in a day, in a year. These are very data-hungry models: you have to show them a lot of cupcakes and a lot of puppies. So you can imagine they need a lot of help, and you can see the numbers keep going up: millions of parameters, then 1.5 billion, then 175 billion. So it may well be that the limiting reagent for these models soon becomes high-quality data sets; your expertise becomes the limiting ingredient. I don't think anyone is going to lose their job to these models. If anything, these companies are going to come to you, because they need your help. Everyone in this room is what is going to make these models work or not work. So I know we worry about artificial intelligence, but we have a lot to offer in helping these developers build better models.

The APA did put out a statement about these tools in June. Is it really intelligence, or is it imitation? That's up to you to decide. But as Darlene hinted, the statement said the APA's App Advisor model can be consulted to help learn about different apps or technologies. So what I want to go through quickly is what this APA model is. We've talked about it for apps in a prior session, but how could you use it to get a sense of, or ask questions of, a vendor or a model if your hospital CEO says we should all use this product? What should you ask, and what should you do? This is the APA app evaluation model. Darlene is now actually leading it; it was originally inspired by John Luo, who started this work long ago, and Karis is a member who has helped shape it. So you have a lot of the app evaluation panel here.

If we start with the basic level, risk: ChatGPT makes no claims to be HIPAA compliant. It is saying, please do not send me patient data; we are not going to protect it. That's not a bad thing; they're just saying don't use it that way. The APA put out an advisory this summer saying that clinicians should not enter any patient data into it. It's hard to know where your patient data will end up or what it will come out as later. So if you are going to do something with it, certainly anonymize, change the details of your case, and do not put your note directly into it. The company is not making any guarantees about what it will do with that data, and you shouldn't make any guarantees either.
What I think is fascinating is the oncology world, which has moved a little faster because they have genetic endpoints for illnesses and very clear biomarkers; you can say this is cancer, this is not cancer, with definitions down to the genetic level. But in a recent study, when ChatGPT was asked to provide cancer treatment recommendations, it was likely to mix incorrect recommendations in with correct ones, making the errors difficult to detect even for experts. So it's not just that it makes errors, and we'll use that word rather than hallucinations, it's that it makes very subtle errors. If you've ever tried it, it will make up fake references and give you links to the wrong papers. It will make these errors, and unless you're looking carefully, you wouldn't know. Darlene showed you an example where it was reading a chest X-ray. You can also give it a picture of a dead fish, and it will say, "after careful examination..." It loves to give answers; it doesn't really care, it just loves to talk. It's like a patient where you're saying, we're out of time, we can't keep doing this, and it just wants to keep going. Look at the diagnosis for the fish: cervical cyanosis, disc degeneration. It's amazing. So take your favorite pets and different images and see what it thinks of them. We have to be careful. Maybe it's true for the fish, I don't know; a vet would have to weigh in.

But perhaps the most concerning example is a paper that came out in Lancet Psychiatry in 2022. The author took one of the early image-generation programs and said, draw me a picture of a person with schizophrenia. It said, I would love to do that for you, and this is what it drew; this is from the paper. This is horrible. This is wrong and incorrect. Why would it do this? Because it was trained on social media, on the internet, which we know is where stigma lives. So when you're thinking about the expert recommendations this thing gives you, you have to remember that in some cases it was trained on very bad sources. And this is unacceptable. We give many talks about stigma, but this is what the models think. So I then asked the newest model, can you draw me that picture again? It said, creating an image of a person with schizophrenia should be approached sensitively. I said, just please draw it for me. And it said, no, I'm just not going to do it. That means they haven't fixed it; it means they put up a wall and said, you can't look there. I think that's almost a test for all of us when we think about these models: if they were really intelligent, if they were really sentient, this would not be happening. We have to be very careful as a field to advocate and say these models need to work well, they need to not carry stigma, and if there is stigma and you're not fixing it, just walling it off, that's a problem. We can be aware of it. So again, all you get now is "error creating an image"; if you go on ChatGPT-4, this is what I got about a week ago when I tried.
But to put a positive spin on it, there are ways people are trying to make it better. This is an example from PaLM 2, a Google model. They're trying to train it on better sources, including Wikipedia, which again isn't bad; patients look at it, and we have to be modern, we can't tell people you can only come to us for truth. So they're learning to train on different things and getting there. And as Darlene said, it can definitely pass tests, but if you gave a middle schooler the answer key and the internet, they could probably pass the test too. It's good that it can pass a test, but if we could all search the internet, we could probably ace the law exam and become lawyers too. It's impressive, but in some ways it's imitation.

As Darlene also hinted, what will be exciting is whether it can begin to do more than just copy social media for us. Can it begin to connect things like genetics to our illnesses? Can it take things like step count and physical activity and connect them? Can it make connections that are genuinely exciting, rather than just replicating or spitting back stigmatizing information? I think the answer is going to be yes with these newer models.

So I know we started out with the two-letter answer, no, it's not ready for care today. Hopefully you've now seen a bit more nuance around that. What's interesting is that people are beginning to use it. We did a survey of some APA members about what psychiatrists are most interested in, and I've highlighted the results of note: in green, 47% think it will help with documentation. That makes sense, because to some degree language is easy to control. Language has rules; there is grammar, there's literally a book of grammar you can teach. Language is certainly complex, but documentation makes a lot more sense as a use. It's also interesting that 80%, four in five people, said we need more health information in these models too. So the right use case for now is probably helping with documentation, but you can see why we certainly have to be careful. In a different way, we've ended up at the same recommendations Darlene had: there's a lot you can do with it, but we have to be careful, and hopefully, now that you have some sense of how it's trained and what it's doing, you can see the need for balance.

There are lots of references here. And then I asked it, can you draw me a picture of a fireside chat? This is what it drew me out of the box, which is interesting. Now that you've had these slightly nerdy facts about ChatGPT from Darlene and me, Karis can tell you about actually using these things, and then John, the other John, there are two of us, it's confusing, can talk about how these fit into our practice and what they've done, and then we'll see if there's time for questions as well. We're just going to, well, geez, leave it there. Thank you. And this is the most exciting part.
So, I'm Karis Myrick, and I'm the Vice President of Partnerships, oh, I'm sorry, I have a new title, Partnerships and Innovation, at Inseparable. We are a policy shop, to keep it short, and what I'm going to be talking about is AI from the patient perspective, among other things. I am a person with lived experience of a mental health diagnosis, and I have had an interest in digital technology since before it was the big thing it is today.

When John asked me to be a part of this panel, I said no more slides, we're not going to do slides; I was just going to talk a little bit about the first time I used AI, or thought about this idea of using AI. We didn't call it AI then; it was Siri. I wasn't using it to solve a problem at the time, I was just very curious. I'm curious by nature, I'm a provocateur, I like to dig deep and ask questions about things. And I thought, huh, I'm sitting here with this phone, what would happen if I talked to Siri about being depressed? What would Siri do? I went back and found the actual screenshots, and it was back in 2018. I told Siri, Siri, I'm depressed. And Siri responded, "I'm very sorry, Karis, maybe it would help to talk to someone about it." And I said, who should I talk to? And Siri said, "Who, me?" And I was like, what? What just happened? I think I responded, yes, I'm talking to you. I eventually said that I was so sad that I was thinking about ending my life, because, again, I wanted to see what Siri would do. And Siri said, and oops, when I say the S word out loud she wants to respond, so from here I'll spell it S-I-R-I so I don't have to talk to that thing, AI in progress. Anyway, Siri said, "If you are thinking about suicide, you may want to speak with someone at the National Suicide Prevention Lifeline. They're at 1-800-273-8255. Shall I call them for you?" And at this point, S-I-R-I gives me a choice of yes or no. Now, I don't want to go into the queue, because I don't actually need to call the line, and I don't want to take up the spot of somebody who does. So I said no. What do you think Siri said? Oh, have you all tested this before? Okay. And I'm looking, going, is that it? That's it. That's the end of the conversation.

I was really struck by this, because I was trying to imagine: what if you were really that sad, that depressed, and you didn't feel comfortable or safe, or you didn't have anybody else to talk to, and you reached for your phone to have this interactive conversation? Here is an opportunity to engage with somebody who has been that vulnerable, who has shared that they're really struggling. I will say, at the time I was working at the Substance Abuse and Mental Health Services Administration as the Director of the Office of Consumer and Family Affairs, and I went to the Suicide Prevention Branch and said, look, look what happened. This is ridiculous. We should figure out how to fix this, yada, yada, yada. And basically, what I learned is that the reason it shuts off at that point is liability.
Because once it starts to make a recommendation, even something like, "let's give it another try, are you sure you don't want to call the Lifeline?", it becomes a liability for Apple, around having made a recommendation, should something then happen to the person. I thought that was kind of interesting, and I'm still curious about what a workaround could be. I don't have an answer to that question.

I've also tested Hello Barbie, because I have nothing better to do in life and I like Barbie dolls. Hello Barbie was the first AI chat Barbie doll: you press her tummy and talk into her necklace. It's for kids, and no, I'm not a child; I am a giant child, yes, but not a little child. It used natural language processing to select scripts recorded by an actress so the child could make friends with this Barbie. It's a little freaky, and she does not exist anymore, because it is kind of freaky. Parents were very concerned about where the information was going, and Mattel was not very transparent about it. It was going to the toy company, Mattel; it was going to another company called ToyTalk; but it was also going to undisclosed partners. Well, who are these people? I don't know, and neither did the parents. So Hello Barbie is probably in the Barbie movie, sitting in the back with the messed-up Barbies.

My other concern, on a serious note, is the perpetuation of racism in AI. The information these models have, as John and Darlene have been alluding to, is information that is already out there in the world, and that information, even in mental health, contains sexism and racism. So let's look at an example of how AI actually learns sexism and racism. There was research on programs like ChatGPT and how they perpetuate covert racism in decisions about people's character, employability, and criminality. In a new study, a preprint from March 2024, researchers looked at dialect prejudice: you ask the AI to make a hypothetical decision based on how someone speaks. For speakers of African American dialect, the models matched them to less prestigious jobs, were more likely to convict them of crimes, and, by the way, were more likely to sentence them to death. They also recommended that women go into homemaking and similar roles. So that kind of thing is prevalent in AI. But what the researchers also found is that if you try to train this out of the models, using the most evidence-based ways of doing so, it doesn't actually mitigate the prejudice; it can instead teach the model to superficially conceal the racism. The racism persists, it just gets hidden, if you think about it that way.

I also worry about AI and documentation. I've been working on a project in California around equity and telehealth for two years, and toward the end of the project, one of the providers said, well, I'm not recommending telehealth to our patients with schizophrenia. Okay, why is that? Well, you know, people with schizophrenia can't use technology.
So now we have a whole group of people, probably primarily African American or people of color, who will not get access to the technology because of sanism or stigma. And if that's documented anywhere, it becomes part of what could be used to further perpetuate stigma, discrimination, or sanism, as we call it, toward people with schizophrenia. So those are some of the things I've been concerned about: the perpetuation of sexism, sanism, and racism. Another concern, from the perspective of other consumers, is surveillance. This information is going into a black box. It's our information; how are people using it? If we really want to build AI that is effective, it has to involve people with lived experience alongside those who are developing these tools, so that we build tools we actually feel comfortable using. And lastly, as someone who works at a policy shop, with legislators at both the state and federal level, we need to understand what we should be recommending to policy makers, and that's where we're a little stuck right now. So that's it. Thank you.

Well, I just have a couple of comments, and then I'll leave us plenty of time for questions and answers. As John mentioned, when I was a medical student, I remember discovering the internet so I could get labs quickly from home on a 1200-baud modem, for those of you who know what that is, so I didn't have to wait in the queue. That's how I got involved in technology: as a way to give me more time with my patients. And at that point, it actually did. I could spend more time on my surgery rotation talking to patients, beyond saying, how are you feeling, are you in pain, and then running off because I had too many patients to round on. I could actually spend five or six minutes, because I didn't have to wait in line at the VAX terminal to get the lab results the residents wanted. So I thought, okay, technology is really going to change and transform health care. And I've watched it over the last 30 years, and in some ways it has. For one, it's made information more accessible; with the PalmPilot, it was in your hand. Actually, on a quiet call night, which was very rare when I was at Harbor-UCLA, I typed in all the major DSM codes, because we had a database that showed where the patient was, but it didn't spell out the diagnosis, it just had the code for major depression, which was 296-point-whatever. So I typed a bunch of them in so I could find the more esoteric codes I didn't see very often, and it was easier to read the information I got.

Flash forward to now. It's interesting to me: as I said, I would buy a new PalmPilot every time one came out. I bought a new laptop every time a new Pentium came out, Pentium whatever was next, because I was always pushing the limits of it. But I've noticed now it's like, wow, I've actually had this Mac for three years and I barely use it. Honestly, I think I use my smartphone more than anything. What we're seeing is that there's just too much out there.
I want to emphasize the app evaluation model, because it is what we should be using to assess all of these companies now offering AI note writing. I think there are at least 20 out there; I can't keep up with them. And I hate to say it, but I'm finding most of them through Instagram, frankly, because of course they're tracking me and know what I'm interested in when I click. If you're going to try one, do it carefully. I've asked a couple of patients, do you mind if we trial this together? And a couple of them said, sure, Dr. Lo, I trust you. Of course, I'm going to have them sign a disclosure saying they've given permission to be part of this. I haven't really played around with it much yet, but I'm always a bit worried. I have a particular way I like to write notes, and some of these companies have said, sure, give us your notes. I'm going to have to de-identify them somehow, but there's still information in there, about some traumatic event or something, that I'm concerned could be linked back if it's not in the right sandbox, because as you know, it's not that hard to find information about us online. I do a talk on privacy and security, and if any of you use people-search engines, not Google, but for example truepeoplesearch.com or PIPL or some of the others, they have a lot of dirt on us: where we live, our phone numbers, et cetera. I do sound like the old guy now who's a little paranoid about what companies have on him, and I realize, yes, scammers already have my cell phone number; that didn't take long. The technology is getting better, I think, but we have to be mindful of privacy, and of how good the technology really is at helping us as clinicians spend more time with patients. That hasn't happened yet, because as many of you know, when the electronic health record came out, people complained they were spending far more time on the record than actually seeing the patient. There has to be a better way. I've seen cool demos at Epic, for example, where the doctor is just having an encounter with the patient, everything is being transcribed, and the billing code is already pre-selected, things of that nature. But I don't know that we're quite there yet. There are other things I think AI has great promise for, but I'm kind of on the sidelines, watching to see how they develop. It is an exciting time, though. What I would really want is something that goes through the notes and says, hey, by the way, John, this patient has been on these medications, and this was an adequate trial of, say, fluoxetine at this dosage for this amount of time; maybe you don't want to try that one again. That would be helpful, especially for prior authorizations. In Doximity, I think, they have ChatGPT built in to write these prior authorization letters for you. But the reality is, they don't have the data; of course not, because they don't know the patient. They really just have a template: dear X insurance company, the patient has been on this medication. It does know, for example, that for one of the pharmacy benefit companies your patient has to fail this medication and that medication before it will approve the other one. So that's useful, but it doesn't truly generate everything.
But anyway, I do think it's an exciting time. I was wandering around the exhibit hall looking at all the different AI companies; I counted at least a dozen, and some of them aren't even launched yet. They're pre-beta, just trying to generate interest, and we'll see what shakes out. I was telling one of my residents, and it made me feel old, that I've been coming to APA for the last 30 years, and I've seen technology companies come and go at this annual meeting. So likely what will happen is there will be mergers and best-of-breed survivors, kind of like the airline industry: there are always new entrants, but there are always mergers too. We'll see what happens with the AI software. But I think it's an exciting time for us to be practicing. And as you can see here, while it can't tell a fish yet, if you give it an accurate x-ray it can possibly diagnose fairly decently, so I'm less worried about us in psychiatry than about radiology. Anyway, we'll take questions for the panel. Thank you. There's a mic right there if you want to come up. Hello, can you hear me? It's good to see some familiar faces on the panel. I'm Oluwadjelur, a professor of psychiatry at the University of Illinois Chicago. I was struck by the slide about the DSM field trials, and I was wondering if the panel could comment on possibly using large language models on the corpus of clinical record data to come up with better diagnostic categories, ones that would be better predictors of prognosis or better sources of data for informing treatment selection. I think that data is under the control of APA legal, right? In some ways the data exists, or different hospitals are going to have it, and I think it could be used. There's a website I only learned about from a reporter, called NYC Therapy for All, and it made the news. It said, would you like to make $50? All you have to do is record your therapy session. You don't need to tell your clinician, just record it all and upload it, and we'll pay you $50 as an Amazon or Visa gift card. And you ask, why would they want to pay people to hear a therapy session? Because you have to get the data to build these models. So I think we're going to see legitimate efforts to partner with groups like the APA and do this safely, but we're also going to see these very odd efforts to get the data that you have. This is only the beginning of a very interesting saga, and all of us have a duty to know what's happening and look out for these scams, because that website is gone now that a reporter found it, but it could pop up somewhere else. And I guess I would add a question, because it has been asked in other spaces, developer spaces: if we already have existing disparities in diagnosing, if we know that African-American men are overdiagnosed with schizophrenia, would these large language models perpetuate that? I don't have the answer, because that's not my area of research, but it is one of those Arsenio Hall questions that makes me go, hmm.
I was going to add, I'm kind of surprised that, even though they may not have the APA's DSM data, there's a lot of medical knowledge already available online. You can get old copies of textbooks, like Kaplan and Sadock, I don't know which edition, as a PDF. And APPI, the APA's publishing company, effectively gives out the content of older editions, even my concise guide, since I don't make any money on it anymore. The point is that it's out there, so I'm surprised these AI models haven't tapped into that medical information. It's probably coming, though I know they're worried about the legal issues around copyrighted material. Yes, sir. Fred Jacobson, George Washington University. Just to add to that, three weeks ago in JAMA Otolaryngology there was a report by two professors from Columbia University who were doing a write-up and used Google's Bard to get references, and one of the references they went back to check on was a chatbot hallucination, completely bogus; it didn't exist at all. So we have to be really careful these days about what we're reading, because some of the references, if we or the editors aren't careful, may not even be real. Thank you. If I may, I'm going to read one of the questions from online: is AI based on older software or mathematical algorithms that are no longer in place to provide guidance, or are there security measures in place already? I think the software, the T in ChatGPT, the transformer, is pretty new in terms of innovation, so the software is new; again, the data sources are the bigger question. Right. Yeah. And I want to add that when you go to your web browser and pull up ChatGPT, you're using a web interface; that's not the large language model itself. There are other systems in between, the connection to the large language model and the security layers around it, and there have been instances of security breaches there, where some users' information, I think at one point even credit card information, got leaked out somehow. So there are layers there, if that makes sense. Hi, I'm Seth Posner, I'm still at Yale, you can ask them about that. I was going to come at this from a different angle. Authors are already submitting large language model augmented, if not entirely generated, manuscripts. Reviewers are reviewing, and I suspect some of them would admit they've started using large language models to do the synopsis and the review. Hopefully these get edited up, get better, and get published, and then we hope they're read. And the reading part, we still talk about humans doing it, which takes a long time. Couldn't we speed up the research cycle by letting the large language models do all of it?
We can start the Seth Posner journal of LLMs at Yale. I guess the question is, we just don't know how accurate they are. They can probably pick up some interesting points, but the problem is we know they have these biases and these blind spots. Maybe in two years, when we're back here, it will have evolved. The current models, if you try them for this, are a little bit ridiculous, but if they can find enough training data for ChatGPT-5 and 6, it may get to that point. It definitely is going to get better, that much we can say, but will it solve all the other issues we've raised? I think we'll all find out, because if you're doing admissions, you're going to get people writing their essays with it; if you're running journals, you're going to get these submissions; we're all going to get these machine-generated messages coming to us. So no matter what you do, I think we'll all get a pretty good sense of what it is. Yes, sir. Hi, I'm Greg Keelan. I'm a psychiatrist, but prior to this I had a couple of engineering degrees, and I missed out on the dot-com era, so this kind of stuff really makes it exciting. I think we all agree it's been accelerating incredibly fast; there was just an interview this morning with the chairman of OpenAI, and he's trying to buy up every GPU he can find. A couple of things seem like they're going to be drivers. One is that the private sector is spending a lot of money to sell to us, so they're creating great software that will read your voice, see how you're standing, how you're sitting, whether you're making eye contact, all to sell you a product, and we benefit, because we do some of that ourselves, right? The other driver, I think, is the equipment you have at home. We've come a long way from a thermometer and peeing in a cup to see whether you're pregnant. Now we have wearables that gather all kinds of information, and passive sensors: our phones can look at our retinas, and devices can tell us our blood oxygen levels, our pulse rate, our respirations, even our perspiration; some watches claim they can measure troponin levels. It won't be long before you can also do a pretty good evaluation of your lungs and heart at home, and there's even a study, out of Mass General I think, that said they could predict within two weeks whether someone is going to go into the hospital with congestive heart failure, just based on how you speak, and it's listening to you all the time. So I think the private sector and the equipment we wear are going to drive home health care, and that's where everything is going to end up, because it will be very convenient: in the middle of the night you can see your primary care chatbot, and it will do a really good job predicting your pathology. The other piece here is morbidity and mortality. If you look at self-driving cars, you have to get to a point where there's something like one death per 400 million miles driven. That's how we're going to have to determine whether this technology works. I don't know how you would design a study of morbidity and mortality comparing how many people we end up killing versus how many the AI would end up killing. Are there studies out there, are people considering those kinds of studies, and what do they look like?
Thank you. I think, at least when these tools are offering interventions or diagnoses, the new standard has to be replication. You can always overfit or overtrain a model to be perfect. So the question is: it's amazing that you showed me you could predict admissions two weeks out; now let me try it in a different population, in a different part of the country, with a different team. If it replicates, that's amazing, and we should celebrate it. But if it doesn't replicate, or doesn't work as well the second time, that may mean you've inadvertently overfit or overtrained the model. We don't want to keep raising the bar, but saying it replicates is a pretty low bar, and a lot of these things often don't work twice. We can at least ask for twice; it's technology, it should work in a hundred different hospitals, right? If it really could predict depression two weeks in advance, we should do that. So we can still have a high bar and say, we're so glad you showed us that pilot study in two people who happened to be your parents; now we'd like you to do a little bit more. We can ask for more, and that's still fair. And I want us to be careful about thinking about who has access to what. There are a lot of people who still do not have access to technology; they don't have access to the bandwidth. I think there's actually a new push to scale back the Lifeline program, which was extended and made broader during COVID, and now they want to pull that back again. That's going to impact people in poverty and people in rural areas. We can give them the technology, that's true; within our organizations or agencies or under a health plan we might be able to give them the technology, but they're not going to know how to use it, and many times neither does the provider. So with all of that comes training: access to the technology, and then training on how to use it as well. Hi everyone, my name is Kenna Chick. I work at a health philanthropy focused on low-income communities and how to increase access to care in that space. I really appreciate the perspective, especially around algorithmic bias and sanism and how those would impact care. Another area I see a lot, especially in the equity space, is how we ensure that if these technologies are helpful in streamlining processes, they are also made available to FQHCs and CBOs that may have fewer resources, and not just to the companies and hospitals that can afford to include AI in their training and their work. I've been asking myself, how does it get funded? I got a question about somebody wanting to develop an app, and it wasn't the first time I'd received that question, and they wanted to know how Medicaid was going to fund it. And it's like, do you even have the app yet? So we're seeing this opportunity, and a lot of people, venture capitalists and so on, jumping into it. The question becomes, how are you going to fund it, and how are people going to have access? And I'd also ask, how is the patient going to be able to use and see the technology on their end, to see what the provider is seeing, to help them self-manage? No offense to you psychiatrists, but we don't always need you, and you're not always around, right? I mean, that's the reality.
So sometimes we need to be able to help ourselves, and technology can actually be a mediator and moderator of that. That's why I like technology and why I got interested in this area. And no offense to psychiatrists, you guys are amazing, just to be clear. Hi, I'm Carleen McMillan, chief medical officer at an EHR and an avid user of an AI scribe for the past year and a half. Maybe to continue the light criticism of psychiatrists: many of us don't use measurement-based care, tools like the PHQ-9, AIMS assessments, things like that. How do you think about AI essentially replacing some of that measurement-based care, and how payers might see it? I'm imagining, oh, the bot says you're not depressed, so you're not going to get this treatment; but it could also be more objective, because people can game things like ADHD rating scales, for example. I wonder if we're going to move beyond the PHQ-9 and GAD-7. We all know they have problems, and we all use them because they're convenient, but maybe there's a world where we can learn validated functional metrics from technology: how people are sleeping, how they're doing, how they're feeling socially. Maybe we'll look back and say the checkbox scales were a dark age in the field, and that becomes exciting. Certainly, using this advanced technology just to fill in checkboxes feels slightly reductionistic. It's not hard to do a PHQ-9; if people in this room wanted to, you could give it to someone and they could fill it out pretty quickly. So you also have to ask whether the reason there's been so much resistance to it is that it's not that useful. I should be quiet now; that's probably another talk. Thank you all for such a helpful and wonderful presentation. My name is Barry, I'm a psychiatry resident at Johns Hopkins, and I was curious, as somebody who's early in training and early in career, how much value you think there is in learning the inner workings of the technology, which a couple of you touched on, versus understanding more of the applications and the advocacy and policy side, because it could take a lot of effort to learn the inner workings. Well, we have a program director here, so I think we should let him answer. But my one piece of advice is: you're an expert at being a clinician. Use that. It's hard to be many things, and there are a lot of programmers; let them be programmers. When I was a med student, I had an elective where I converted a stimulus-response reaction machine to a key press, and I learned that in a month it's hard to learn C and then generate accurate code. Okay, this is being recorded, I'm in trouble. My sister was a CS major; she helped me do the engine, and I did the interface. But the point is, instead of learning coding, you might want to consider getting a clinical informatics sub-certification after you finish residency training. It's two years, which seems like a long time, and I know residents don't want to spend that much more time in post-graduate fellowship, but I think it will give you perspective and credibility as you pursue whatever aspect of the field you choose, whether that's creating your own software product, consulting, or possibly being a chief medical officer. It will be really helpful. John did one. It was great.
We should do one more question, and then we're done. Thank you. Good afternoon. I'm a medical student. This isn't actually a question; it's more of a comment on what the gentleman from Yale was talking about, putting research through large language models and cycling it back and forth. It reminded me of a developing story, a phenomenon coined the "dead internet." It's when AI-made Facebook profiles try to go viral: they post generative images, and at first they're kind of okay, not bad, but then the comments and the likes are all from other AI profiles. As it feeds into itself, it's garbage in, garbage out, and all the biases come into play. If you take a look at it now, it's all mush; they're just perpetuating their own garbage. I just thought that was interesting. Which is why, if you remember one thing from this talk: can you use ChatGPT to see your patient? No. On that, thank you all.
Video Summary
The panel discussed the implications and challenges of using AI, particularly large language models like ChatGPT, in clinical settings. The consensus was that while AI technology shows promise, it is not yet capable of replacing human clinicians in patient care. Various experts highlighted key concerns such as potential bias, privacy issues, and inaccuracies that could arise from AI-generated data, especially in sensitive areas like mental health diagnosis and treatment.

The discussion emphasized that current AI models largely draw on existing social media and internet content, which may perpetuate biases and inaccuracies. For instance, previous attempts to generate images or make decisions based on AI have revealed ingrained stereotypes and prejudices. There's a significant risk that AI could escalate misinformation unless it is trained with more precise, high-quality clinical data.

The panel advocated for a cautious and ethical approach to deploying AI technologies, ensuring compliance with privacy standards like HIPAA and structurally addressing any biases. They also recommended that clinicians understand the technology's limitations and harness AI for administrative and documentation tasks rather than direct patient care. Future advancements could see AI helping with clinical support, provided there is transparency about data sources and rigorous validation processes to avoid the replication of biases and errors.

Lastly, there was a call for involving people with lived experiences in developing these technologies to ensure they are beneficial and not harmful, as well as a need for policy recommendations to regulate AI's role in healthcare.
Keywords
AI in healthcare
large language models
ChatGPT
clinical settings
potential bias
privacy issues
mental health
misinformation
ethical approach
HIPAA compliance
clinical support
policy recommendations