Integrating & Deploying AI into Clinical Services ...
Video Transcription
Whether you're an administrator, a clinical services director, or a physician or other provider who works in mental health services, it's important to understand the distinction between the different types of AI, that is, artificial or augmented intelligence technologies. You've already gotten a strong foundation from the previous presenters: Justin Baker has shared with you about using machine learning and AI for psychiatric assessment and sensing in psychosis, Monica Roots covered AI in child and adolescent psychiatry, Darlene King discussed the ethics, and Stephen Heiler discussed the use of chatbots.

So I'd like to start with how we've been so enchanted by the different types of AI. And indeed, we have so many gems that have graced our technology, not just in the past few years but over decades. Karen DeSalvo, with co-authors Howell and Corrado, published in JAMA about the different eras of AI. We've seen that AI already has a presence in health care. If you think about the different technologies that we use, even as far back as, say, the 1950s, we see here in this first column AI 1.0: symbolic AI or probabilistic models. The chart lays out the core functionality and key features along with the training methods behind each era of AI.

Starting from the top left corner with AI 1.0, you'll see that AI in this era follows directly encoded rules, which we can also call if-then rules or decision trees. To train these, experts had to create the rules, which were then hand-coded in traditional programming, and this type of AI follows decision paths that are fairly rigid. So you could ask a series of questions to determine whether a picture shows a cat or a dog. As another example, we've seen systems like IBM's Deep Blue beat the world champion in chess. In health care, EHR systems use rule-based clinical decision support tools: you might have a prompt appear on your screen that asks you to prescribe naloxone if a patient hasn't been prescribed it in the past year and they're at risk for an opioid use disorder (a sketch of such a rule appears below). But one of the flaws here is that it's reliant on human logic. Human logic errors and bias may be embedded, and if you don't catch them, things may not behave properly in real-world situations.

Now we're going to fast forward to AI 2.0, or deep learning. This era started around 2011 and is used to predict and classify information, but it's still task-specific; it's not that general, right? It's used for specific tasks that require new data and retraining, such as detecting patterns across a large data set. These patterns are learned from examples, and the examples have to be labeled: true or not true. So in AI 2.0, we can classify information based on the training that's given. For instance, is this picture a cat or a dog? Or how many dogs will be in the park at noon? Some examples of this show up in photo searching without manual tagging: we've seen photo album software that allows us to search for specific places or faces without needing to tag things manually. We've seen voice recognition really get more robust over the past decade, built into our smartphones and operating systems without requiring much training. And we've seen language translation software that allows us to translate between languages.
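Here is a minimal sketch, in Python, of the kind of directly encoded AI 1.0 rule just described, using the naloxone clinical decision support prompt as the example. The record fields, the one-year threshold, and the function names are hypothetical illustrations rather than any specific EHR vendor's logic.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class PatientRecord:
    # Hypothetical fields: a risk flag from a screening tool and the date of the
    # most recent naloxone prescription, if any.
    at_risk_for_oud: bool
    last_naloxone_rx: Optional[date]

def should_prompt_naloxone(patient: PatientRecord, today: date) -> bool:
    """Directly encoded if-then rule: prompt the prescriber when the patient is
    at risk for opioid use disorder and has had no naloxone in the past year."""
    if not patient.at_risk_for_oud:
        return False
    if patient.last_naloxone_rx is None:
        return True
    return today - patient.last_naloxone_rx > timedelta(days=365)

# An at-risk patient with no naloxone on file triggers the prompt.
print(should_prompt_naloxone(PatientRecord(True, None), date.today()))  # True
```

The rule is rigid by design: it does exactly what the experts encoded, which is both its strength and, as noted above, the source of its blind spots.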
Now in healthcare, we've seen examples of AI 2.0 used for diabetic retinopathy, breast cancer and lung cancer screening, skin condition classification, and predictions based on EHR data. But some challenges, again, are present, such as out-of-distribution problems, when real-world data differs from the training data that was used to create the AI. There might be catastrophic forgetting, where it won't remember early parts of a long sequence of text, and of course bias, which shows up not just in AI 1.0 and 2.0 but also in AI 3.0.

In the last few years, we've seen the explosion of foundation models and generative AI. AI can now produce new images, new text, new sound, and new video, and it allows us to perform different types of tasks without new data or retraining. We use something called prompt engineering to elicit new behaviors. And the way it's trained, essentially, it trains itself: for generative AI for text, it learns to predict the next word or sequence in text. It can interpret and respond to complex questions, such as explaining the difference between a cat and a dog, based on the text it was trained on. We're seeing this embedded in things like word processors, software coding assistants, and chatbots that are much more open-ended. The table lists chatbots under column three, AI 3.0, but we've seen chatbots appear as far back as column one. You'll see specific types of language models listed: Med-PaLM from Google, PubMedGPT, and BioGPT. But we also see challenges and risks of hallucination, creating plausible but incorrect responses based solely on prediction. Now, I know hallucination carries a special meaning for us, so another term that's been used is confabulation. Another type of risk is grounding and attribution: depending on the data or the "truth" it's presented with, the model may treat something as true when it shouldn't. And of course, the term bias shows up again as a risk factor for AI 3.0.

Now, this is just to show you how far AI has advanced, and we can think about all the different types of AI in our midst over the past half century. Clippy, which is something I grew up with in the 1990s, came with Microsoft Office. But it really didn't do a whole lot, and it really didn't help a lot, and I can say that as a fact because Microsoft removed Clippy entirely from Microsoft Office. Other areas where you might see AI 1.0 are our traditional (I hate to say traditional, but it's true) health apps. The Veterans Affairs is one such group. I don't have a conflict of interest with the VA here, because they give these away for free for the general public to use. A lot of these apps were created by the National Center for Telehealth and Technology and the U.S. Department of Veterans Affairs. They created all sorts of apps that you can freely download, like PTSD Coach for reporting on PTSD symptoms. You'll notice that a lot of the AI they use is very much AI 1.0: asking you questions based on your mood, your use of substances, your sleep, and providing hard-coded rules and decision-tree guidance.

So let's look at a few examples, like Stay Quit Coach. This particular app was created for those who smoke tobacco products and have difficulty achieving abstinence, and it allows people to assess their use of tobacco over time.
We've found that smokers are quite interested in this specific type of app, and it can be paired with contingency management as a smoking cessation treatment to help reduce smoking.

The next app, VetChange, was designed to help develop healthier drinking habits. It provides tools for cutting down or quitting drinking, tools for managing stress symptoms, education about alcohol use, and guidance for finding professional treatment. You'll see in these screenshots that there are daily drinking logs, as well as logs for mood and urges and logs for situations in which one might drink. Tools are presented that allow people to address specific emotions as well.

This one is called Insomnia Coach. For those who have insomnia and are unable to sleep, it provides guidance, training, and tips: tracking sleep patterns over time, consistent rules for falling asleep, tools to help you relax your body, and tools to quiet your mind. There are checklists and guided weekly training plans, certainly more interactive than some of the tools I myself trained with, which were static paper and books.

The next app is Mindfulness Coach, also from the VA. It allows people to reduce stress, improve their emotional balance, and increase their self-awareness and coping skills. It gives them techniques for noticing and paying attention to what's going on in the present moment, keeps track of progress over time, and provides assessments as well. It's interesting, because there are a lot of different mindfulness and CBT (cognitive behavioral therapy) platforms out there. The wonderful thing, and the downside, of course, is that there's a lot to choose from, and they're generally fairly low risk. In fact, I think the quality of such platforms, as far back as I've tracked them over the past decade, has really increased over time in terms of design, presentation, and accessibility. We've just listed a few such examples, but they're easy to download, so give them a try. These you could all consider AI 1.0: symbolic AI, probabilistic AI.

If you think back to our chart, we've seen AI 2.0 all over this past decade. For instance, natural language understanding systems, or NLU. Here we see an artificial intelligence system, using Google Cloud as an example, that allows us to detect what the sentiment is, positive or negative. One could theorize that negative sentiment should have a higher priority and potentially require more urgency. So, for instance, someone provides text like this in the red box: "I think my depression and irritation is getting worse. My girlfriend and I argued and had a bad breakup. Can you refill my medications?" This text is ranked by Google Cloud as having a much more negative sentiment (those red and yellow boxes) versus the ones in green, which reflect a much more positive message: "I feel great. Exercise has been going well." So you could potentially design a system that uses natural language processing to uprank or assign more urgency to the items in red versus the more positive items in green; a minimal sketch of that triage logic follows below.
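Before turning to the next example, here is a minimal sketch of that sentiment-based triage idea. The scoring function below is a stand-in for a call to any natural language understanding service (the Google Cloud sentiment analysis described above returns a score roughly between -1 and 1); the cue words and the urgency threshold are hypothetical.

```python
def sentiment_score(message: str) -> float:
    """Stand-in for an NLU service call that returns sentiment in [-1.0, 1.0],
    where more negative values indicate more negative sentiment."""
    negative_cues = ("worse", "argued", "breakup", "hopeless")
    hits = sum(cue in message.lower() for cue in negative_cues)
    return -0.4 * hits if hits else 0.6  # crude heuristic, for illustration only

def triage(messages):
    """Sort messages most-negative first and flag likely urgent ones for review."""
    scored = sorted((sentiment_score(m), m) for m in messages)
    return [("URGENT" if score <= -0.25 else "routine", m) for score, m in scored]

inbox = [
    "I feel great. Exercise has been going well.",
    "I think my depression and irritation is getting worse. Can you refill my medications?",
]
for priority, message in triage(inbox):
    print(priority, "-", message)
```

A real deployment would keep a clinician in the loop rather than acting on the score automatically.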
There's another type of artificial intelligence; here's an example of how it handles facial cues, from Microsoft Azure in the late 2010s. This was a screenshot from Microsoft's web page, where it provided an example of detecting not just a variety of different emotions (smiling, anger, disgust, fear, happiness, sadness, et cetera) but also guessing one's gender, one's age, and whether someone wore glasses or not. Interestingly, though, this was actually withdrawn from the market in the early 2020s. Microsoft provided a detailed blog post saying, essentially, that after consulting with ethicists and other machine learning researchers, this technology is not really ready for commercial prime time. So this is certainly not commonplace or used much at all, and certainly not in mental health care, but people were thinking about using it for customer service.

There are organizations that have thought about how to use AI for global health. This report is from the US Agency for International Development, USAID, and you can download it for free. In the left-hand column, in blue, you will see the overall umbrellas of use cases: population health, individual health, health systems, and pharmaceuticals and medical technologies. Under each row are different areas where AI can be used. For instance, under population health, you'll see 1A, surveillance and prediction, and 1B, population risk management. You can think about groups like the Veterans Affairs, where they use population health AI to predict one's risk for mortality or risk for opioid overdose. Individual health, the second row, shows different areas where AI can be applied for a specific encounter: for instance, 2A, self-referral; 2B, triage; and 2C, personalized outreach. When you're thinking about how to organize the AI technologies you're trying to implement, you can circle which areas they address in this chart. Behavior change, for instance, covers exercise, diet, and wellness. Then we also see things like acute treatment, 5A. We've talked about clinical decision support in my naloxone example from earlier: treatment guidance and medication prescribing, and then follow-up for chronic treatment, also in that section, especially medication adherence. It reminds me of tools people have been using to detect whether someone has been taking specific antipsychotics, or medications for infectious diseases such as tuberculosis. Rehab compliance and dietary compliance can also fall under this category. 5B includes monitoring, especially inpatient and device monitoring. 5C is AI-facilitated care; some of the things we've discussed earlier, or, say, in Dr. Stephen Heiler's talk about psych-counseling, can fall under this. And then 5D is more for surgery and physical therapy.

Under the third dark blue row, health systems, which can include hospitals and clinics and such, you'll see things like 7A, medical records; 7B, capacity planning and personnel management; 7C, claims processing; 7D, fraud prevention; and 7F, coding and billing. A lot of these fall under administrative tools, and the FDA doesn't necessarily claim jurisdiction over them. Finally, the last row has more to do with research trials, the management of medical devices, and pharmacy management: 8A is clinical trial support and recruitment, and 8B is drug discovery and medical technology research and development. We see other things that may play more of a role in administration, pharmacy, and labs; HEOR, for instance, is health economics and outcomes research.
So you can download this as a way to categorize all the different AI tools, and there are lots of companies out there.

Now we're going to move on to the stuff of our wildest dreams. AI 3.0 really, truly has a familiar sound: that of a human. This is the very last red column in the chart, column three, AI 3.0. By my last count, there have been at least 10 different startups that try to scribe or summarize clinical encounters. This one is an example: voice recognition translating speech to text and then generating a transcript. Abridge is one such example that helps transcribe doctors' appointments into something that's more understandable. Some of these tools claim to integrate with Epic, and this one in particular allows you to produce clinical summaries and patient handouts, or patient summaries, as well.

We've seen AI chat get more sophisticated over time, and tools may differ depending on how much the underlying AI is really updated. This one in particular is based on Anthropic's Claude, an AI assistant that summarizes text, searches for information, and asks questions. You'll see that companies are competing on claims; this one in particular claims that it is, quote, much less likely to produce harmful outputs, so you can get the desired output that you want. Microsoft Copilot is the specific example I'm providing because Microsoft has made a very large deal about embedding generative AI into all aspects of their Microsoft 365 offering, formerly Microsoft Office. Not only are you able to use Microsoft Word to draft, summarize, and generate text, but you can also do the same in Excel, PowerPoint, Outlook, Teams, and other products. One thing to keep in mind, too, is that some of these are actually protected more if you buy their premium offering. I'm not necessarily endorsing any of these; I think it's helpful and healthy to compare the different tools. This is Google Gemini, for instance, another tool that we can use.

Again, it's important to distinguish between the consumer and the enterprise versions, because oftentimes we're faced with software that might look or feel the same whether we use it at work, at school, or in a personal setting. It's important to check that there is a business associate agreement, or BAA, in place in order to ensure that you're HIPAA compliant. As an example, Google Workspace can be used with personal Gmail accounts, but we also have psychiatrists who are using Google Workspace Enterprise for their small business or practice, and you might have large health systems doing the same. It's similar for Microsoft 365, where a lot of enterprise health systems are using this tool. So with Gemini or ChatGPT, make sure that you're using, I'd say in a healthcare setting, the enterprise counterpart. You can protect the privacy of information. Also double-check to ensure that the companies are not storing or using the information for their own development purposes, and if they are using it, that they're using it in a manner that's legally protected and compliant. That way, if we think about liability, it wouldn't necessarily fall on you; it would fall more on the AI vendor.

The road to human-level performance really, truly got shorter. In this slide from McKinsey, a business consulting conglomerate, they have predicted, essentially, that generative AI (the solid line) has advanced on a lot of fronts like AI creativity, AI art, and AI music.
And so we're seeing that, likely in later versions, we'll see more logical reasoning and problem solving. You'll notice that some of the items they predict will be farther away, down into the future: social and emotional output, social and emotional reasoning. Some of the things that we think about for mental health chatbots and use cases may not necessarily be here yet. Chatbots can be notorious for losing track of their memory, right? Once you go past the token limit, say; that's a specific tech term. But what McKinsey is predicting will happen and be refined into the 2030s is just this: social and emotional reasoning.

These are screenshots from a journal article that discusses how a publicly accessible chatbot could be used in low- and middle-income countries, or LMICs, for health literacy, explanations, screening, triage, remote healthcare support, healthcare communication, documentation, and training and education. This is actually ChatGPT providing differential diagnoses based on the symptoms it's been given. And this one shows GPT providing feedback and empathic support when presented with mental health symptoms. Now, you notice that it's very chatty; there's a lot of discussion. Will this be helpful or not so much? Well, there have been so many different articles that talk about how GPT is excellent at providing empathic-sounding responses. And recently we've seen some research saying that drafting responses for physicians' inboxes may be helpful, but it may not necessarily save time; the physician and other providers still need to edit things.

But yes, people are still using AI 1.0 principles for smoking cessation, and this just came out a few months ago. This is one that was recently announced from the Fred Hutchinson Cancer Center in Seattle, combining, essentially, ChatGPT technology so that it acts more like a virtual counselor on demand, with step-by-step guidance for quitting and tools to help deal with urges and cravings to smoke. It was created by Dr. Jonathan Bricker and his research team in partnership with Microsoft's AI for Good. So we're already seeing examples of people trying to think of ways to make these chatbots more visually vivid and more emotionally connected. This is an example of AI 3.0: it uses an LLM and natural language processing, or NLP, and it has an avatar persona, called Ellen here. You can choose what Ellen's appearance will look like, as you can see in the screenshot, and it uses a user interface that's very similar to any other messaging system. It's very fascinating, truly, how far these have come.

We've also seen decades-old technology come into the mainstream. Books have been written about virtual reality, particularly for the conditions listed in the top right corner, such as specific phobias, social anxiety, alcohol use, PTSD, and relaxation, and about using augmented reality or virtual reality headsets. These used to be very large, cumbersome, and expensive, and the equipment to power the headsets took up a lot of space in a room. They're used a lot for exposure therapy and distraction from pain, and there are researchers like psychiatrist Dr. Kim Bullock at Stanford whose research is devoted to this. This is a commercially available platform where a person can get the headset and headphones along with biofeedback measurement, so that the therapist can increase or decrease the intensity of an experience.
And this one we're seeing deployed throughout the Veterans Affairs system and clinically in private practices. This slide shows samples of different environmental scenarios. Now, a lot of these are very vivid graphics, yes, and they may not necessarily be AI to begin with, but you might see the need for AI-responsive characters, such as the people in the elevator in the top right corner, or maybe the behavior of the sharks, or the dolphins in this case, and these could use AI. AI 1.0 or AI 2.0 can power this. We're actually seeing examples in the video game space, too, where people are using generative AI to power more vivid conversations with non-player characters, or NPCs, so you can talk to them as if you were immersed in the video game.

In fact, as I mentioned, there's a set of researchers at Cedars-Sinai who have created a system, Xaia, that essentially puts together AI and VR. Their goal is to provide immersive mental health support. They provided participants with snippets of human-like interaction, right? They used cognitive behavioral therapy, motivational interviewing, and supportive therapy to feed some of the responses that this robotic-looking avatar in the bottom left corner can provide, in terms of speech, feedback, and words to the user. This is actually something that got deployed to the Apple Vision Pro, a 1.3-pound headset that costs over $3,000, and it's probably the first known AI-powered chatbot inside Apple's Vision Pro, something that Brennan Spiegel over at Cedars-Sinai has been spearheading, really only at the feasibility-study level so far. Eventually we'll need to actually see clinical outcomes, right? They note in the paper, too, if you read it, that some parts of the system don't fully work well: it may ask too many questions, or there might be too much of an emphasis on coping strategies. But it is interesting to see how this could work in hospitals, which are also looking at using VR to help onboard new employees more efficiently and improve staff confidence, too.

All right, so that's AI 3.0. Some of the applications we've been looking at here combine AR, VR, and AI, and we've also seen it embedded in things that look like regular apps. Now, a typical chatbot might follow AI 1.0, with if-then rules and directly encoded rules, but that QuitBot from the Fred Hutchinson Cancer Center uses much more generative AI conversation. So how do we make sure it doesn't cause trouble for us, right? Some things to look for when you're embedding generative AI in your systems: What AI does it use? What applications is it helpful for? How was it created? And who created it?

There's a very popular news article, if you're not familiar with it, about hallucinations. Kevin Roose is a New York Times reporter who, about a year ago in 2023, discussed his disturbing two-hour conversation with Microsoft's Bing, now known as Microsoft Copilot, which ended up devolving, it sounds like, into a very disturbing exchange. Then a year later, in 2024, he released an article noting that a lot of chatbots have reduced their hallucinations and actually got less creative. So it's very interesting: he noticed the AI chatbots aren't as fun to converse with, aren't as charismatic, aren't as creative; they're more bland.
And so it makes you think: okay, if it's the same chatbot that he's interacting with, how do we keep track of all the changes that a vendor may apply to a chatbot? So we're going to go over some of the issues to assess.

This is a graph from McKinsey again that shows some of the things organizations are concerned about with generative AI. For instance, they are working to mitigate risk with a human in the loop, meaning a human actually has to double-check the work, and that's because inaccuracy is the number one risk they perceive. Then cybersecurity: we've been hearing a lot about security issues where people are posting private or highly personal details into chatbots, and if there are no legal privacy or security protections in relationship chatbots, significant-other chatbots, that sort of thing, that can be a huge issue. Then compliance; IP infringement and copyright; explainability (can the AI actually explain how it came to the answer?); whether it's going to replace jobs; equity and fairness, and we've talked about bias as a huge issue when these AIs aren't designed correctly; and then organizational reputation as well.

On the next slide, as an example of how accuracy can really change, there was a study of suicide risk assessments done through the eyes of ChatGPT-3.5 (two versions of it) and then ChatGPT-4. They found that even among versions of the same ChatGPT, the generative AI may actually change the way it assesses suicide risk and the likelihood of suicidal behavior when presented with hypothetical case vignettes. In red, you'll see the assessments by mental health professionals, and you'll see some variance there as well. They noticed that 3.5 underestimates suicide risk, especially in severe cases. So when you're thinking about the AI, think about the version, the date it was created, and the manufacturer; essentially, is the vendor going to keep changing the chatbot or the chat engine? The same applied when they looked at suicidal ideation, with an N of 379.

In this slide we see an overview of the key issues shaping the creation and adoption of LLMs in medicine. This image is from JAMA, and it shows how some of the LLMs are created. In this case, there's The Pile in the top left corner, a training dataset that includes sources from PubMed Central, Wikipedia, and preprint archives of journal articles. The LLM is created through self-supervision, learning patterns to predict the next word; it's basically auto-correct on steroids (see the toy sketch below). You can see examples of instructions in these blue boxes on the left-hand side, and the responses are either red, yellow, or green. When it comes to instruction tuning, for instance, red dot one shows how untuned LLMs are not good at following instructions, but if you do instruction tuning, shown at red dot two, the LLM can learn. Then you have red dot three, reward-based tuning: the large language model learns from experts who provide feedback, rewarding the green correct responses and penalizing the red unexpected, hallucinated, or inappropriate responses. So when you're thinking about implementing an LLM or AI: What training has it been given? What adjustments have been made? Is there any reinforcement learning or reward-based tuning that allows it to get better over time, and do you really want it to change over time? These are things to think about if you're embedding it in your health system.
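To make the predict-the-next-word idea concrete, here is a toy sketch of self-supervision using nothing more than word co-occurrence counts. Real LLMs learn neural representations from billions of documents; the three-sentence corpus and the bigram counting here are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy training text; every adjacent word pair is a free, self-labeled example.
corpus = (
    "the patient reports chills and a nonproductive cough . "
    "the patient reports improved sleep . "
    "the patient denies suicidal ideation ."
)

next_word_counts = defaultdict(Counter)
words = corpus.split()
for current, following in zip(words, words[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    """Return the continuation seen most often during 'training'."""
    counts = next_word_counts[word]
    return counts.most_common(1)[0][0] if counts else "<unknown>"

print(predict_next("patient"))  # 'reports', the most frequent continuation
```

Instruction tuning and reward-based tuning then shape which of those statistically likely continuations the model actually produces.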
Now, summaries: clinical summaries. This slide shows, on the left-hand side, issues in clinical summary generation. First, there's variability, where summaries can change just due to the randomness of LLMs. Red dot two shows something called sycophancy, a bias in which LLMs tailor the output to what they think you want. And red dot three covers "complete the narrative" errors, where the model accidentally introduces a small but clinically meaningful error, like a one-word addition that completes a narrative it expects. You'll see some of the errors that may appear: in red dot four, differences in organization or phrasing; red dot five shows that depending on the prompt you give it, it might emphasize the heart symptoms or the lung symptoms; and red dot six is where it accidentally added the word "fever" to "chills and nonproductive cough," even though fever was not mentioned in the radiology report. A lot to take in, I know. (A sketch of a simple check for this kind of unsupported insertion appears below.)

More generally, you'll see comparisons in the literature of the different clinical AI models that are available: how they were trained, evaluated, and published. One of the things you can think about is, well, what was the training data? What was fed into it? Is it proprietary? Some of the foundation models were based on medical centers' closed, private EHRs; some were based purely on PubMed, an archive of scientific and medical journal articles; and some companies keep it a closely guarded secret and don't reveal their sources.

Some of the other things we've noticed are regulatory challenges related to the rise of LLMs. The first row shows patient data privacy: ensuring that patient data used for training LLMs are fully anonymized and protected from potential breaches is a huge issue. Row two shows intellectual property as a regulatory challenge: if an LLM generates content that's similar to someone else's work, someone might face an IP lawsuit. The third row shows medical malpractice: who's responsible if a patient is harmed? Is it the AI developers? The healthcare professionals? What if the healthcare professionals decided to ignore the AI's recommendations? Or is it on the institutions that forced the adoption? Row four is quality control and standardization. If anything, we've learned today that regulation is not fully in place to ensure reliability and consistency in answers. This might change after 2024, but that's the current state. So we need to make sure that any sort of AI-driven medical device is grounded in truth.

Now, there are about six more to go through. This slide shows informed consent at the top, ensuring that patients know that AI is being used. Interpretability and transparency are in row two: how can we be sure how decisions are made by the AI? Then fairness and bias, the third item on the list: regulation is needed to prevent biases, and we've talked about healthcare disparities and ensuring that AI truly is helping the people we serve. Another regulatory challenge is the top row here: who owns the data, and is the LLM company going to learn from your data? And then the second row: what if we become too reliant on AI and less on human expertise? There's been concern that if an AI is frozen in time, it only does what it was trained on, and human expertise may erode alongside it.
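Returning for a moment to that inserted "fever": here is a minimal sketch of the kind of human-in-the-loop check a health system could run on AI-generated summaries, flagging clinically meaningful terms that appear in the summary but not in the source note. The watchlist is a hypothetical stand-in for a real clinical vocabulary.

```python
# Hypothetical watchlist; a production system would draw on a clinical vocabulary.
WATCHLIST = {"fever", "chills", "cough", "chest pain", "hemoptysis"}

def unsupported_terms(source_note: str, ai_summary: str) -> set:
    """Return watchlist terms asserted in the summary but absent from the source."""
    source = source_note.lower()
    summary = ai_summary.lower()
    return {term for term in WATCHLIST if term in summary and term not in source}

note = "Patient reports chills and a nonproductive cough. No other complaints."
summary = "Patient presents with fever, chills, and a nonproductive cough."
print(unsupported_terms(note, summary))  # {'fever'}: flag for clinician review
```

A check like this catches only insertions of listed terms; it does not replace clinician review of the whole summary.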
The third row is continuous monitoring and validation. This isn't just a simple medical device that you drop into a clinic; you want continuous monitoring of the performance, accuracy, and validity of AI tools over time and across different populations, too. So there are many different ways you can just shake it off and keep cruising without missing a beat if you don't understand what to ask, which is why it helps to have a framework.

This next slide shows some of the early work that John Torous, John Luo, myself, and colleagues put together for the American Psychiatric Association's app evaluation model. When you're evaluating apps, you first think about the ground level, that purple row: the context and background of how an app was made and who made it, the pricing, where you can get it, and how often it's updated. In blue, that's the risk, the harm; the side effects, in other words. Green shows the evidence: is this based on gold-standard evidence, and what's the potential for benefit? Ideally, of course, it would be based on rigorous clinical trials, not just a comparison against some sham or placebo. So, something to watch out for: some products claim to be evaluated by the FDA when the FDA may have simply cleared them; FDA clearance shows that a product poses no more harm than, say, a sham or placebo treatment. Yellow is ease of use; I think this is improving a lot with apps over time, the usability and adherence. And finally, in red: how can we ensure that the data is transferable or shareable with an EHR? The APA has a section devoted to evaluating apps, with examples of how to do this, and a lot of apps, as we've seen earlier, embed AI. Another site you can look at is mindapps.org, a database of apps supported by John Torous at Harvard. If you're thinking about different apps to embed into your health system, you can think about all the different filters and criteria here.

Also think about the different certifications or frameworks that you may need to comply with. We know that HIPAA's legal requirements include the privacy rule, the security rule, and the breach notification rule. And there are different certification bodies out there. HITRUST CSF is one such framework; it's been around since 2007, and it brings together common compliance and risk management elements so they can be presented in a much more logical fashion. Organizations achieve it through audits, automated reports, and such; it's a third-party audit and certification process. So you could look for vendors that hold this certification and see how it incorporates other legal privacy and security rules. HIPAA itself is a federal law, so it's not a certificate; HITRUST, on the other hand, is a certification that you can potentially pursue.

Now, there's another way to check a vendor's privacy, legal, and security posture, and this is called SOC 2. Not a lot of health systems are necessarily using this for IT; it was actually born out of the accounting industry, established by the American Institute of Certified Public Accountants. But if you think about it, financial information is very much protected, much like health information. SOC 2 covers things like security: How are people accessing your data? How do you disclose unauthorized access? How often are your systems available? How is the information kept confidential and private? These are very lengthy items that go into a very robust report.
You'll see in the next slide that there are different types: SOC 1, SOC 2, and SOC 3. They can provide different levels of assurance to you as a company or as a clinic that a vendor has some security and privacy in place. There are also different types of reports, like a Type 1 and a Type 2 report, and you'll notice that a SOC 2 Type 2 can take up to 12 months to complete: much more thorough, more time, more money, and really, truly, mostly large companies end up pursuing SOC 2. It might not be the most common, but it's another tool you can potentially use to evaluate software-as-a-service vendors, so you know what's going on behind the software company's infrastructure and how they're functioning.

You can also use other guides out there. The Veterans Affairs, again freely downloadable, has a mobile health practice guide that you can use to think about how to implement these technologies. The American Medical Association, the AMA, also has its own framework, the behavioral health integration patient journey. You'll notice that the blue line, from left to right, shows the journey of a patient, who goes from dot to dot to dot: intake, screening, evaluation, treatment, and engagement. And you'll notice that all of the rows span different aspects of the timeline. This is very similar to the USAID AI report I showed you earlier. You can use this as a framework, again, to assess where AI falls in your patient journey; even though this is specifically for primary care, you can apply it to other behavioral health clinic workflows. The report is freely downloadable, and it provides specific examples from different groups, such as large academic health systems and national capitated health systems.

Now, a lot of different groups have been thinking about what kinds of guidance we need for AI and how to implement it. Again citing the AMA, they produced a pretty helpful health AI policy document. There are some cartoons to make it a little easier to digest, but it's a fairly brief policy document that you can read in probably about an hour. They address AI as augmented intelligence to emphasize how AI must be accompanied by expert oversight, and the policy discusses, especially for physicians, how physicians need to be engaged in AI policy development and risk assessment. This particular slide shows how there really, again, isn't a whole lot in terms of national frameworks; clinical decision support, or CDS, isn't really regulated. Ideally, we'd have national governance policies for patient safety and equity, and we need to think about the different items that may be harmful to us when it comes to AI.

The next slide talks about the transparency we need in such AI technologies. Ideally, there would be disclosure and documentation of AI use in patient care and its role in medical decision making, and there could be disclosures on AI-generated content and its impact on patient care. This next slide shows what to disclose, particularly for AI developers, and we've talked a lot about these elements already. Ideally, AI developers will disclose liability and key information so that we can evaluate their technology: what kind of regulatory approvals are in the works or have maybe failed, data use policies, the consensus standards utilized, and what kinds of problems it's solving.
This next slide covers the data used for training and validation: how much data is being used, its size, completeness, timeframes, diversity, and labeling accuracy, whether clinical experts were involved in the validation, and what sorts of issues have been addressed to help resolve hallucinations. And finally, how was the team composed? Did it include physician involvement, and what kinds of conflicts of interest are there?

Thanks for sticking with me. We're going to continue on and talk about some more of the issues here. This part summarizes some aspects of generative AI and what it can be used for. With generative AI, these are the things we should watch out for: incorrect or falsified responses, out-of-date responses, and stereotypes as well. We do want to keep in mind the risks and limitations associated with generative AI and watch out for non-HIPAA-compliant tools. This next slide shows how we may want to think about liability as well: who's going to be liable for issues that may arise with the use of AI? And then this slide shows how important it is to protect patients' rights, and the data holders' responsibility for protecting those rights. Ideally, we'd get generative AI training on topics such as risks, data breaches, and more, so it's very consistent with HIPAA. Some other things to keep in mind are the growing concern about cyber threats using AI in healthcare, especially since attackers can craft convincing, authentic-seeming emails and use phishing techniques that entice people to click on links; ensuring strong protections against manipulation and attacks; and continuous monitoring, which is what we talked about earlier, for anomalies and behaviors in AI outputs. One more thing we should be concerned about is automated decision-making for prior authorizations, as well as access barriers and limited benefits. So there is a lot to consider. You can download this from the AMA website for free, but in general, this use of AI is really, truly just getting started, and there's a lot of ethics and a lot of factors to think about.

So I'm so, so glad and grateful that we're here together to learn about the elements of AI, much like we did with biology, biochemistry, and pharmacology in our training. We're now understanding these elements and building blocks for how we can best use AI and avoid some of the harms and risks as well. Thanks so much for joining.

At this point, we can field any questions you might have. You can post them into the chat or the Q&A box, and we'll do our best to answer some of them. I do see one particular question coming in already, so thank you so much for your support here. While you're thinking about the questions you might have, you might think about, okay, how do we influence these policies? How do we have a voice in this, right? This is going to be our practice. One thing I'd encourage you to consider is the call for APA components. There are several committees dedicated to mental health information technology, or mental health IT; there's one on telepsychiatry, and there's the committee on innovation, which, in full disclosure, I'm part of. So you can consider applying for these components and being part of these committees as well.

An anonymous attendee asks about all of the different technical terms.
The jargon has been new. Are there any sources we can use to better educate ourselves? First off, being here is certainly one. Another area I've found particularly interesting is training courses on AI. For instance, on LinkedIn Learning there are a lot of courses, and you might find some on Coursera as well. You can just listen in on how people are using generative AI for research or for streamlining and improving your efficiency, and I do predict that we're going to see more aimed at clinical practice as well.

All right, the next question asks about 60-, 90-, and 180-day goals for AI integration into clinical practice. I think about organizational change principles. This is an area of study in healthcare administration where you think about how to best bring people along for the ride when it comes to any change you implement. That can include interviewing the affected users about their needs and then trying to find and identify super users or champions, much like you would do with an electronic health record implementation, which you've probably witnessed over the past 15 years, for better or worse. And I would say that AI is not automatic; it's not like just dropping an app in and saying, okay, go ahead and see what happens. It's important to guide people in slowly and see what resources are available. We call these adoption guides, too; I think Microsoft in particular is good at this, with adoption guides, flyers, and pre-made emails, things to help people adopt technologies. Then you can see whether people like it or not. There are also journals out there, like the Journal of the American Medical Informatics Association, JAMIA, where people discuss evaluating technologies and what works and what doesn't. And there are user conferences, too; if you're using Epic, for instance, Epic has several user conferences a year where people trade stories and tips and tricks. So those are some things to think about when it comes to user adoption. I don't think I would force a specific timeline necessarily, but this all has to be in service of the patient, and of us clinicians, too.

The next question we've got: what would be your recommendations for a graduating fellow who's interested in generative AI? One thing you can think about is the realm of clinical informatics. This is a subspecialty under the American Board of Preventive Medicine, and there's a whole field of study where physicians and other healthcare professionals are learning how to best implement technologies, so you can learn from review courses such as the AMIA review course. Generative AI specifically is still in its infancy, in which case I would look at commercial training opportunities such as LinkedIn Learning, Coursera, or even publicly available videos; it's very, very easy to experiment and learn. The thing to keep in mind is HIPAA compliance. Just watch out for HIPAA compliance: do not use any of the patient identifiers. You might think something is not an identifier, but it might well be.

Another one of our attendees asks: how do you feel about AI generating notes from a few sentences? This is an opinion, of course, but even over the past 15 years we've had transcription AI, right? You might dictate into a tool and it would transcribe your words for you.
But we would always have to review those transcripts to make sure it actually transcribed things correctly. I've heard of some studies where they found it really saved time, but my caution is that not every tool is going to implement things correctly from the get-go. We know that from transcription engines, too: the way that Apple versus Google versus Microsoft transcribes things is all different, right? So you want to test and test and test, and experiment.

Another question from an attendee: what additional considerations should be taken into account when working with different populations, like child and adolescent psychiatry (CAP) or justice-involved patients? Is there any possibility of using AI scribes in the jail population? I think it depends on how your institution is set up. You do have to talk to your legal team or your risk management team and make sure that they okay whatever technologies you use. There are some options for private practice, of course, say if you're in CAP, child and adolescent psychiatry, where you can evaluate the technologies yourself. But signing some sort of business associate agreement, a BAA, is going to be the number one thing, so that the data doesn't leak out and you transfer the liability onto the vendor.

Of the other questions we have, I'll address the ones I'm able to. An attendee asks about evaluating the confidentiality of an AI scribe vendor for taking clinical notes. Part of it is that you'd either have to actually dig into the code and check out the infrastructure setup: are the servers set up so that proper security measures are in place? Frankly, we don't have time for that, and it's not feasible, because it's a lot to wade through. So some of those certifications, some of those reports that I mentioned, can be a reasonable shortcut, because the vendor is on the hook for reporting to an outside auditing agency. Much like the Joint Commission or CARF, these certification agencies put their stamp of approval on the claim that, yes, a vendor is keeping things confidential and private. So that's how I would go about it.

Another question we've got, and there are so many good questions now: does one have to develop one's own AI tools, or do we use previously developed ones? This is the buy-or-build question, right? Do you buy something that's pre-made, or do you build something on your own? What we're finding is that building on your own can be very expensive: upwards of at least $700K, if not a million plus, to develop your own foundation model. So ideally you use existing tools that are out there, and then you can customize them for your own purposes.

A lot of questions here. If I may answer a few really quickly: AI to help with board exams, give it a try; I would love to see what you do, and we may ask Dr. Heiler as well. Hi, Dr. Heiler, good to see you. There's also another question about reputable websites and resources to learn more, and then AI for helping with psychometrics. And finally, something I don't have as much familiarity with: AI applications for screening potential patients for psychedelic-assisted psychotherapy.

So I'd like to welcome back to the stage Dr. Stephen Heiler, and Dr. Manu Sharma, thank you so much for joining us. At this stage, we have seen so much throughout the day about artificial intelligence for mental health.
I'd like to ask each of you, and we'll go around the room, about our key takeaways, the things that we want our audience to keep in mind. Maybe we can take two or three minutes each for some concluding remarks. If it's all right with you, we can go in the order in which people appeared: Dr. Heiler, who joined us first, then Dr. Sharma, then Dr. King, welcome back to the stage, and then myself. We'll each take two to three minutes with concluding thoughts on what the audience should take away. Dr. Heiler.

Hi there, I hope everybody can see and hear me. Stephen, am I on now? You are. Great. Okay, I'm sorry I didn't get to hear all of your talk. I did get to hear the tail end, when you were talking about assistance for board examinations. I have my own personal feelings about that, in the sense that I really think we've missed something by not having live patient interviews anymore and replacing that with, I guess, multiple-choice questions or whatever. That's a whole other topic. I believe that artificial intelligence can help us a lot, whether it's in our preparations for the boards or in our clinical skills evaluations. I think at some point, just like in Major League Baseball, we're going to have balls and strikes called by computer assistance rather than a live umpire, and at some point we might consider whether we have examinations that are done with the assistance of artificial intelligence, getting away from human bias; then we just have to deal with artificial intelligence bias, which is a whole other topic. And I think with that, I'll turn it back to anyone else who wants to comment. Dr. Sharma.

Yeah, from my two talks, the biggest takeaway, if I want people to remember one thing, is that oftentimes we compare technological applications to the best therapists, to the best level of care out there. Unfortunately, that's an unfair comparison, because I would challenge everybody to ask themselves a question: how many people actually have access to good-quality therapy or good-quality psychiatric care, right, even when they're close to big centers? So the idea is that we have to compare these technologies to nothing: are they better than nothing? That's the bar they have to clear, at least. And then be very mindful about where we deploy these apps and where we deploy such technology. And again, as far as markers go, psychiatry has been in pursuit of biomarkers for several decades now with very little to show for it, right? So the idea is that language- and technology-based tools can offer that scalable solution, and hopefully, if we reach enough scale in terms of the number of participants we're able to learn from, we can come up with generalizable solutions. Those are the two biggest takeaways from my two talks and from the questions that I came across. Thank you. So I'll pass it on to Darlene. Dr. King, sorry.

Yeah, well, I think it's been really wonderful that we've gotten to learn so much about AI in one day. Some key things that I would have listeners take away from my talk are to be mindful of the AI solutions you're considering or thinking about using, and to be aware of automation bias. Just because a computer is giving you an answer and it looks great, you still need to check it.
Even if it's embedded in your EHR, still do a thorough review and make sure, and don't always assume, okay, this is going to be true and accurate, because I think this technology is still in its infancy and there's still a lot of room for improvement and development there, especially with how we integrate AI systems into our healthcare organizations. So I think there's a lot of research to be done and new things to learn, and that's why it's so important to really try to evaluate and think through what AI systems you want to use, and to be mindful of HIPAA and how you input different patient health information. Great. Dr. Chan, back to you.

So, some of the things I think about: I'll spend maybe a minute or two on my final concluding thoughts before I hand it over to Dr. Baker for his concluding thoughts on key takeaways from the day. This is just so exciting. I've got to say, when I was growing up, we had Dr. Sbaitso and some ELIZA-like chatbots, and adventure games. Now we're thinking about technology that may have much more enhanced understanding and capabilities, but we still have a ways to go. If anything, we've heard some questions from the Q&A about how we learn more about generative AI and how we make sure that what we're using is truly sound. And already people, our audience here, are thinking about different ways of using AI, from getting coaching to helping with psychometrics. So I would just encourage our attendees to brainstorm with us. The APA annual meeting will have a lot of talks from the people you see here on our screen, as well as courses, symposia, and the Mental Health Innovation Zone. So if there's a key takeaway, it's to know whom to call and where to look to keep learning more. Dr. Baker, what are your thoughts?

Yeah, thanks, Steve. I think for me the key takeaways are, one, that as a field we need to move more toward starting to measure things. It sounds simple, but AI systems can't work without the data that feeds them. So I think as we reach our maturity as a discipline, it's going to require being willing to record sessions, put Fitbits and trackers on people, and record their behavior. We're ultimately here to help people optimize their behavior and their mental health, but also what they do on the outside. So that would be one takeaway: whatever kind of practice you may have, there are opportunities to begin adding measurement, more formal measurement, not just surveys and paper-and-pencil tests like what Steve was mentioning, but really capturing the nuance of what happens with your patients. And with that, I would also just say that I think a lot of the work to be done going forward is really going to be tight collaborations between more quant-type people and people who are really masters of their craft: taking clinical pearls, things that you may have learned over your lifetime of practicing that are really great cues for you, and helping design tools that could identify those clinical pearls in larger populations. What I spoke about was really more about using AI and machine learning for all of the things we can do for assessment, and ultimately that's going to be a way to move this forward so that it's being used with the best clinical minds able to contribute to it.

Thanks, Dr. Baker. You know, in the remaining time we have, about five or so minutes before we conclude.
I think what we can do is answer the remaining questions that our audience has and share our thoughts. I will start with the last question that I posted in our internal chat. This question asks about recommending reputable websites and resources to learn more. Would anyone on our concluding panel like to share their thoughts?

I think every year with the Mental Health Innovation Zone we cover a lot of topics, so I would encourage everyone to come to our panels during the annual meeting. We have similar recorded sessions throughout the year as well; I remember having one last year and the year before that, and they're available online through the APA for people to view and learn from. I can't think of any other reputable sources per se, but again, I think you can use the same kind of skepticism you would use when you're introduced to a new drug; that can be the same strategy with digital interventions as well.

Great, thank you, Dr. Sharma. Any other responses on recommending reputable websites or resources?

Well, this is a reputable website, Stephen: just today in the New York Times, Kevin Roose posted another one of his interesting articles about how you judge the chatbots themselves. So if anyone has the Times today, I'd recommend that you read it, because they're trying to decide exactly that. I spoke about the chatbots as having different personalities, but I guess the question is, how do you know when to go to which chatbot for what? I'd like to know what the rest of the panelists think about that.

And for context, Kevin Roose has authored a series of articles over the past year and a half about how some of the chatbots have had, maybe, some more damaging conversations, and then, more recently, not so much. Thank you for mentioning that. Anyone wish to respond to Dr. Heiler?

I don't have any good response aside from the fact that at the APA Mental Health Innovation Zone we will be hosting a whole session on chatbots and chat AI, so if you're going to be at the meeting, I'd encourage you to come and join the conversation there.

Awesome, thanks for mentioning that, Dr. Baker. There are also quite a few other sessions, too; Jesse Ehrenfeld from, I believe, the American Medical Association is also giving an opening morning talk on Saturday. So that's exciting to see at the APA annual meeting. And then Dr. King, do you have something that you'd like to share, too? I see something that you've posted.

Oh, well, this was the question about reputable websites. If you want to learn more about particular large language models, the Stanford Center for Research on Foundation Models has a website called HELM, Holistic Evaluation of Language Models, and you can dive as deep as you want into at least 81 models. So that could be something to look into if you want.

That is remarkable, 81 models. So: the Stanford Center for Research on Foundation Models, HELM. Great.

I guess one other observation, which you might have read about recently: it seems like the LLMs are running out of data, and that's gotten to be a very big issue. In fact, they've gotten to the point where they've gone through all the text and all the social media, and now what they're trying to do is extract all the data they can from the videos that are on YouTube. I believe I was reading about a program called Whisper, which is able to actually do that, extracting data from videos, which I think is interesting.
The whole idea of the internet and the LLMs running out of data is just mind-boggling. Thank you for mentioning that. There are lots of AI video generators now out on the market, so it's interesting to see where things go from here. Truly just amazing.

Well, we are unfortunately out of time together. I just want to thank our esteemed speakers, Dr. Heiler, Dr. Sharma, Dr. King, and Dr. Baker, and give a special shout-out to the APA staff, whose hard work made this possible. Shazia Khan and her education team have really made this a truly successful event.

So this concludes the program for the virtual immersive, Exploring Artificial Intelligence in Psychiatric Practice. Thank you so much for joining us today. Your engagement and contributions have been invaluable, and we hope you found today's program enriching and informative. All sessions from today have been recorded and will be accessible to you on the APA Learning Center following the APA annual meeting, which is scheduled for May 4th through May 8th in New York City. You can now begin claiming credit for the sessions you attended today. Should you have any questions or concerns, please contact the APA Learning Center at learningcenter@psych.org. On behalf of the APA, thank you once again, and we look forward to connecting with you soon.
Video Summary
The transcript outlines a comprehensive discussion on the evolving applications of AI in mental health, highlighting the different generations of AI technologies and their implications. AI's development is segmented into eras: AI 1.0, 2.0, and 3.0. AI 1.0 includes rule-based systems like decision trees and early examples like IBM's Deep Blue. These systems are used in current healthcare technologies like EHR decision support systems, but are limited by human logic and potential bias.

AI 2.0 focuses on deep learning and pattern recognition, improving tasks such as voice recognition and language translation. However, it faces challenges like out-of-distribution problems and biases. AI 3.0 introduces generative AI, capable of creating text and visual content, promising applications in healthcare for tasks like summarizing medical appointments or patient data.

The discussion emphasizes the integration of different AI technologies in healthcare apps and systems. Various mental health apps and VR technologies are explored, showcasing their uses in therapy, patient monitoring, and clinical training. Challenges like bias, hallucinations, and data privacy are recurring concerns across these advancements.

The conversation points out the growing utilization of AI for administrative tasks in healthcare, such as medical records management and drug discovery, with a strong focus on maintaining ethical standards and implementing robust privacy measures to protect patient data. Lastly, numerous resources and recommendations for learning about AI, along with strategies for integrating AI tools in clinical practice, are suggested, underscoring the necessity for regulatory frameworks and continuous education to mitigate risks.
Keywords
AI in mental health
AI technologies
AI 1.0
AI 2.0
AI 3.0
deep learning
generative AI
healthcare applications
data privacy
ethical standards
regulatory frameworks