Evaluating AI, Apps, and Digital Solutions
Video Transcription
Welcome. It is a pleasure to be here with you all, and I am excited to share how we can evaluate artificial intelligence, apps, and digital solutions. I am the chief technology officer for a mental health technology startup, and I also teach physician trainees at the Stanford University School of Medicine. First, I want to start with the American Psychiatric Association's app evaluation model. We put this together around 2017-2018, when the APA started doing much more work on technologies, especially with the explosion of mobile apps. The evaluation model, while created some time ago, has been timeless: it has been cited widely in journal articles and the press and has served as a model for other groups to base their own evaluations on. I want to provide it here so that you, too, can evaluate any technologies that come your way, whether you are considering incorporating them into your individual private practice or into a large health system. So let's dive in. The APA app evaluation model follows the levels of the pyramid shown here, from the bottom to the top. I'm showing a diagram from a specific article in Current Psychiatry, but you'll see essentially the same figure replicated across the literature, as well as on the APA's own website. We start with the purple level, ground, where we look at the context and background. The blue level is where we assess risk and potential harm, privacy and safety. The green row is evidence, the potential to benefit, whether the clinician or the patient is benefiting. In yellow we have ease of use, the usability of the technology and how well someone might adhere to using it. Finally, at the very top of the pyramid is the star: interoperability, the ability to share data in a meaningful way between apps and electronic health record systems, and between apps and other apps. So let's dive into the different levels. Over the next few minutes we'll talk about this framework and how it has been applied on other websites and in other resources. Later in this talk, we'll cover the different types of artificial intelligence and the things that can go wrong with AI, in other words, things to look out for. We'll also go over frameworks that other groups have put together to evaluate these digital solutions, as well as the different coalitions of groups that are looking at AI evaluation. So let's dive in. Before prescribing an app or device, you can evaluate it across levels one through five. In this diagram, number one is the ground level: What is the digital solution's business model? Who is developing it? How much does it cost? Are there any in-app purchases? Is it free, and if so, how is the developer supporting the financial sustainability of the solution? One of the biggest issues with apps in the past has been that they would come and go because they weren't financially sustained, and it's important to know that so you don't adopt a solution that will simply disappear after a year or two. Are they using advertising?
One of the biggest issues with apps highlighted over the last few years has been that health systems, in particular on their websites, incorporated advertising technologies that potentially could have identified patients. What operating system does it use, and when was it last updated? You want to see whether the developer is regularly updating the software; if it hasn't been updated for a few years, that goes hand in hand with questions about financial sustainability. Let's take a look at number two, privacy and safety. Do they have a privacy policy? What kind of data are they collecting? Are they protecting or de-identifying data? Can you opt out of data collection? Are they deleting data in accordance with policies that work for you and with the laws in your area? Where are they physically storing the data? Are they securing and encrypting the data? And are they compliant with HIPAA and applicable CFR regulations? Now, you'll notice that I'm not being prescriptive, saying you must have X, Y, or Z. These are questions you want to ask, and then you evaluate whether the answers are something your health system or your clinic can work with. Number three is evidence. What is the actual functionality versus the claimed functionality? Are there any clinical studies? Is there data to back up the claims? What is the feedback from users in app stores and on review sites, and does it support the vendor's claims? Does the product appear to be of reasonable value? Next, near the top, is ease of use. No one wants to use a hard-to-use product, and no one wants to use a product that isn't engaging. So we look at whether it's easy to access, whether it's customizable, whether you can use it offline, and whether you can use it in difficult settings, such as places with limited internet connectivity. You especially want to look at accessibility: is it something that people with impaired vision or other limitations can still use? And finally, at the very top, interoperability and data sharing. We place this at the top because many apps and digital solutions may not have this ability, but we feel it is important to include because it does factor into ease of use. Can it share data with electronic health record systems? Can you get your data out of the system, by printing, exporting, or downloading it? And can you share that data with other tools? Those are the things to consider when you look at a product through this framework, from one to five. Now, the American Psychiatric Association has a website, and a work group of us got together to evaluate different apps and provide them as samples for you to follow. You can look at these sample app evaluations on psychiatry.org; these are apps we have run through this process. Other groups have gone further: one of the work group members, John Torous, and his team put together the mHealth Index and Navigation Database at mindapps.org, where you can search through many more reviews and filter by things like cost, developer, features, evidence, and privacy. Those reviews are freely available on the website. John Torous is based at Beth Israel Deaconess Medical Center, an affiliate of Harvard Medical School.
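To make the framework concrete, here is a minimal sketch, in Python, of how the APA-style questions could be organized into a reusable checklist for vendor conversations. The level names follow the pyramid described above; the specific wording, field names, and tallying logic are my own illustrative assumptions, not an official APA tool.

```python
# Illustrative only: organizing the APA app evaluation pyramid as a checklist.
# The level names follow the model above; the question wording and the simple
# tallying logic are hypothetical conveniences, not an official APA instrument.

APP_EVALUATION_CHECKLIST = {
    "1. Ground / background": [
        "What is the business model, and who develops it?",
        "What does it cost? Are there in-app purchases or ads?",
        "When was it last updated, and on which operating systems?",
    ],
    "2. Privacy and safety": [
        "Is there a privacy policy? What data is collected?",
        "Is data de-identified, encrypted, and stored appropriately?",
        "Can users opt out of collection or have data deleted?",
    ],
    "3. Evidence": [
        "Does claimed functionality match actual functionality?",
        "Are there clinical studies or supporting data?",
        "Do user reviews support the vendor's claims?",
    ],
    "4. Ease of use": [
        "Is it accessible, customizable, and usable offline?",
    ],
    "5. Interoperability": [
        "Can data be shared with EHRs, exported, or printed?",
    ],
}


def summarize_answers(answers: dict) -> None:
    """Print how many checklist questions were answered satisfactorily per level."""
    for level, questions in APP_EVALUATION_CHECKLIST.items():
        satisfied = sum(1 for q in questions if answers.get(q, False))
        print(f"{level}: {satisfied}/{len(questions)} satisfactory")


if __name__ == "__main__":
    # Example: record findings from a hypothetical vendor conversation.
    summarize_answers({
        "Is there a privacy policy? What data is collected?": True,
        "Are there clinical studies or supporting data?": False,
    })
```

A structure like this is mainly useful for keeping vendor conversations consistent across products; the judgment about whether an answer is acceptable remains with your clinic or health system.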
So, now we've looked at how to evaluate apps, but there's so much more to consider now with artificial intelligence, especially with the explosion of large language models. Large language models are a part of artificial intelligence with exceptional language capabilities: they understand your language and respond with very natural-sounding, very confident-sounding language. So let's take a look at how you can conceptualize AI. If a software vendor comes to you and says, "I want you to purchase my AI solution," you can ask, well, what kind of AI is it? In a moment I'll present a table that divides up the types of AI, so that you can understand the different types better and look out for some of the pitfalls and nuances of each. This is a lot to absorb, but we'll look at it in more depth; just take a look at these three columns. The first column, highlighted with red dot one, is AI as it has existed since the 1950s. Can you believe that? Going on three-quarters of a century ago, we already had what we call AI 1.0: symbolic AI and probabilistic AI. At red dot two you'll see AI 2.0, from the 2010s onward: machine learning and deep learning, things like enhanced voice recognition systems, photo recognition systems, and language translation. And finally, the reason we've been so enthusiastic about AI in the last few years is AI 3.0, which dates from about 2018: the explosion of foundation models that can help generate new content, text, sound, and images. AI 1.0 follows directly encoded rules, if-then rules or decision trees. These rules are hand-coded with traditional programming, and the system follows a decision path. You'll see a lot of this implemented in rule-based clinical decision support tools. But human logic errors can occur if the person coding it has a specific bias, whether they know it or not. AI 2.0, in the center column, is about predicting and classifying information: things like photo search without manual tagging, voice recognition, and language translation. We use this quite a bit in radiology, ophthalmology, and other medical specialties. One thing to watch out for is presenting the model with data unlike anything it has ever seen in its training data. Finally, AI 3.0, highlighted right here, is about generating new content. One of its issues is something we call hallucinations. Now, hallucination means something different in psychiatry, but in the context of AI it is when the model creates plausible but incorrect responses based solely on predictions. So errors can be produced depending on the quality of the foundation model and, essentially, on how it was trained. We're going to take a look at some of the issues that come up with these types of artificial intelligence in particular. One problem is that many vendors, software companies, and groups may say they have an AI solution, but so much goes into the AI black box. So you want to be able to ask: How was it created? How was it made? What kind of data was it trained on? Which techniques is it using?
This is very similar to being presented with a medication: you want to know the pharmacokinetics, the pharmacodynamics, the side effects, all the things we took years to learn. For AI, I think that if we know and understand the lingo and the vocabulary, then we can better evaluate whether a product is helpful or not. If you look at the large blue box, that's AI in general; machine learning is a subset of it, and deep learning is an even more specialized subset. Some of the regulatory issues to keep in mind with LLMs are listed in this table, which spans three slides, and we'll dive into each of the components. The first is patient data privacy. The second is intellectual property. The third is medical malpractice liability, and the fourth is quality control and standardization. Those are the things to keep in mind for AI; for medical malpractice especially, consult your risk management team and your malpractice insurer. For the second part of the table, I'll share the highlights of the first column, the regulatory challenges: informed consent; interpretability and transparency, which goes hand in hand with what I shared about black boxes; and fairness and bias, that is, how can we prevent biases in AI models, and is the result something you can actually use? We want to avoid disparities in healthcare outcomes. The next slide covers data ownership, then over-reliance on AI models: we want to ensure we're not over-relying on AI to make decisions about medical care. The final point is continuous monitoring and validation, so that we can ensure performance does not degrade over time. There are other errors to keep in mind, and one of the challenges is that it's not quite clear how some developers are mitigating them, whether you're using Google Gemini or other software that records your conversations, software that provides helpful-sounding chat companionship, or search answers embedded in our regular office software. These are things we also want to ask about. And I think it's telling that if you look at Microsoft Copilot or Google Gemini, any of these solutions, they always carry a small disclaimer saying AI-generated answers may be incorrect. So you want to watch out for this and always check the answers, because issues like this will come up. Here's an example of something that came out fairly recently. This was featured in WIRED and is based on an upcoming article for the ACM, the Association for Computing Machinery, a renowned computing society. Researchers from Cornell University and the University of Virginia studied thousands of audio samples and found that an OpenAI transcription tool added non-existent violent content and racial commentary to neutral speech. They found that 1% of samples included entire hallucinated phrases or sentences that did not exist in any form in the underlying audio. Can you believe that? The next slide further pinpoints a potential source of the problem: they noted that the engine has a tendency to produce outputs like "thank you for watching," "like and subscribe," or "drop a comment in the section below" when you give it silent or garbled input.
So they surmise that OpenAI trained it on captioned audio scraped from YouTube videos. Essentially, the question you want to ask is: how did you train this tool, and where is it getting its answers? There are some other issues with LLMs to note. You may want to look for variation: these models are probabilistic, they're stochastic, which means they can produce a different output each time you run them. So the summaries that are produced might vary even if you give the model the same input; the organization, the phrasing, and even which details from the clinical encounter are included may change, and some details may be omitted entirely. Errors of omission are one concern. Another concern is sycophancy, a form of bias in which the LLM tailors the summary output to the perceived user expectations embedded in the prompt. The issue highlighted in red here is that the LLM emphasizes specific parts of the history, or parts of the infection history, depending on what it thinks you want. Finally, there are complete-the-narrative errors. I think these are actually much more concerning: here a small but clinically meaningful error has been added that completes a clinical narrative or illness script. In this example, the word "fever" was added to the summary although it wasn't in the original radiology report. There are different ways to evaluate an LLM; this particular one compares manually searched text strings against the LLM's output, and then you compare the two. Some people have also harnessed this approach to train LLMs to produce the right kind of output, or even to train peer counselors. This is research coming out of Stanford University on empowering novice peer counselors, and on how peer counselors can potentially score different answers that come out of an LLM. So there's a lot to consider, and I think that's why it's extraordinarily difficult, and why it can take months to years to evaluate these systems just to make a purchasing decision.
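As a concrete illustration of that compare-the-strings idea, here is a minimal sketch in Python. It flags clinically important terms that appear in an LLM summary but not in the source note (possible complete-the-narrative additions, like the "fever" example) and terms in the source that the summary omitted. The watchlist of terms and the plain substring matching are my own simplifying assumptions; a real evaluation pipeline would be considerably more sophisticated.

```python
# Illustrative sketch: naive string comparison between a source document and
# an LLM-generated summary. The watchlist and the plain substring matching
# are simplifying assumptions for demonstration purposes only.

WATCHLIST = ["fever", "chest pain", "pneumonia", "fracture", "sepsis"]


def compare_summary(source: str, summary: str, terms=WATCHLIST):
    """Return terms added by the summary and terms omitted from it."""
    src = source.lower()
    out = summary.lower()
    added = [t for t in terms if t in out and t not in src]    # possible fabrications
    omitted = [t for t in terms if t in src and t not in out]  # possible omissions
    return added, omitted


if __name__ == "__main__":
    report = "Chest x-ray shows right lower lobe pneumonia. No acute fracture."
    llm_summary = "Imaging consistent with pneumonia; patient febrile with fever."
    added, omitted = compare_summary(report, llm_summary)
    print("Possibly fabricated:", added)   # ['fever']
    print("Possibly omitted:", omitted)    # ['fracture']
```

Even this toy example shows why human review stays essential: a flagged "omission" here is actually a negated finding ("no acute fracture"), which naive string matching cannot distinguish.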
Now, some of the other things you want to consider when you're evaluating technology are the requirements and certifications you need for the app or digital solution, and we'll go over some of the frameworks out there that are roughly equivalent to the Joint Commission, CARF, and other accrediting bodies. We'll go over HIPAA, then HITRUST, then SOC 2. So let's dive in. HIPAA, as we know, is a legal requirement enforced by the U.S. government that regulates how you handle patient data. HIPAA is something we are all aware of. There's the Privacy Rule, which is about how you keep the data private and which parts you keep private; the Security Rule, which is about the safeguards you place around the data; and the Breach Notification Rule, which is about what you do if there's a breach and whom you notify. Now, HIPAA is not exactly a certification; it's a law, a regulation for us to follow. There are certifying bodies out there, but these are not legally required. Your institution may recommend particular certifications, but they are not a be-all and end-all, and that matches my experience working with and talking to hospital vendors: some hospitals may require certain certifications and some may not, depending on their staffing and how much IT capacity they have to vet these digital solutions. I'll give you one example, called HITRUST. HITRUST is a third-party audit and certification process. It incorporates aspects of HIPAA, but it also adds elements from other frameworks like NIST, ISO (which is worldwide), PCI DSS (which is more of a finance standard), and more. That's the HITRUST approach, and you can see that it's a continuous cycle, similar to how we continuously maintain board certification or how a large health system continuously re-certifies. If you compare the two: HIPAA is a federal law mandated by the U.S. government, but there's no formal certification process, whereas HITRUST is a third-party framework, certified by a group called the HITRUST Alliance, that offers very prescriptive compliance guidance. So you can ask your vendor whether they use HITRUST, and about HIPAA, certainly, of course. There are also differences between the HITRUST assessments themselves: e1, i1, and r2. The differences here are actually quite meaningful, and they go from left to right in terms of complexity. The e1 is the least complex; the r2 is the most complex and takes the most time. The e1 takes just a few months to obtain, is less expensive, and provides what they call low to moderate assurance. It does not by itself guarantee HIPAA compliance, but it's a stepping stone to their other assessments. The i1 is a bit more complex, and the r2 is the most complex; it includes looking at what kind of policies you have, what kind of maintenance you do, and what you monitor in your system. This slide, from HITRUST's own website, covers the same ground, and you can see that getting a certificate is expensive and time-consuming, but depending on how large your health system is, it may be worth it. So it's not as simple as taking a CD-ROM and plugging it in; this is an assessment the vendor, the developer, would undertake, and that's why you may not see it for a lot of the solutions out there. There is also something called SOC 2, which is more of a finance-oriented framework. Some health systems may ask for a SOC 2 report, but again, this is something they may optionally request as part of their existing assessment. The SOC 2 report is fairly thick. It's essentially something the vendor will have, and they may ask you to sign a non-disclosure agreement because it contains all sorts of sensitive information about their inner workings: How do they code? Where do they store the servers? How do you access the servers? Who has access to them, and so on. It's one single report that takes a lot of time and money to put together. It encompasses privacy, security, confidentiality, quality assurance and monitoring, and availability; in other words, how often does the app or solution crash, and how well does it recover? You can see that this is very similar to the APA's evaluation model, but it comes as a very thick report that the vendor provides. This is another, similar diagram that another group has put together for SOC 2, covering essentially the same aspects. There are also SOC 1 and SOC 3 reports. SOC 1 tends to be a much briefer report about financial statements and reporting; again, it is a finance report.
SOC 1 is built for the finance industry, but some health systems may use it as part of their evaluation process. SOC 2, right there in the middle, covers all of the elements we talked about earlier. SOC 3 is not something I've heard much about; it sounds like it's intended more for general marketing to the public, which is probably why I haven't encountered it as often. This slide summarizes what's in a Type 1 versus a Type 2 report. And this summary slide asks which compliance framework is right for you. In general, you definitely want an app or vendor that is HIPAA compliant; that is the bare minimum for legal compliance. The rest of these are additional assurances about security and privacy practices, and they are continuing frameworks in which the vendor has to re-certify on an ongoing basis. That answers the question of what happens if an app goes stale or out of date: these certifications can assure you that the vendor is continuously working on it. It costs a lot, as I said, in both time and money; a SOC 2 report, as an example, can cost upwards of a quarter million dollars to obtain. But depending on your needs, you may still want to ask for it anyway. Now, some of the other things you may want to consider are whether the product fits the policies that medical societies and other groups are providing, and I'll share a preview of some of these frameworks. The American Medical Association, the AMA, calls this augmented intelligence. The reason they say augmented intelligence instead of artificial intelligence is to emphasize that AI should support physicians and other providers and augment their abilities, not replace them. The AMA produced a report that is fairly straightforward to read, and I recommend you check it out. It discusses how physicians can and should engage in AI policy development and risk assessment, and it notes that, in general, there has been a lack of national policy or governance structure. We are seeing more and more structures today, and I'll give you a preview of those shortly. Some of the things to be mindful of are issues we've already covered: there is a need for transparency, and we need disclosures on how AI uses data and how it produces its content. We also need AI developers to provide key information, including any risks and liability, regulatory approvals, the consensus standards used, the data use policy, and what the AI is meant to help with, including anything it should not be used for. That means the data used for training, validation, and ongoing evaluation, and the composition of the development team: which physicians were involved, and are there any conflicts of interest? One quick clarification of terms, because I briefly mixed them up: general AI is something still in the works that may one day be applied to a lot of different situations; we're not there yet, so I'll move forward, but it's something to watch for. Generative AI is different. This slide talks about generative AI: this is AI 3.0, and it's already here and in commercial products.
So for generative AI, there are things to watch out for, particularly the ones we just talked about: falsified responses and out-of-date responses. We want to look for bias, discrimination, and stereotypes, and keep in mind the risks and limits of generative AI. I especially want to point this out: if you're using ChatGPT without any commercial safeguards, Microsoft Copilot without an enterprise agreement, or a Google product without a business associate agreement, then they may be using your data for their own internal purposes. That's a big privacy issue you want to address; you want to make sure you have a HIPAA business associate agreement (BAA) in place. You also want to see who is liable. A lot of vendors will have a clause stating that they are not held liable and that the clinician ultimately has the final say in clinical decision-making. The policy also goes over privacy and security, how risks are mitigated, and the various cyber threats that may arise, including the risks posed by AI-operated ransomware and malware. Those are some other things to look for. And finally, who is using the tool? Is it being used in ways that might limit benefits or create barriers to care? The AMA policy is a very good read; I encourage you to check it out. You can also check out these slides, which will be made available to you in the APA Learning Center. At the American Psychiatric Association, we released a position statement on AI. It is also a good read, a brief one that covers the principles in a much more concise summary, and I encourage you to look it up as well. Now, I want to dive into some of the coalitions that are taking shape. These are groups of people, health systems, clinics, and even vendors that are working to define guidance for the use of AI in healthcare systems. What I'm about to show you are the different names in this space. For the most part, I'd say these groups are still in their infancy; they have not necessarily produced finished guidance yet, except perhaps the last one I'll share with you, DiMe, the Digital Medicine Society. For the most part, these groups are still putting together a skeleton framework of what they think AI evaluation should entail, and we're going to see more of them in 2025 and beyond, certainly. It's a lot of alphabet soup, I recognize, but I want to introduce them because you'll see them as players in this space. This is CHAI, the Coalition for Health AI. You can see they've already produced a blueprint document, at the very top, and they are putting together an assurance standards guide, an assurance standards checklist, and best-practices frameworks to address all of these issues. Their vision is to provide guidance for before you deploy AI, while you're implementing it, and after you've implemented it. That ranges from testing, benchmarking, and safety beforehand all the way to continuously monitoring for performance issues. Performance drift is when the system goes off key, so to speak: like music drifting out of tune, over time it may not perform as well as it used to. This is a draft of what an AI nutrition facts label might look like. It's from the CHAI website and is based on another group's report; their model card has become more in-depth lately.
This was in a recent report on their nutrition label for health AI. It covers a lot of the same territory as the American Psychiatric Association model we introduced earlier, including security, transparency, risks, and biases, but with a more AI-focused lens: what kind of model engine is being used, what kind of data, and what mitigations exist for bias. So that's CHAI, C-H-A-I. Another group is TRAIN, the Trustworthy and Responsible AI Network, led by Microsoft. Microsoft is essentially working with the numerous health systems listed in this paragraph. All you really need to know, from the discussions I've seen at different conferences with Microsoft's chief medical officer, is that they view TRAIN as complementary to CHAI, potentially looking at different aspects of AI; their focus appears to be more on operationalizing AI. They are also working with numerous groups in Europe. The reason I'm presenting these slides is that this is pretty much all we have about TRAIN: it has been mentioned multiple times at conferences and in discussions behind the scenes, but there really isn't much that's searchable as of this recording and presentation. Valid AI is another group, led by UC Davis and spearheaded by Ashish Atreja and colleagues with 30 founding partners. Valid AI stands for Vision, Alignment, Learning, Implementation, and Dissemination of Validated Generative AI in Healthcare. Again, we're still awaiting more details from the team, but what they've focused on more lately is social determinants of health. The next group is the Duke Health AI Innovation Lab and Center of Excellence. This is a collaboration between Duke and Microsoft, perhaps more of a development partnership, but I want to highlight their health AI oversight program: the Algorithm-Based Clinical Decision Support oversight program, or ABCDS, which oversees the clinical decision-support processes in their health system. ABCDS involves multiple rounds of evaluation and multiple committees before something can be deployed, so you can see there's a lot of vetting. It takes a lot of resources, and depending on which group you look at, the emphasis differs: is the vendor doing most of the evaluation, is an external entity doing it, or is the health system doing it? One of the problems, I think, with health systems taking on the burden of evaluating AI tools is that they may already be stretched thin for IT staff, support, and clinical informatics expertise. PRECISE-AI is a program that ARPA-H, the Advanced Research Projects Agency for Health, is putting together. They are looking at performance and reliability evaluation for AI; again, one of the issues with AI is that it may degrade over time, so we want to make sure it remains continuously up to date. Now, this just got released: the Digital Medicine Society has put together something called the DiMe seal, and it is very similar to a lot of what we've seen.
DiMe is another entity: essentially, they evaluate a software product against a framework of standards and practices, the product gets a seal of approval, and it goes through annual review, so providers and patients can be reassured that it has met a certain level of scrutiny. This is something you can watch for. This slide shows how DiMe is positioning itself amid the regulatory landscape, again as a third-party external entity. You have multiple entities looking at healthcare services at the top, things we're used to like the Joint Commission, and technology products at the bottom. You'll notice they didn't really talk about the FTC or the FDA much; those are two other bodies you want to look at. But I'd say that, by and large, with psychiatry and mental health, the FDA has taken a very hands-off approach. More recently they have cleared a lot of solutions, but "cleared" basically means that there's no discernible risk that the FDA can see from a software product that claims diagnostic or treatment purposes. This is why FDA authorization or FDA clearance may not be enough for us; we want additional assurance that privacy and security are being taken care of. You can see in this diagram that the FDA covers only a portion of this whole certification landscape. You can also look at DiMe's site to see everything they are offering and the different products that carry their seal. The next resource I'm going to show you isn't a certifying body, a health system, or a vendor; it's good old journalism. STAT News is a well-known healthcare industry publication that was born out of the Boston Globe, so they're Boston-based, and some institutions, your university for instance, or some clinics, may have subscriptions. STAT News has a generative AI tracker that lists the different developers, applications, and health systems that are evaluating these generative AI solutions. One way to use this tool is to find the product you're interested in, see which health systems are implementing or evaluating it, and then ask that health system, phone a friend, if you will: what's your impression of this? You'll notice that for a lot of these enterprise software products, information sits behind paywalls or travels by word of mouth. This particular tracker requires a STAT+ subscription, and similarly, other evaluation groups like Forrester or Gartner may provide enterprise reports. Now, I alluded to phoning a friend, and I also want to encourage you to meet other people who are looking into these technologies. There are so many conferences out there, it's incredible: HLTH, the Behavioral Health Tech conference; McLean Hospital has its own Technology in Psychiatry Summit; Stanford Psychiatry has some resources as well. There are other groups, too: the U.S. Department of Veterans Affairs has a freely available lecture series, and Psych Congress and Elevate have their own sessions on AI and digital therapeutics. But one I want to highlight is the APA's very own Mental Health Innovation Zone. The American Psychiatric Association has different committees.
There's a committee on IT, a committee on telepsychiatry, and a committee on innovation, as well as different courses available at the annual meeting. I also want to point out that the American Psychological Association has programming as well; for instance, they have a presence at the Consumer Electronics Show expo in Las Vegas. But I do hope that you will join us at the MHIZ; my colleagues and I have volunteered our time to put it together at the annual meeting, along with other virtual immersives. So with that, I want to open it up to any questions you might have. You can submit questions through the Q&A section of your Zoom control panel. While we're waiting for folks to submit questions, one of the most common questions we get is: which product should I buy? Which app should I implement? Which AI scribe should I use? And the answer, I hate to say, is that it's complicated. I was at a recent conference where a colleague at Massachusetts General Hospital talked about how they have been evaluating two different scribing solutions. They noticed all sorts of great things about the AI solutions: how they saved people time and energy, how clinicians really liked their jobs again, felt positive, and felt they were able to make connections with their patients. On the other hand, the group noted that there were errors of omission, and sometimes the tools would make up medication names. I know, I can't believe it, but those are some of the errors they found, so people still had to look through their notes. They also presented on how it didn't really save time for drafting inbox messages. So one thing to watch out for is that you can certainly evaluate AI solutions, but a given product may not work for everyone, and they vary in quality. The ones they looked at were integrated into their electronic health record system, which is Epic, but that was their purchasing criterion; that was what was important to them. Your needs might be different. If you are a psychiatrist covering ten different hospitals, all with different EMR solutions, paper charts, et cetera, you probably want your own HIPAA-compliant scribe with a BAA for your own practice that you could apply across different EMR solutions. So consider that there's no one single answer. Another thing, too, is that we're not at the point of having a single body continuously evaluating these products, something like a Consumer Reports. What I've seen is word of mouth, discussions in committees, presentations like this one, and even discussion forums like Psychiatry Network on Facebook; those are different venues you can look at. Another forum is Physician Side Gigs, or Physician Network; there are a lot of people sharing their experiences there. The next question is: at the last APA meeting, a session showed avatars for therapy and psychiatry interviews; any updates on that? There are some groups doing this, and in full disclosure, I am leading a software company that does that as well.
But I'll just say in general that it is a new area, and I have not seen a lot of software vendors producing avatars for therapy or psychiatry interviews; there are just a handful of them, available from different software vendor groups, and I'll leave it at that. While we're waiting for other Q&A, I invite you to ask more questions as you like. Some of the other common questions I've heard about AI and apps revolve around payment: how much does it cost to implement this, or how much will it cost me? I want to share a few thoughts on that. Most of these solutions now are subscription software. In the past, we would make one-time purchases: you bought it once and used it forever. I'd say nearly all software vendors have moved away from that, for a variety of reasons. One, with subscription software where you pay per month per user, usage is a good barometer of how well the software is doing; if vendors notice that people are using it less or unsubscribing, they can change how they design the software to make it more appealing to use. It's also more affordable in many respects: you're only paying for what you actually use, and if you stop using it for a given month or feel it isn't worth your time, you can unsubscribe. There are other things to consider, such as installation and setup fees; those might matter because you want to set yourself up for success. Some organizations have de-emphasized that: they implement software that the vendor simply hands over without any training or onboarding, and it ends up being less than optimal for clinical practice. So some of the things to look for are the subscription cost and the total cost of ownership when you're deciding whether to incorporate something into your practice. Now, I'll just share a few more thoughts as we are... oh, okay, the other question is which AI scribe to use. That's an excellent question, thanks. Again, it depends on your needs; with AI scribes, you want to look at the different features. I'll share one example: I mentioned how one health system prioritized EHR integration above all else, but they did find errors, and they found it didn't really write notes the way they wanted, and I'll just leave it at that. Some vendors will let you customize how your notes are output, and sometimes they only give you rudimentary customization, like whether you want bullet points or paragraph format for your summary. So you can evaluate the vendors that way. The ones I know of off the top of my head are major software vendors: Doximity has its own; Abridge is another major vendor; and Microsoft, which purchased Nuance, the company behind Dragon Dictate and Dragon NaturallySpeaking, produces something called DAX Copilot. Another one I have noted is Nabla. So there are all sorts of options out there. The way I like to make these decisions is to create a spreadsheet, list all the different vendors' products, and write out all the things that are important to you: Does it summarize? Does it customize? Those sorts of features. Then rank each product based on what you see in the marketing.
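For those who like to formalize that spreadsheet, here is a minimal sketch of a weighted decision matrix in Python. The feature names, weights, and vendor scores are entirely invented for illustration; they are not ratings of any real product, and your own priorities should drive both the features and the weights.

```python
# Illustrative only: a simple weighted decision matrix for comparing vendors.
# Feature names, weights, and scores are invented placeholders, not real
# product ratings.

WEIGHTS = {"ehr_integration": 3, "customization": 2, "cost": 2, "baa_available": 3}

# Hypothetical vendor scores on a 0-5 scale for each feature.
VENDORS = {
    "Vendor A": {"ehr_integration": 5, "customization": 2, "cost": 3, "baa_available": 5},
    "Vendor B": {"ehr_integration": 2, "customization": 4, "cost": 4, "baa_available": 5},
}


def rank_vendors(vendors, weights):
    """Return vendors sorted by weighted score, highest first."""
    totals = {
        name: sum(weights[feature] * scores.get(feature, 0) for feature in weights)
        for name, scores in vendors.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_vendors(VENDORS, WEIGHTS):
        print(f"{name}: {score}")
```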
You can also go to the individual websites to see what kind of demos they'll give you; they'll give you free demos and looks into the software, and that way you can make a decision, too. You can also ask about trials. This other question is about sustainability and AI, and I think it's mostly about energy consumption and the demands on computing chips. That's a major concern, and I will confess that I don't know as much about it as I should, but the energy requirements for each prompt do add up to a lot of energy, so that's something we need to be mindful of as we use these tools. And finally, I want to take this last question before we wrap up: what is the most evidence-based therapy AI platform or app? I've not seen barometers of how evidence-based one product is versus a competitor; mostly I've seen vendors use "evidence-based" in their marketing language. What you can ask app vendors is: how long did you vet your solution, and did you have any psychologists, psychiatrists, or other medical experts involved in building it? It's very easy for people to claim they are evidence-based simply because they use CBT principles. Those are some of my initial thoughts, but again, it would require us to look through all the different apps out there. You'll see them mentioned in forums as well as at conferences such as the APA Annual Meeting. All right, I want to thank you so much for being with us today. There's more to come, so thank you. Next, I'd like to invite you to join our next lecture, Integrating Data from Smartphones and Wearables into Psychiatric Clinical Practice, with one of my friends and colleagues, Justin Baker, who is an associate professor of psychiatry at Harvard Medical School and co-director of the Institute for Technology in Psychiatry at McLean Hospital. See you there.
Video Summary
The video addresses evaluating artificial intelligence (AI), apps, and digital solutions, presented by the CTO of a mental health tech startup and Stanford educator. It introduces the American Psychiatric Association's App Evaluation Model, crafted around 2017-2018, which provides a structured method for assessing digital health technologies. The model comprises several assessment layers: grounding, privacy and safety, evidence and benefit, ease of use, and interoperability. It stresses examining aspects like the business model, data security, and ease of use before adopting tech solutions.

Additionally, the video explores AI's evolution, dividing it into three stages: symbolic AI (AI 1.0); machine and deep learning, as used in voice and image recognition systems (AI 2.0); and generative foundation models (AI 3.0). The speaker highlights AI's applications in medicine but notes pitfalls such as data bias, the unpredictability of AI outputs, and legal considerations regarding medical malpractice and data privacy.

The presentation also touches on regulatory frameworks and certifications like HITRUST and SOC 2 that help ensure security and privacy compliance. Lastly, the speaker introduces various coalitions and frameworks aimed at governing AI's use in healthcare, emphasizing the growing role of digital solutions in mental health and the importance of thorough evaluation and compliance.
Keywords
AI evaluation
digital health technologies
App Evaluation Model
mental health tech
symbolic AI
machine learning
data privacy
regulatory compliance
healthcare AI