Word to the Wise: Informing Clinical Decision Making ...
Video Transcription
All right, good morning everyone. Thanks for joining me. I didn't know that it would be such an intimate setting, but welcome anyway, and thank you for being here. So maybe to start with a little bit about me. My name is Sunny Tang and I'm an assistant professor of psychiatry at the Zucker Hillside Hospital and the Feinstein Institutes for Medical Research. If I'd known that the lights would be off I would have put a picture of myself there as well so you could see what I look like, but up top is a picture of the behavioral health pavilion where a lot of our inpatient facilities are, and both the Feinstein and Zucker Hillside Hospital are part of Northwell Health, which is a large group of hospitals and outpatient centers in the greater New York area. My work there is centered around technology in psychiatry, particularly on using natural language processing, machine learning, and artificial intelligence tools to better understand what's going on in mental health conditions, and hopefully to improve clinical care as well. My clinical work in the past has been in early psychosis intervention, so psychosis is really near and dear to my heart, and we're working on improving outcomes in this area, but currently I'm mostly focusing on research, except for a little bit of psychodynamic supervision of residents, in case that connects with anybody else here. And this is my son Liam. He's four years old, and that was his first poster presentation in preschool this past year. In terms of my disclosures, I get research funding from the NIH and the Brain & Behavior Research Foundation, as well as industry support from Winterlight Labs. I'm a paid consultant for Winterlight and the University of Maryland. I'm an advisor and hold some equity at SIRIT, and I'm also a co-founder and hold equity in North Shore Therapeutics. And of course I need to thank the amazing team that works with me to make all of this possible, including my clinical research coordinators Layli and Sarah at Zucker Hillside, our postdocs Yan and Amir, Arush, a student who's been with me since high school, some of my mentors, Anil and John Kane, as well as people from the University of Pennsylvania, and also the clinical leaders at my place of work.

So I wanted to talk to you about how we take the measure of thoughts and emotions in this field. I think they are such fascinating constructs, and so important to who we are as humans, and yet they all necessarily lie underneath the surface. So in our field we've come up with, I think, some pretty ingenious ways of trying to measure and detect brain and mental functioning, so that we can understand some of the mental processes that make up who we are. Scientists in our field use everything from single-cell measurements to EEG and MRI. But we're going to make this a bit of an interactive presentation, so if you all wouldn't mind taking out your phones and joining us here, let me switch over and start this poll. We should be able to see your responses as they come up.
So for the people who just walked in, we were talking about the many ways that we've come up with as a field to take the measure of what's going on in terms of mental processes, and I was wondering if anyone here has used any of these in clinical practice, especially on a regular basis. We'll give it another minute. It looks like some popular answers are MRI, EEG, some genotyping, biopsy; really interesting. But a good proportion of you, maybe half of you, haven't really used any of these advanced techniques that we've developed as a field for understanding what's going on in the brain. And that's pretty much what I expected, because if you look in our clinics, the way that we understand what's going on for our patients still looks very similar to what we see in that picture, right? For the most part, whenever we make clinical decisions in terms of diagnosis or treatment response or any of these things, we talk to our patients. We ask them how they're feeling, we ask them to tell us about their experiences, and this works pretty well. We make diagnoses, we're able to come up with treatment plans, and it works for helping people get better. And I think this fact, that despite all the work we've done in research to come up with many different ways of understanding what's going on in the mind, the most practical tool has still been speech and interpersonal interaction, has led to my conceptualization of speech as a window into understanding what's going on. So in the model here, we can understand schizophrenia, or any other mental health disease process, as happening underneath the surface where we can't see. These disease processes lead to changes in thoughts, which also happen underneath the surface, where we can't see them. But these thoughts are then reflected in observable changes in speech. So what better way to understand what's going on than by looking at speech and language and how people communicate?

To give an overview of our talk today, my main goal is to discuss what speech and language biomarkers are and what they are good for: what can they help us with? We're going to first go over some general approaches to quantifying speech and language and how this work is being done in general, and then I'm going to walk you through some specific examples of natural language processing methods. Hopefully by the end of this talk you'll have a bit of a grasp of how these tools work and how they could work. I'll talk about one particular project where we combined multiple different NLP methods to track psychosis symptom changes, and then we'll have a short discussion at the end.

So first, how do we measure speech? Well, when we speak, there are physical sound waves that are produced, right, that go through the air as signals that all of you can receive and decode. So when we look at speech computationally, we can first approach it from this physical direction, as acoustics and phonetics. We can record a patient as they speak to us, or perhaps even in real time someday, and from the physical signals we can computationally extract quantitative measures. First, the tempo and pauses of speech: how quickly are we speaking, how much of the time is there sound, how many pauses are there, how long are the pauses, and how frequent are the pauses in our discourse? And we can also look at the pitch and vocal quality of the sound waves: the frequency, the amplitude, how they vary, and also certain resonances that have to do with the configuration of your vocal production organs.
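As a rough illustration of how those tempo, pause, and pitch measures can be pulled out of a recording, here is a minimal sketch in Python. It assumes a hypothetical mono file interview.wav, uses librosa's silence splitting as a crude stand-in for a real voice-activity detector, and the 0.25-second pause threshold is an arbitrary choice for illustration.

```python
import numpy as np
import librosa

# Load a mono recording of the interview (hypothetical file name).
y, sr = librosa.load("interview.wav", sr=16000, mono=True)

# Non-silent intervals as (start, end) sample indices; the gaps between
# consecutive intervals are candidate pauses.
voiced = librosa.effects.split(y, top_db=30)

total_dur = len(y) / sr
speech_dur = sum(end - start for start, end in voiced) / sr
gaps = [(voiced[i + 1][0] - voiced[i][1]) / sr for i in range(len(voiced) - 1)]
pauses = [g for g in gaps if g >= 0.25]  # ignore micro-gaps between words

features = {
    "speech_fraction": speech_dur / total_dur,  # how much of the time there is sound
    "pause_count": len(pauses),
    "mean_pause_sec": float(np.mean(pauses)) if pauses else 0.0,
}

# Pitch: a fundamental-frequency track from the YIN estimator, summarized
# as central tendency and variability.
f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)
features["median_f0_hz"] = float(np.median(f0))
features["f0_std_hz"] = float(np.std(f0))
print(features)
```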
So we can look at how speech sounds, and we can also look at how speech is organized and delivered. If we take the sound waves and transcribe the speech into text, we can then measure, for example, the quantity of speech in terms of words or sentences, which are sometimes called utterances in this field. We can tag the parts of speech in the content and count those up: nouns, verbs, adjectives, all the grammatical terms you learned in middle school and tried to forget. We can count up disfluencies: are people saying um, uh, are they repeating themselves? And we can also use more advanced computational methods to mathematically represent the meaning and the organization of speech, which I'll go into in a little more depth later. Finally, we can also look at the content of the speech itself. We can look at whether positive or negative sentiment is being conveyed. We can look at characteristics of the words: are these simple words that a child would use, man, dog, woman, or are they complicated words that are more technical? Is the language metaphorical or concrete? So these are the many ways that we can quantify speech content.

And just to give a little terminology in this field: all of these analyses fall under the umbrella of computational linguistics, where we use computational tools to model natural language. Natural language as opposed to computer language, so anything that anybody would use to communicate human to human. Within that, you have acoustic analysis, sometimes called signal processing, which looks at the physical sound waves and their characteristics. And NLP, a term you may have heard of, natural language processing, usually refers to the text content and the automated understanding, recognition, or generation of speech.

Most of the studies, if you look up any papers in this area, follow a fairly similar design. It always starts with our research participants, or patients, and our interactions with them are recorded. From the recordings, we extract the sound waves, the acoustic signals, and we also generate text-based transcripts. These feed into automated analysis pipelines with computer code, which output objective speech and language features. And then the important part is the clinical outcomes of interest: we relate the speech measures to clinical measures like diagnosis, relapse, treatment response, symptom severity, or functioning. So that summarizes the general approaches that we have to quantifying speech and language.
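To make that last step concrete, here is a hedged sketch of relating extracted speech features to a clinical outcome. The file name, the feature columns, and the symptom_severity column are all invented for illustration; a real pipeline would produce them upstream.

```python
import pandas as pd
from scipy import stats

# One row per participant; speech feature columns plus a clinical rating.
df = pd.read_csv("features.csv")
feature_cols = ["speech_fraction", "pause_count", "mean_pause_sec", "median_f0_hz"]

# Rank correlation of each speech feature with symptom severity.
for col in feature_cols:
    rho, p = stats.spearmanr(df[col], df["symptom_severity"])
    print(f"{col}: rho={rho:+.2f}, p={p:.3f}")
```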
Next, I'll jump into some specific examples of methods that you'll see in the literature. First, I'll start with LIWC, which is one of the simpler and more straightforward NLP methods. LIWC stands for Linguistic Inquiry and Word Count, and it's a dictionary-based approach developed by James Pennebaker in Texas. With a dictionary-based approach, you have experts who define a specific set of predetermined parameters that words will be rated on, and then the words are rated by human raters. As an example, some of the parameters included in LIWC are shown here. Dr. Pennebaker and his colleagues decided that it would be important to look at qualities like whether speech content or text conveys analytical thinking, clout, or authenticity, and whether it seems positive or negative in sentiment. They defined parameters like lifestyle relatedness: whether words relate to leisure or home or work. They also look at parts of speech, so different kinds of pronouns and other grammatical elements. And of course, of interest to this field, whether or not words relate to psychological processes: whether words tend to convey a sense of all-or-none thinking, like "definitely" or "absolutely," or whether they refer to insight-related processes, anxiety, politeness, and many others. And you can actually get a demo, which I'll show you as well; it's freely accessible if anybody wants to play around with it. I played around with it a little before I came here. I took the first couple of paragraphs of Freud's Interpretation of Dreams and plugged it into LIWC, and you get an output like this: quantitative measurements, based on standardized human ratings, for the various parameters that LIWC assesses. You can see that Freud uses a lot of first-person personal pronouns, and that there's less emotional tone, less positive tone and less negative tone, than the average written text, but a lot of words relating to cognitive processes. Then I threw in the first paragraph or so of Jung's Man and His Symbols, and you see that he actually has no first-person pronouns and uses no words with an emotional tone, and also conveys a lot of cognitive processes in his work.
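The core mechanic of a dictionary-based method is simple enough to sketch. The hand-made word lists below stand in for LIWC's expert-rated categories (the real LIWC lexicon is much larger and licensed), and scores are normalized for text length, as in the demo.

```python
import re

# Toy categories: a few words each, where LIWC has expert-curated lists.
LEXICON = {
    "positive_tone": {"amazing", "great", "phenomenal", "happy", "wonderful"},
    "negative_tone": {"sad", "terrible", "awful", "angry", "hopeless"},
    "i_pronouns":    {"i", "me", "my", "mine", "myself"},
    "cognitive":     {"think", "know", "because", "reason", "understand"},
}

def liwc_like_score(text: str) -> dict:
    """Percentage of words falling in each category, normalized for length."""
    words = re.findall(r"[a-z]+", text.lower())
    n = max(len(words), 1)
    return {cat: round(100 * sum(w in vocab for w in words) / n, 1)
            for cat, vocab in LEXICON.items()}

print(liwc_like_score("I woke up on this amazing day. "
                      "Now I am giving a phenomenal presentation at the APA."))
```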
So I was wondering if we could dive in a little and try some APA Mad Libs, if you guys wouldn't mind logging in and putting in some of your favorite adjectives. As you start entering adjectives, we should see them come up, and if you see some that you like, you can upvote them or enter the same thing. Is it working for people? Oh, there we go. Great. You can keep going, but I'll take funny, unique, weird, amazing, and phenomenal. Those are great words. And great, too. Okay, so let me switch over to the Mad Libs part. Here is the LIWC demo tool that you can access online. I'm going to first check this kind of bare-bones text that I made up: I woke up on this day. Now I'm giving a presentation at the APA. If you analyze that, you can see the measures are normalized for length. I'm using a lot of I-pronouns, but there's very little emotional content or thought process involved, and a lot of analytic and authentic quality. So if I go back and analyze this text again: I think I heard phenomenal, I saw phenomenal up there as one of your suggested adjectives, and I think I remember great, and amazing. So let me go back to this. And amazing. Okay. If we plug this in and analyze it, now you see that the positive tone becomes really high. And this is automatically detected. So you can imagine that if you used this on patient speech, and you were speaking to someone who was feeling really down, even without an expert clinician interpreting the information, we'd be able to automatically pick up some of the tone compared to someone who's euthymic or perhaps even manic. All right, thanks for participating in that.

So now I wanted to look at this kind of tool, the LIWC dictionary-based NLP method, in practice. This is work that was done by one of my colleagues, Dr. Michael Birnbaum, where he used Facebook data, Facebook posts, to try to detect relapse and hospitalizations in young people with psychotic disorders. This study looked at 51 participants between the ages of 15 and 35 with a primary psychotic disorder. In his study design, he first acquired the patients' health data: their medical history in terms of when they were hospitalized and when they were relatively clinically stable. With the participants' consent, they were able to access all of their Facebook posts, getting the language, the text included in the posts, as well as the posting frequency and other parameters. Then they determined the timeline from the participants' health records, identifying re-hospitalizations, and they took the one month before hospitalization as a critical period of relapse; all other times were considered relatively healthy, relatively stable periods. They used a combination of speech features, posting frequency, and features like that to detect the periods of relapse. As part of that, they used LIWC, the tool I just showed you, and collected many of the features I introduced. They found things like changes in the parts of speech: differences in the use of second-person pronouns as well as indefinite pronouns in the period just leading up to relapse, so people were referring to themselves and others in a different way. Logically enough, they also found decreased use of words relating to friends, as well as achievement, certainty, and work, which makes sense in terms of the clinical patterns we see leading up to relapse. And they also found more negative affect and more use of anxiety-related words as well as swear words. Combining this with other Facebook-related features, they were able to generate a machine learning model that identified social media patterns indicative of relapse in the month leading up to rehospitalization.

So now I want to pivot to a different and slightly more complex way of doing natural language processing, of trying to quantify what's conveyed in speech in a psychiatric context. Here, we're trying to solve one of the limitations of dictionary-type approaches. As you can imagine, there are a lot of words in the English language, and it's a lot of work to get human raters to come up with reliable ratings for all of these dozens and dozens of parameters.
And many of these parameters are, by definition, going to be subjective, because a word will evoke a different feeling, different memories, for you than for someone else. Moreover, you can only rate words that make sense to you as a human, along dimensions that make sense to you as a human. With machine learning and artificial intelligence, we can get past those barriers by using mathematical embeddings: representations of semantic meaning. Basically, meaning addresses that are not tied to human interpretations. As an example, if we looked at words like cat, kitten, dog, and houses, and simplistically assigned ratings along different parameters, you get a kind of address for the content of each word. And if you were to represent those mathematical meaning vectors in a graphical format, you could tease apart relationships in meaning: for example, the fact that cat and kitten are close together and somewhat related to dog, and that all three of them are some distance away from houses. Similarly, we can use these mathematical embeddings to represent the relationships between words: for example, seeing that the relationship between man and woman, the directionality and the change, is recapitulated in the difference between king and queen. So this can provide a really powerful tool for trying to quantify concepts like tangentiality, circumstantiality, derailment, or flight of ideas.

Here is an example based on work done in my group, just to illustrate how this works in some real language models that are used in research in this area. We came up with these four simple sentences up top: The golden crown sat on the head of the king and queen. The silver tiara sat on the heads of the boy and girl. The duke heads to the wooden throne sitting under the clear sky. Oh boy, I really hope that the sentence is clear and concise. On the left, we have the results from a latent semantic analysis model, which comes up with embeddings for words based on whether or not words co-occur in the same documents. The assumption is that if two words show up really frequently in the same Wikipedia article, the same newspaper article, et cetera, their meanings will be more closely related than words that don't tend to show up in the same document. So this model reads millions of documents and comes up with machine-learning-generated embeddings for the meanings of words. And here you can see that it kind of follows what you would guess intuitively, with king and queen being closely related, boy and girl having some relationship, and clear being related to concise more than to the other words. On the right-hand side are results from a more modern language model. Latent semantic analysis is from decades ago, the 90s or so, while BERT was developed in 2018; it's a deep learning language model that also uses embeddings in its analyses. You can see that it's different from the latent semantic analysis, but it's coming up with similar results, although here king is more closely related to boy than to queen, et cetera. So there are different ways of understanding these relationships.
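Here is a toy version of that word-address idea: hand-assigned ratings along a few invented dimensions act as embeddings, and cosine similarity recovers the intuitive relationships. Real models learn these vectors from text rather than having them assigned by hand.

```python
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-assigned "addresses" along invented dimensions:
#            animal, size, domestic, building
vectors = {
    "cat":    [1.0, 0.3, 0.9, 0.0],
    "kitten": [1.0, 0.1, 0.9, 0.0],
    "dog":    [1.0, 0.6, 0.8, 0.1],
    "houses": [0.0, 0.9, 0.8, 1.0],
}

print(cosine(vectors["cat"], vectors["kitten"]))  # highest: near-synonyms
print(cosine(vectors["cat"], vectors["dog"]))     # high: related animals
print(cosine(vectors["cat"], vectors["houses"]))  # lowest: different concept
```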
One of the pivotal works in our field using embeddings was done by Brita Elvevåg back in 2007, and to my knowledge this was the first paper looking at embeddings in a psychiatric context. Here they had 26 participants with schizophrenia and 25 healthy volunteers, and they used that latent semantic analysis model to generate word embeddings from patient speech. What they did was use windows of different sizes and look at the cosine distances between the mean embeddings of each of these windows. As an example, if we have this sentence here and you took a window size of three, you would calculate the average embedding for the first three words, "the quick brown," then calculate the average embedding for the next three words, "fox jumped over," and then get the distance between the two. That would be one measurement, and then you'd move your window down another three words and do the same thing with the next pair. When they did this and averaged across the participant's speech, they found that people who had schizophrenia and high thought disorder started to deviate, in terms of their semantic coherence, objectively measured using word embeddings, from both healthy controls and people with psychosis who had low thought disorder. So here was the first evidence that we can use objective computational tools to do something like what we do as clinicians: assessing that patients are becoming tangential or derailed.
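A sketch of that moving-window coherence measure might look like the following. Random vectors stand in for a trained LSA or word2vec model, so the printed number is meaningless here; the point is the windowing and cosine logic.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB: dict = {}  # word -> vector; a real study would load trained LSA vectors

def embed(word, dim=50):
    if word not in EMB:
        EMB[word] = rng.normal(size=dim)
    return EMB[word]

def window_coherence(words, k=3):
    """Mean cosine similarity between successive non-overlapping k-word windows."""
    means = [np.mean([embed(w) for w in words[i:i + k]], axis=0)
             for i in range(0, len(words) - k + 1, k)]
    sims = [float(m1 @ m2 / (np.linalg.norm(m1) * np.linalg.norm(m2)))
            for m1, m2 in zip(means, means[1:])]
    return float(np.mean(sims))  # lower values suggest less coherent speech

words = "the quick brown fox jumped over the lazy dog".split()
print(window_coherence(words, k=3))
```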
And here in our work, in my lab, we wanted to go beyond that really foundational work by Elvevåg et al. A major limitation of traditional word embeddings from decades ago is that the way the address is assigned is very concrete: a word is a word. So bank has the same address whether I'm saying that I'm going to the bank or that I'm taking a stroll by the river bank. It doesn't incorporate different senses of the word or their structure within the speech. If I use the word "it," for example, over the course of many sentences, it doesn't just have its own meaning; it's also referring to other entities that I've probably already mentioned. The more modern language models that have come out just in the last few years, and that have made possible tools like ChatGPT, are based on transfer learning and attention: instead of mapping words in a unidimensional way to a meaning address, they can dynamically look at the way individual words relate to other words in the text and come up with mathematical representations that enable further linguistic analyses. So here are those same four example sentences, and this time we're going to focus on the words that come up multiple times. We have "heads of the king and queen" and "the duke heads to the wooden throne." With the older latent semantic analysis model it's all ones; it can't tell the difference between the two. But when you use BERT, the more modern language model with deep neural networks, it can very easily tell the difference between the two instances of "heads"; it doesn't think they're very similar at all. You can see the same thing with "clear sky" versus "clear and concise": they're considered identical by the older language model, but the new one can very easily tell them apart. And not only that, if you look at the same word used in similar contexts, so "the crown sat on the heads of the king and queen" versus "the tiara sat on the heads of the boy and girl," you're using "sat" in very similar ways, right? And you do get a very high similarity between those two instances of "sat." So these are much smarter, more complex language models that capture a lot more of the nuances of human language.

In our work, we wanted to test whether this actually made a difference in detecting clinically important differences in speech, differences related to mental health. So we did a study where we tested four different language models, with GloVe being an older-generation model and the other three being newer-generation models. GPT-3, for example, is related to the underlying language model for ChatGPT, which is based on GPT-3.5. We processed the language in different ways: verbatim, versus filtering out disfluencies, versus looking only at content words that are really key to the information being delivered. And we used various analytical strategies: we can compare words side by side or in larger windows, or compare whole sentences, or skip a sentence. So there are many different ways to approach this analytically. The sample was a total of 111 individuals: 37 healthy control participants and 74 people with schizophrenia spectrum disorders. And just to illustrate the scalability of this, how easy it could eventually be to implement in a clinical setting, we used really short samples, based on two-minute responses to open-ended prompts like "tell me about yourself" or "how are things going recently." You don't need to look at all the detailed values here, but these numbers are the effect sizes comparing people with schizophrenia to the healthy controls: a number farther from zero indicates a stronger effect, a bigger difference between the groups. On the y-axis we have the many analytical strategies I briefly discussed, the three panels show the different ways we cleaned the data, and within each panel you see the four language models we compared. The gist is that the leftmost model was the older-generation model, the other three were the newer-generation models, and all of them are capable of picking up speech differences in schizophrenia versus healthy controls. But it does look like the additional subtlety and precision offered by the newer large language models based on deep learning confer an advantage in detecting these clinically significant speech differences. The other thing to note is that, as you see, the colors go in different directions, so the language model you use does seem to make a pretty significant difference. There's a lot of work still to do in figuring out how best to use these kinds of methods.
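For readers who want to see the contextual-embedding point in code, here is a sketch assuming the Hugging Face transformers package and the bert-base-uncased checkpoint: the same surface word "heads" gets different vectors depending on context, which a static model cannot do. It assumes the probe word is a single wordpiece in BERT's vocabulary.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(sentence: str, word: str) -> torch.Tensor:
    """Contextual vector for the first occurrence of `word` in `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (num_tokens, 768)
    idx = enc["input_ids"][0].tolist().index(tok.convert_tokens_to_ids(word))
    return hidden[idx]

a = vector_for("the crown sat on the heads of the king and queen", "heads")
b = vector_for("the duke heads to the wooden throne", "heads")
print(torch.cosine_similarity(a, b, dim=0).item())  # noticeably below 1.0
```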
So, pivoting again to our third specific example of natural language processing. So far, the methods we've looked at are focused on speech content and the meaning of speech, right? Whether it's a dictionary-based model using representations from human raters, or computer-generated embeddings and the distances between them. This third technique is based on graph theory, and instead of looking at the content, graph theory is really fine-tuned for representing the organization of speech and quantifying it from that perspective. So what is a speech graph? From a graph-theoretical approach, a graph in general is defined by nodes and edges. Nodes are entities; in terms of speech, an element of speech, which could be a word, a sentence, or a part-of-speech tag. And edges are relationships. As an example here, we have a participant's response to a cookie theft picture description task. We can generate a word trajectory graph by making the words the nodes and sequential adjacency the edges: if a word follows another word, we connect them with an edge. So if you follow it along: "I see the cookie jar," so I, see, the, cookie, jar. "And I see a chair, a sink, and a window," and it goes back to I: "I see the kid, and the kid is grabbing the cookie jar." That's how these sentences become represented in graph format. The pivotal, pioneering work here was done by Natalia Mota from Brazil, and this is one of her key papers introducing this strategy. Once you generate these graphs from speech, you can mathematically quantify the way they look. If you have this speech graph here, for example, you can quantify the number of components: how many nodes are there, how many edges? You can look at the overall organization of the graph. For example, the largest connected component is the size of the section of the speech graph where every node is connected to another node. Bringing it back to clinical understanding: if you have somebody who's really distractible, or who's jumping from one completely different topic to another, you may end up with multiple unconnected speech graphs, whereas someone who's speaking the same amount but on a single subject will have a larger connected component. The largest strongly connected component is similar, but the nodes are all connected in a bidirectional way. You can also discover patterns in speech organization: for example, when people are repetitive or perseverative, you'll get loops, where nodes are connected to themselves or in a sequence, and you can quantify the number of repeated edges and parallel edges, again looking at perseveration and repeated patterns in speech.
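A minimal word-trajectory speech graph in the spirit of Mota's approach can be built with networkx; the metrics printed below are simplified stand-ins for the published feature set.

```python
import networkx as nx

def word_trajectory_graph(text: str) -> nx.MultiDiGraph:
    """Each word is a node; each transition between consecutive words is a
    directed edge, so repeated transitions become parallel edges."""
    words = text.lower().split()
    g = nx.MultiDiGraph()
    g.add_nodes_from(words)
    g.add_edges_from(zip(words, words[1:]))
    return g

g = word_trajectory_graph(
    "i see the cookie jar i see a chair a sink and a window i see the kid"
)
print("nodes:", g.number_of_nodes(), "edges:", g.number_of_edges())
print("largest strongly connected component:",
      len(max(nx.strongly_connected_components(g), key=len)))
# Transitions used more than once (e.g. 'i' -> 'see') hint at repetitiveness.
print("repeated edges:",
      sum(1 for u, v in set(g.edges()) if g.number_of_edges(u, v) > 1))
```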
And what's really interesting, and kind of evokes the psychiatry of perhaps a century ago, is that Natalia and her colleagues found that dream reports were the most informative speech content they were able to discover. They had people with different psychiatric disorders talk about a dream they'd had recently, or a recurrent dream, and they compared that with asking people to tell them about their day; I think it was "tell me about yesterday" or something like that. And you get much more elaborate, and also much more differentiated, speech patterns when you ask people to talk about their dreams than when you ask them to talk about their daily lives. So, really interesting: our psychiatric forefathers may have been onto something with dream analysis, I think. And using machine learning models, they found that the dream reports were able to predict the different diagnostic categories.

Another way of constructing speech graphs is not by the sequence of words, but by how the words act on each other. Our group developed a semantic speech graph where words are connected by their actions on other words: who is doing what to whom, basically. Because I can tell you that the boy threw the ball, or that the ball was thrown by the boy. In those two example sentences you have opposite sequences, but the content is the same, right? The same actor is acting on the same thing with the same action. So if we use semantic connections to represent speech graphs, perhaps this is a better way of understanding the organization of ideas, rather than the organization of words per se. This is from work that Amir Nikzad, a postdoc in my lab, did. And he's applying to psychiatry residencies next year, so if you see him come across, please give him an interview and a second look. Amir separated the different graph features into those that look at size, so the amount of speech that's generated; connectedness, how ideas are interrelated; and organization, whether or not the speech graphs look different from randomly generated graphs. Again, there's a lot of data here, but on the left-hand side are the features generated using the semantic speech graphs we developed, and on the right-hand side the sequential word trajectory graphs I showed you at the beginning, based purely on word sequence. On the y-axis are various clinical measures of importance, like disorganization, speech poverty, psychosis severity, anxiety, withdrawal, asociality, et cetera. If you just zoom out, you see that in general the semantically generated speech graphs seem to confer an advantage in their relation to these clinically meaningful outcomes. So we think this is a promising method moving forward. And we've further developed the method to quantify additional speech relationships: not only who's acting on whom, but also relationships like setting, so "the boy threw the ball in the park," or timing, "the boy threw the ball in the park after dinner," being able to identify all the various ways that ideas can relate to one another. And we did put our tool up; it's publicly available for you to test out.
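The lab's tool uses semantic role labeling, but the who-does-what-to-whom idea can be approximated with an off-the-shelf dependency parser. The toy sketch below uses spaCy (with the en_core_web_sm model installed) and is only an illustration of the idea, not the actual method: active and passive phrasings of the same event come out with the same edge structure.

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def semantic_graph(text: str) -> nx.DiGraph:
    """Link each verb to its agent and patient, regardless of word order."""
    g = nx.DiGraph()
    for tok in nlp(text):
        if tok.pos_ != "VERB":
            continue
        for child in tok.children:
            if child.dep_ == "nsubj":                  # active subject
                g.add_edge(child.lemma_, tok.lemma_, role="agent")
            elif child.dep_ in ("dobj", "nsubjpass"):  # object / passive subject
                g.add_edge(tok.lemma_, child.lemma_, role="patient")
            elif child.dep_ == "agent":                # passive 'by'-phrase
                for gc in child.children:
                    if gc.dep_ == "pobj":
                        g.add_edge(gc.lemma_, tok.lemma_, role="agent")
    return g

# Opposite word orders, same boy -> throw -> ball structure.
print(sorted(semantic_graph("The boy threw the ball.").edges(data=True)))
print(sorted(semantic_graph("The ball was thrown by the boy.").edges(data=True)))
```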
I thought we could try it together, if I can manage to switch screens. Okay, let me change back to this screen, pull this up, and move on to the next poll. I was wondering if you guys wouldn't mind participating in a group free association: list some words that come to mind, and we'll make a passage out of this together. Great, this is great. So let me pull up our speech graph example and make something up here. The top words are coffee, tired, morning, travel: "I was tired in the morning and really wanted to pick up some coffee before my travels." And then we also have science, language, home, cat, learning: "My travels are finally taking me home to my cat, and the cat likes being at home and meowing, but I can't understand his language." All right, so these are different semantic role labelers that exist, and here we can look at different types of relationships. If we look at predication and action, that's who's doing what to whom, we can then generate a graph based on our group free association. And you see all the things that I am doing, like picking up coffee, traveling, wanting, understanding, and coffee being picked up. So you're seeing those relationships there, and here you're looking at the various graph metrics that can be generated. Now let me add something completely unrelated: fingers, fascinating, shuttle. "The fingers on the shuttle fascinate the bowl." Let's see what happens; I think that might generate a separate graph. Yes, it looks like it does. They're superimposed on top of each other, but you can see that that last sentence becomes separated from the first. So you can see how we might be able to quantify the disconnectedness of speech. And if instead we want to look at setting, see what happens here: "I was tired in the morning," so you have a connection between being and morning, between cat and taking and finally, and between travels and picking. So anyway, this is a tool, you have the address, and you can play around with it. Let me go back to our other mode and move back to the presentation.

All right, so to summarize: we first went over a bit of what natural language processing means in the psychiatric arena and the general ways we're looking at it as a tool, and then we went through three specific examples of natural language processing methods: a dictionary-based method, an embedding-based method, and a graph-analysis method. So now let's go through one study where we took a lot of these kinds of features and tried to look at psychosis symptoms in the context of treatment response. This was a longitudinal study with four sessions. We recruited people with psychotic disorders as they were admitted for an acute inpatient admission, assessed them again around the time of discharge, then around three months later, and then around six months later. And we got a bit of a drop-off; it turns out it's hard to follow people who are acutely psychotic after they leave the hospital. But anyhow, we first wanted to look at just the first two time points, where the first time point is around admission and the second one is around discharge.
So here we had 54 people whom we assessed at both time points. We were trying to get younger patients, because we thought there might be more change in their psychosis symptoms. As is common for the psychosis population, they were predominantly male. We're located in a racially diverse area, so our participants reflected that. The majority of people had schizophrenia spectrum disorders; again, by design we were focusing on that, with three people who had bipolar I disorder with psychotic features. And the mean follow-up was two weeks. We didn't want too much variability, and as many of you might know, discharge can take forever sometimes for reasons that are not related to clinical symptoms, so we restricted the follow-up period to one to three weeks. And here, for some reason not all the data is showing up, but these should be spaghetti plots showing individual changes in symptoms from time one to time two. It's what you would expect intuitively: positive symptoms generally went down; we looked at thought disorder as well, which also went down; and negative symptoms were about the same, or maybe even increased a little, from admission to discharge in an inpatient setting. What you can't see here is the individual variability: some patients improved a lot, some improved only a little, and a small number of people actually got worse from time one to time two, and they weren't telling their inpatient teams, because they wanted to go home. The data we collected came from a range of unstructured to structured tasks. We asked those open-ended questions again: tell me about yourself, how have you been feeling recently? We asked them to describe pictures: now tell me what you see here, describe this picture for me. We had them do fluency tasks: name as many animals as you can think of in one minute; tell me as many words as you can think of that begin with the letter F. And we also had a standard paragraph, the same for everyone, that we asked people to read aloud. We used the approach I discussed earlier in this talk and extracted various types of features from these different tasks. And I should mention that this study was done in partnership with Winterlight Labs, and these features were generated from their pipeline, not mine.

So then we explored different ways of making sense of the data. We actually started out with thousands of features. That's one of the main research limitations in this field: it's so easy to get so many different kinds of objective readings on speech. I think their pipeline at the time generated 612 features for every single task; multiply that by, I think, eight tasks, and you get several thousand features. Even if you drop the ones that don't make sense, for example it doesn't make sense to look at parts of speech for a paragraph reading task because it's the same paragraph for everyone, you still get several thousand. So we had to do a lot of feature filtering, and this is an area where we're still working out how to do it best. We also explored a few dimensionality reduction techniques: principal component analysis, and t-SNE, a nonlinear, machine-learning cousin of PCA. So, reducing lots and lots of features, 511 in this case, down to a much smaller number of variables. For the PCA, we derived a three-component solution, a nine-component solution, and a 101-component solution, based on different quantitative criteria. And our main statistical analysis used a linear mixed model. The clinical question we were trying to answer was: do changes in the speech features predict changes in the psychosis symptoms? Can we use speech almost like a vital sign for psychosis? Can we objectively measure how people are doing, and how they're responding?
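As a hedged sketch of the analysis shape described here: standardize the speech features, squeeze them into a few principal components, then fit a linear mixed model asking whether a component tracks symptoms across visits, with a random intercept per participant. The long-format table and every column name are invented for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical long-format table: one row per participant x visit, with
# speech feature columns (feat_*), a symptom rating, and IDs.
df = pd.read_csv("speech_long.csv")
feature_cols = [c for c in df.columns if c.startswith("feat_")]

# Standardize, then reduce hundreds of features to three components.
X = StandardScaler().fit_transform(df[feature_cols])
df[["pc1", "pc2", "pc3"]] = PCA(n_components=3).fit_transform(X)

# Linear mixed model: does pc2 track positive symptoms across visits,
# with a random intercept for each participant?
model = smf.mixedlm("positive_symptoms ~ pc2 + visit", df,
                    groups=df["participant_id"]).fit()
print(model.summary())
```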
So it turns out that our three-component solution worked particularly well in this context. One of the components significantly tracked changes in positive symptoms, one significantly tracked changes in negative symptoms, and the other significantly tracked changes in thought disorder. To back up a little: our second component decreased as positive psychosis symptoms increased. This component was related a lot to pronoun usage, syntax, and the nature of the words that were used. I've heard that described, or interpreted, as people who are acutely psychotic becoming internally focused, which is then reflected in their pronoun usage, but I think we still need to do some work to fully understand what's going on there. The component related to negative symptoms seemed fairly easy to interpret. A lot of the features had to do with the quantity of entities people were able to name in the picture description, so how rich a description they were able to generate for the given picture; how many words they were able to produce in the fluency task; and also the number and length of pauses. People with more negative symptoms had more pauses and longer pauses. Pretty intuitive, and face-valid on the surface. And for thought disorder, a lot of the features that came up were related to the semantic embedding tools I described earlier: measures looking at word or sentence embeddings and the distances between successive words or sentences, or the distance from each sentence to the centroid, the overall summary of the discourse. Those came up for thought disorder, and thought disorder was also related, in the opposite direction, to the component related to negative symptoms. So, also making sense. And some of the other feature selection methods we tried came up with statistically significant results as well, which to me shows that this isn't a total fluke; there really is something going on.

So, just to wrap up: we started out going over the general framework of how this work is being done and what we're interested in in this field; we went over some specific examples, hopefully giving you a bit of a nitty-gritty understanding of how NLP works and what we're looking at when we talk about NLP; and then we looked at a project where many of these different techniques were combined to try to achieve a clinical goal. So in general, I conclude, or I feel strongly, that speech and language features are really promising biomarkers for psychosis.
And I think some really key clinical applications are, first, that they can be a real step forward for measurement-based care, where they could be a quick, easy, low-cost assessment that every patient does, just like vital signs. When someone comes into the hospital, they're checked every day and you get a quantitative reading for how they're doing, so that when there's a shift change or a change of care, there's an objective reading for how patients have been doing: an objective number we can look at when we think about whether people are ready for discharge, whether we should increase the dose of a medication, or whether we should just wait because they're already starting to respond. I think speech and language biomarkers also hold the promise of personalized medicine in psychiatry, because there could potentially be speech signals that tell us whether we should be starting an early psychosis patient on clozapine or Abilify, since they might have different brain features that produce different patterns in their speech. We can hopefully detect early on whether people are progressing to psychosis or are at risk of relapse. And if we do a better job of knowing when people are responding and when they aren't, hopefully that means we can also decrease medication burden, because we'll be able to see that people are getting better and won't have to keep adding doses when we can't sensitively figure out what's going on for them. In terms of future directions: feature and task selection. There are so many different ways to objectively quantify speech, and figuring out the best ways to do that, and the best ways of eliciting speech, is a major goal for people working in my area. I would also like to see whether we can predict outcomes: if you have a first-episode patient, is this someone who's going to be stable and do well, or someone who's going to struggle and experience relapse? And certainly there are ethical concerns in terms of bias, and in thinking about the things we really do want to know versus the things we want to keep on a more interpersonal, human level, where AI maybe isn't necessary or helpful. So that's it. We're ending a little early, but leaving some time for questions. You can follow me on Twitter if you're interested in updates from my lab. Thank you.

Thank you for the talk, it was very interesting. I'm a fourth-year resident in Italy, and in the last few months I've started recruiting patients to try to analyze speech biomarkers, and a couple of things in your talk made me think of problems we've encountered. The first is that, in the LPOP study, which seems very interesting, and we're going to look at the publications on it, in our experience it's always complicated to get consent in inpatient units because, of course, of the acute psychosis situation. So we usually start out in the outpatient unit, and I wonder if you had any thoughts on that. And the other thing I've noticed is the big interference that medications have on biomarkers.
I even had a patient who, when I was about to ask him to do a verbal fluency task, explicitly told me: they gave me five extra drops of haloperidol this morning, so I won't do that well. I feel like that has a significant effect, and I wonder if you know of studies that analyze that bias, because I feel like it's really interesting work, but these little things are something I feel I should take into consideration.

Yeah, absolutely, thank you. So just to summarize: the first question was about consent on the inpatient unit, and the second one was about medication effects. Consent-wise, we did have an issue with not only ability to consent but also safety, because our research coordinators have to be alone in a room with the participant to collect the research samples. We weren't always able to get people on the day of admission; sometimes it took five days or a week before we could access the participant for research purposes and collect the data. So yes, that is a concern. Our IRB did allow us to use a consent quiz, a fairly simple one: do you understand this is voluntary, do you understand that you can withdraw at any time, et cetera. They were able to consider that an ethical screening tool for identifying people who are able to provide consent. And in terms of your second question, that's a great one. I actually worked on medication effects with Michael Spilka; we have a symposium later today in a similar area, with a tiny bit of overlap, but most of the material is new. Michael and I thought that negative symptoms, and the vocal speech biomarkers, the sound and the pausing and the tempo and those kinds of features, were probably the most likely to interact with medication effects. So we did a mediation analysis with a Bayesian analytical approach, which Michael can explain better than I can. We also looked at EPS, extrapyramidal symptoms that may result from the medications. We found that negative symptom severity was related to medication use; we kind of already knew that. We also found that negative symptoms were related to extrapyramidal symptoms; there was a relationship there. And we found that speech was related to negative symptoms, which we had hoped for and have seen in multiple other studies. But what was promising was that, for the most part, the relationship between speech and psychosis was not affected by the relationship between psychosis and medications, or by extrapyramidal symptoms. Does that make sense? Medications definitely produce side effects, and you see that reflected in clinical symptom ratings. But speech seems to be able to pick up those variations in symptoms without being independently affected by medications.

Thank you. This is certainly such an interesting area, and thank you so much for a great presentation. My question is about other disorders. Schizophrenia used to be called dementia praecox, and for a reason.
And so I wonder whether the speech and language biomarkers and measurements apply, and whether there may be some overlap, for people with dementia, Alzheimer's; that's one condition. But also, on the other hand, the developmental disorders of language, receptive and expressive language disorders in the pediatric population. There could be some overlap, and I'd like to hear your understanding of that.

Yeah, absolutely, thank you. Looking at language in schizophrenia and mental health is part of a much larger effort to use these biomarkers in many other neuropsychiatric areas. There's a lot of active research going on in neurodegenerative conditions, not only Alzheimer's disease but Parkinson's disease, as you can imagine, and also language-heavy conditions like primary progressive aphasia. We do see promising results in terms of detection: I believe I saw something from a pharmaceutical trial where objective speech and language biomarkers were able to detect the effect of a trial agent more sensitively than traditional clinical ratings and assessments of disease progression. And once we have that knowledge, it begs the question: can we use this to detect things earlier, to better detect MCI, mild cognitive impairment, and risk for dementia? On the pediatric side, there's a lot of work being done in autism. As you can imagine, there's some diagnostic ambiguity with young children as to whether they have autism spectrum disorder or something else, so speech and language biomarkers are being employed there as well. And I actually have a grant under review at the moment, along a similar line, where we're using these tools to look at delirium in older adults in a medical setting. I don't know if any of you have had consult experiences similar to mine as a resident, but you're constantly being called by our healthcare colleagues to diagnose and evaluate delirium, and they don't do a good job: the statistic is that 75% of delirium diagnoses are missed in the clinical setting. We have some pilot results showing that these tools can be helpful in that context as well.

On that note, we are running a study on mania and looking at speech and vocal biomarkers, and I think any illness that involves a mental process probably has some impact on speech and language. And where are you, which institution are you from? I'm a resident at Mayo. My question is: you spent a lot of time on the natural language processing side. I wonder if you could talk a little more about the acoustic side. Is that a primary focus, or is that next steps? What's going on with the acoustics and the raw audio files?

Yeah, thank you. I have been focused on the NLP side; my career development grant is more in that angle, using deep learning and large language models. In my lab, we've used more of the standard acoustic analyses that are out there. So we're not really innovating there, I guess, but we are looking at those features and combining them with others in our clinical prediction models. Winterlight Labs, who funded the study I presented and are my collaborators on it, do have many acoustic variables relating to a lot of the dimensions I mentioned, like tempo, pauses, and tone.
Those features do come up a lot. Certainly for negative symptoms, as I was mentioning, we see really strong signals in terms of pauses, and pause duration particularly. So yes, they're interesting. In terms of really new work coming out, I think there's a group in Maryland that found that depression was related to how much blending there was in the acoustic signals of words. When you speak, you don't say one word, stop, and speak the next; the sounds actually blend together, and they found a really significant effect between the degree to which that was happening and depression severity. So in the field, there's definitely still innovation going on on the acoustic front as well.

Up until now, clinicians have believed that they can tell whether somebody is thought-disordered or not. We listen, and we think: oh, that's very thought-disordered, mildly thought-disordered. I think that, compared to my trainees, I can pick up subtle thought disorder in a way that they can't. That's a comforting belief, but whether it's true could be challenged. I imagine if you got a group of senior clinicians and asked them to rate people on their degree of thought disorder, they would hopefully have some reasonable agreement. How does that compare with your findings on the computerized AI evaluation? Because it seems to me that some of these studies don't make any assumptions about whether the clinicians think this person is thought-disordered. What does the algorithm generate, and are the two linked?

Yeah, is clinical impression linked to NLP results? So, I completely agree; this whole line of research is based on my clinical training and what I've observed in that setting. I absolutely believe that experienced clinicians can pick up a lot of really subtle signs. There's probably a Venn diagram of what clinicians can pick up that computational tools can't at the moment, and also computational features that clinicians can't pick up, right? A clinician might have a general sense of what's going on, but you're not going to know whether somebody spoke 213 words or 217, really subtle things like that, or whether their pitch is this versus this. And I think a really serious limitation of clinician-rated dimensions in general is that, while many people will be able to pick things up, what you pick up will be different from what another clinician picks up, and there's not really great shared language to connect what's going on, not to mention the years and years of training and experience it takes to even get there. I don't know if many of you agree, but when I've trained medical students and residents, I've found that thought disorder specifically is one of the hardest things for them to learn, something that really takes a lot of experience to convey. Furthermore, we know that clinicians change. Thinking about the trajectory of care of an individual patient: they may present to primary care first, as is often the case; then they might be in the emergency room and see a different physician, or a nurse practitioner or a PA, who don't have residencies in psychiatry.
They then show up on an inpatient unit where there are residents changing rotations and different attendings, and then someone different on the weekend. Then they're discharged and see someone completely new in the outpatient sector, who in some cases may be able to see them often, but in other cases sees their patients every three months. And I can't really accurately remember what somebody sounded like three months ago. So these are the kinds of things that I'm hoping speech and language features can help with.

Hi, thank you for such an awesome presentation. I wrote coffee in the free association thing, I just wanted to admit that. But I was wondering if there could be any clinical utility in figuring out what therapeutic targets might be helpful, like if you were to put someone's speech through this and make one of those maps. I've noticed that when you speak with a patient who's very disorganized, you can follow them as they're talking, but when you go to tell someone else what they said, it doesn't hang together enough even to repeat it. So I think sometimes we lose some subtlety; people are trying to tell us what's wrong, but it all gets abstracted. And I wonder if this could help get through, where we listen the normal way, but a computer can say: here are the main points, talk about this. I'm curious what you think about that.

Yeah, that's a really great idea, and I think it could be very helpful. There are some related efforts going on. People have developed NLP tools to deploy in psychotherapy settings, and they may have something like that, a topic extractor of sorts. Some of the other work looks at psychotherapy training and fidelity to specific models, mostly cognitive models, so they're able to automatically detect how much a session adhered to a specific format. We did some work looking at psychotherapy from a different perspective, looking at the specific targets. It was a small sample, only five participants, but 25 sessions each. These were people with psychosis undergoing MERIT, a metacognitive therapy where the targets are a more coherent understanding of the self in relation to the world. So we looked at how people refer to themselves versus others, and whether they talked about themselves in a passive way, things being done to them, versus them acting on the world in some way. And we were able to track some trends in these kinds of features over the course of the therapy. So going back to your question: it doesn't necessarily suggest exactly what you should do, but hopefully it tells you whether what you're doing is working, and whether you're hitting the psychotherapy targets you're aiming for.

Thank you. Great. So thank you all for coming and being engaged, thank you for all the lovely questions, and that's all.
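To make the self-reference and agency features from the MERIT study concrete: below is a rough sketch of how one might count first-person pronouns and passive constructions per session using spaCy. This is my illustrative reconstruction under stated assumptions, not the lab's actual code; the example sentence and the feature definitions are hypothetical.

```python
# Sketch: self-reference and passive-construction counts in a therapy
# transcript. Illustrative only; not the actual study code.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def agency_features(text):
    doc = nlp(text)
    n_tokens = sum(1 for t in doc if not t.is_punct)
    first_person = sum(1 for t in doc if t.lower_ in FIRST_PERSON)
    # spaCy's English models label passive subjects "nsubjpass"
    # (e.g., "things were done to me"), a rough proxy for talking
    # about oneself passively rather than as an agent.
    passive_subjects = sum(1 for t in doc if t.dep_ == "nsubjpass")
    return {
        "first_person_rate": first_person / n_tokens if n_tokens else 0.0,
        "n_passive_constructions": passive_subjects,
    }

print(agency_features("Things were done to me. I decided to change them."))
```

Tracking features like these session by session is the kind of trend analysis described in the answer above: not prescribing a specific intervention, but indicating whether the therapy's targets are moving.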
Video Summary
In a comprehensive talk by Sunny Tang, an assistant professor of psychiatry specializing in technology's role in mental health at Zucker Hillside Hospital, several innovative methods for measuring speech and language were discussed. These methods are crucial in diagnosing mental health conditions, with a particular emphasis on psychosis. Tang detailed the intricacies of employing natural language processing (NLP), artificial intelligence, and machine learning tools to dissect speech patterns related to psychiatric conditions.

The talk covered various means of quantifying speech, including acoustics, phonetics, and syntax. Tang presented methods like the Linguistic Inquiry and Word Count (LIWC), which employs human-rated dictionaries to assess speech for emotional and cognitive attributes. She also highlighted advancements in NLP that employ embeddings, a machine-learning technique for understanding language nuances beyond human interpretation.

Moreover, Tang delved into using semantic embeddings for identifying speech disorganization patterns, representing thought processes through computational models. Case studies showcased how speech biomarkers could predict clinical outcomes like relapse in psychosis, emphasizing the potential for speech analysis in early detection and personalized treatment strategies.

Further discussions included the challenges of incorporating medication effects into speech biomarker studies and the ethical considerations of deploying AI in clinical settings. Tang concluded by underscoring the promising future of speech and language biomarkers in mental health care, offering insights into personalized medicine, measurement-based care, and possible applications across various psychiatric and neurodegenerative disorders.
Keywords
Sunny Tang
psychiatry
mental health
psychosis
natural language processing
artificial intelligence
machine learning
speech biomarkers
semantic embeddings
personalized treatment
ethical considerations