Diagnosing Schizophrenia in the 21st Century: Natural Language Processing as an Emerging Biomarker
Video Transcription
Good afternoon, ladies and gentlemen, and welcome to our afternoon session. It is my utmost privilege to chair today's session with my esteemed colleagues. Today's session will be interactive, and if you haven't already done so, I'd ask you to get involved by either scanning the QR code or going to slido.com and entering the code there, so that you can participate using Slido for Q&A both during the talks and at the end. So without any further ado, I'd really like to welcome everyone to our presentation. You're in the right session: Diagnosing Schizophrenia in the 21st Century: Natural Language Processing as an Emerging Biomarker. This is an exciting area in which we've seen a lot of research come forward over the last number of years, and we are fortunate to have such an esteemed panel of true giants in this field today. Our outline is, first, an introduction to computational speech and language analysis, provided by Dr. Sunny Tang. We'll then shift to applying natural language processing to speech samples from individuals with severe mental illness, from Dr. Justin Baker, and finally we'll look at applications, barriers, and future directions for computational speech analysis in schizophrenia, led by Dr. Michael Spilka. I'll provide a bit of a discussion at the end, and it's my pleasure to be your moderator. My name is Dr. Marlon Danilewicz. To take an opportunity to introduce our speakers and their disclosures: each will outline their own at the start of their respective talk, but I'd like to warmly welcome Dr. Sunny Tang, Dr. Justin Baker, and Dr. Michael Spilka, and you're stuck with me. Below, at least for myself, are my disclosures. I have personal fees from a variety of sources, including ACI, Otsuka, Lundbeck, Winterlight, and the Ontario Brain Institute. I'm the conference chair for the Canadian Psychiatric Association and receive a stipend for that, as well as some support from my hospital, Ontario Shores in the greater Toronto area, where I'm the associate medical director, and I've done some consulting for Winterlight related to this work. So without any further ado, I'd really like to welcome Dr. Sunny Tang. Thank you.

Thanks so much, Marlon, and thank you everyone for being here for, I believe, the last session of the meeting. So thanks for coming. My name is Sunny Tang, and I'm an assistant professor of psychiatry at the Zucker Hillside Hospital and the Feinstein Institutes for Medical Research, both of which are affiliated with Northwell Health, a large healthcare provider in the New York metropolitan area. I think our disclosures came up already, but my research funding comes from the NIH as well as the Brain and Behavior Research Foundation, and I collaborate with Winterlight Labs on some of my projects. I'm a consultant for Winterlight Labs and the University of Maryland, and I am a co-founder of North Shore Therapeutics and hold some equity there. For me, one of the reasons I was drawn to psychiatry as a medical student was that this field gets to delve into the thoughts and emotions of humans, which to me was really engrossing and perhaps represents the core of our humanity.
So I feel like it's a privilege to be able to work in this area and to think about these themes and these things that make up who we are. But the trouble is that it's really hard to measure, and really hard to see, what goes on inside people's minds and in their internal mental processes. Speech, however, has been recognized from the beginning as a hallmark of psychiatric disorders. One of the early founders, or at least one of the major figures, of our field gave this quote: "Incoherence of the train of thought is usually distinctly noticeable in the conversation of patients." Actually, I can go over to Slido and look at it here. Arrow? Wait, no, it worked, I think. It worked, we're good. And so: incoherence is usually distinctly noticeable in the conversation of the patients. "The most different ideas follow one another with most bewildering want of connection, even when the patients are quite quiet." So I wanted to get your thoughts, your guesses, on who may have spoken or written this. All right, one more second. And the audience is correct: it was indeed Emil Kraepelin. The quote comes from his work on dementia praecox and paraphrenia, which is often recognized as one of the first descriptions of schizophrenia in modern psychiatry. Over on the right-hand side, you can see a writing sample from one of the patients Dr. Kraepelin was following. Even though it's in German, any of the clinicians here can immediately see the disorganization and the thought disorder that's going on, right? The words turn into figures and numbers and abstract squiggles at the end, emblematic of the changes in thought patterns in this disorder. So for me, speech is not just a symptom of psychiatric disorders, but also a biomarker of what's going on. The idea is that there are underlying disease processes which correspond to changes in thoughts, all of which happens beneath the surface, internally, where we can't observe it directly, but these are then reflected in observable changes in speech, which we can measure readily and easily. And people will just give it away for free.

So, another audience poll, just to illustrate the amount of information we're able to glean from language and from text. Here's a psychotherapy transcript, and I'd like you to think about the kinds of impressions and inferences you can make about both the patient and the therapist. So here it goes. "Good morning. How are you feeling today?" "I'm OK, I guess. Some days are better than others." "I understand. Can you tell me what's been on your mind lately?" "Well, I still have these thoughts and feelings that are difficult to shake. They just won't go away." "Can you tell me more about these thoughts and feelings?" "They're just so negative. And it's hard to see things clearly sometimes. I feel like I'm stuck in the cycle." "That sounds tough. It's not uncommon for people to have thoughts and feelings like that. Do you have any theories about where they may be coming from?" "I don't know. Sometimes I think it's just me being weak. But other times, I feel like there's something more to it." So, moving back to our online poll: this one takes single-word responses. What kinds of feelings or impressions did this invoke in you about the patient? It allows multiple submissions, so write as many as you like. Fantastic.
So I'm seeing sad coming up a lot, coherent, unsettled, insecure, depression, vague, pleasant, ruminating, insecure getting upvoted some more, supportive. All right, give it one more second. I think we'll move on to talking about the therapist. I see preoccupied. Isn't that amazing? All the impressions and all the information we're able to glean from just looking at a few lines. And so, moving on, sorry to anybody who was in the middle of that, but any thoughts about the therapist as well? Like, where do you think they were coming from, or did you have any thoughts about what they were going after? Yeah: supportive, calm, neutral, caring, empathic. Hunting for symptoms, gentle, informative, interested, validating, reflecting, chatbot, trying to listen, answers, superficial, fascinating. Okay, great. So let's switch back here.

So hats off to the person who said chatbot, because that transcript was actually generated by ChatGPT, just to illustrate another dimension of language, right, and AI. I asked ChatGPT: please generate a psychotherapy transcript between a psychodynamic therapist and a patient who is recovering from psychosis and still has some symptoms of depression, and don't be too obvious about it. I found that I had to add the "don't be too obvious about it," because if I just had the first part I was getting transcripts like: "Gee, how are you feeling today?" "My voices have gotten better but I'm still feeling depressed." And then, "Why don't we help you get better by seeing how this relates to themes from your childhood," or something like that. And it was too easy for you guys. But for those of us who do language work and read a lot of transcripts, there were a couple of tells that the transcript wasn't natural, because in natural speech people actually have a lot more disfluencies than what you saw there. People will start words and switch to something else; they'll start sentences and switch to something else as well.

So, when we speak (and sorry if anybody attended my session earlier this morning, there's a couple of slides of overlap, but I wanted to give some background on how we do computational speech analysis), we generate physical sound waves, signals that are propagated through space. And we can measure this by first looking at the acoustics and phonetics of speech, so properties of the sound waves: how much pausing is there, how much speaking is there, the rate, also the pitch, the volume, and the variations in all these factors. We can also look at the organization and the delivery of speech: how much speech there is, the parts of speech that are used, disfluencies like I mentioned, and also more complex features looking at how units of meaning relate to each other, as well as graphical representations of speech. And then we can look at the content of speech, so the actual words that are being expressed. Are these simple words, big words? Are they positive in sentiment, negative? Are they concrete, are they metaphorical? So there are multiple modalities through which we can approach computational measurement of speech, and all of this is encompassed in computational linguistics: the idea of computational modeling of natural language, as in the languages that people speak to one another, in contrast to machine language.
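To make that first, acoustic layer concrete, here is a minimal sketch of how pause time and pitch might be pulled out of a recording with an off-the-shelf library. This is an illustration rather than any of the presenters' pipelines; it assumes librosa is installed, and the file name and the 30 dB silence threshold are hypothetical choices.

```python
# Minimal sketch (not the presenters' pipeline) of basic acoustic features:
# pause fraction and pitch statistics from a single recording.
# Assumes librosa; "sample.wav" and the 30 dB threshold are illustrative.
import librosa
import numpy as np

y, sr = librosa.load("sample.wav", sr=None)      # audio at its native rate
total_dur = len(y) / sr

# Treat stretches quieter than 30 dB below peak as pauses.
voiced_intervals = librosa.effects.split(y, top_db=30)
speech_dur = sum(end - start for start, end in voiced_intervals) / sr
pause_fraction = 1.0 - speech_dur / total_dur

# Fundamental frequency (pitch) track; NaN where a frame is unvoiced.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
f0_voiced = f0[voiced_flag]

print(f"duration {total_dur:.1f}s, pause fraction {pause_fraction:.2f}")
print(f"median pitch {np.nanmedian(f0_voiced):.0f} Hz, "
      f"pitch variability (SD) {np.nanstd(f0_voiced):.0f} Hz")
```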
And for the sound signals we look at acoustic analysis, and for text we can look at natural language processing, or NLP. Many of the studies in this area follow a common study design, and you'll hear about some of those studies later on from my colleagues here. We capture participants' speech, usually through a recording, and that's then broken down into the sound aspect, the sound signals themselves, as well as the content of the speech, which is captured in transcripts. This then undergoes automated analysis through computer code, which produces the objective speech and language features that we're interested in relating to clinical measures of importance. Much of the work in this field has looked at diagnostic classification, at predicting treatment outcomes like relapse or treatment response, and at modeling symptom severity. And as you'll hear later, these kinds of speech signals can also be integrated with other modalities of measurement: we can look at video of faces and analyze people's facial expressions, also from a computational perspective, look at their body movements, and capture other signals from the brain.

So one of the pivotal works in this area was from Elvevåg et al., and her work was in quantifying incoherence in speech. Her approach was to use word embeddings to objectively measure incoherence. Word embeddings are numerical addresses that represent the content, or the meaning, of specific words. If we look to the right-hand side, for a relatively simple example: if we were to take cat, kitten, dog, and houses, assign them values along the seven dimensions on the top, and then collapse those dimensions so we can look at it graphically, you're able to tease apart relationships, like the fact that cat is more closely related to kitten, both are somewhat related to dog, and all of these are related, but not so much, to houses. You can also mathematically or quantitatively represent the nature of the relationships between words: for example, the relationship between man and woman is recapitulated in the relationship between king and queen using this kind of word embedding approach. And so in this really breakthrough paper from 2007, Elvevåg used this approach to quantify the meanings of successive windows of words in the speech of healthy controls, as well as people with schizophrenia who had either high or low degrees of thought disorder. In the example below, if you were to look at this sentence in a window of three, you would take the mean embedding of the first three words and compare it to that of the second three words, and then move on and on and get those values. And what Elvevåg et al. found was that with increasing window sizes, people with schizophrenia and high thought disorder started to deviate away from these objective measures of coherence seen in healthy control individuals and people with low thought disorder. Another pivotal work comes from Cheryl Corcoran and her colleague Guillermo Cecchi from IBM Research. They used those semantic embeddings, the coherence metrics that I described, as well as syntactic features relating to different parts of speech, to predict transition to psychosis among youths at clinical high risk.
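To make the windowed-coherence idea concrete, here is a toy sketch of the mechanics just described: average the embeddings within each window of k words, then take the cosine similarity between successive windows. The five-dimensional random vectors are stand-ins; a real analysis would use trained embeddings (word2vec, GloVe, or similar), and this is an illustration rather than Elvevåg's exact code.

```python
# Toy sketch of windowed semantic coherence: mean-pool word embeddings in
# consecutive k-word windows, then compare successive windows by cosine
# similarity. Random 5-d vectors stand in for trained embeddings.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def windowed_coherence(words, emb, k=3):
    """Mean cosine similarity between successive non-overlapping windows."""
    means = [np.mean([emb[w] for w in words[i:i + k]], axis=0)
             for i in range(0, len(words) - k + 1, k)]
    return float(np.mean([cosine(means[i], means[i + 1])
                          for i in range(len(means) - 1)]))

rng = np.random.default_rng(0)
words = "the cat sat on the mat and dreamed about distant oceans tonight".split()
emb = {w: rng.normal(size=5) for w in set(words)}   # stand-in embeddings
print(f"coherence (k=3): {windowed_coherence(words, emb, k=3):.3f}")
```

Lower average similarity between successive windows corresponds to the "bewildering want of connection" Kraepelin described: the topic is drifting more from one stretch of speech to the next.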
Corcoran's was a really impressive result. It was a two-site study: one sample was from New York, I believe from Mount Sinai and Columbia, and they used that sample to train their model, achieving an AUC of 0.87. They then cross-validated these predictions in a separate sample from Carrie Bearden's group in California, I think at UCLA or UCSD, something like that, and were able to achieve an impressive accuracy of 0.72. So this is taking speech from people at risk for psychosis and predicting who was going to develop a psychotic disorder and who was not.

Moving on to a pilot study I did a couple of years ago: this is an example of how we can use NLP to classify schizophrenia diagnosis. This is from a paper from, I think, 2020 now, and it's from work I started when I was at the University of Pennsylvania. We had a small sample collected there of 11 healthy control participants as well as 20 people with schizophrenia spectrum disorders. And of note, beyond it being a small group, the people with schizophrenia were not enriched for noticeable thought disorder symptoms. They were generally stable, chronic patients who were well enough to engage in research and to be interested in research. We applied the framework I mentioned earlier, generating transcripts from speech and running an automated analysis. In this study we specifically looked at the frequencies of specific words, at the use of different parts of speech, and at sentence-to-sentence cohesion through a slightly different method. So basically we looked at speech on a word level, a syntactic level, and a sentence-to-sentence organization level, and we used those features to predict whether or not people had a schizophrenia diagnosis. We also used gold-standard clinical ratings as a benchmark, so that we'd have something to compare it to. We took the speech and, in a blinded fashion, rated participants on the Scale for the Assessment of Thought, Language, and Communication from Nancy Andreasen. The scale is from 1986, but I believe it's still one of the most commonly used scales for rating thought disorder. And as a comparison, we also used these clinical ratings to predict schizophrenia diagnosis.

Here's what we found for the clinical rating scales alone. Recall that these are generally patients who are functioning well and doing well in an outpatient setting. The great majority of people with schizophrenia did not have any clinically noticeable thought disorder; in fact, there were only maybe three to four people with a clinically detectable or rateable level of disturbance. So there was no statistically significant difference in clinical ratings between the two groups. When we looked in a computational context, though, highlighting some of the increased sensitivity we can get from these computational methods, we were able to detect words that were associated with being in the schizophrenia group versus the healthy control group. People with schizophrenia tended to be more focused on themselves in a singular voice, so first-person singular pronouns, whereas first-person plural pronouns came up more frequently in the healthy control group.
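Word-frequency and part-of-speech features of this kind are straightforward to compute with off-the-shelf NLP tools. Below is a minimal sketch, assuming spaCy and its small English model (en_core_web_sm) are installed; the feature set is illustrative, not the study's actual pipeline.

```python
# Illustrative word-level features: part-of-speech rates plus first-person
# pronoun use per token. Assumes spaCy with en_core_web_sm installed;
# these are stand-in features, not the published study's exact set.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
FIRST_SG = {"i", "me", "my", "mine", "myself"}
FIRST_PL = {"we", "us", "our", "ours", "ourselves"}

def word_level_features(text: str) -> dict:
    tokens = [t for t in nlp(text) if t.is_alpha]
    n = max(len(tokens), 1)
    pos = Counter(t.pos_ for t in tokens)
    return {
        "pronoun_rate": pos["PRON"] / n,
        "adverb_rate": pos["ADV"] / n,
        "adjective_rate": pos["ADJ"] / n,
        "first_person_singular_rate": sum(t.lower_ in FIRST_SG for t in tokens) / n,
        "first_person_plural_rate": sum(t.lower_ in FIRST_PL for t in tokens) / n,
    }

print(word_level_features("I usually think we manage fine, but I worry about myself."))
```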
We also saw incomplete words: people would start saying something, cut themselves off, and say something else instead, perhaps related to negative symptoms or cognitive difficulties. The schizophrenia group also had words like uh, well, use, people, no, never. Whereas in the healthy control group, I thought it was interesting that laughter came up, when people were laughing during the interviews, along with a lot of qualifiers: usually, great, awesome, super, recently, definitely. This was also reflected when we looked at parts of speech specifically: whereas people with schizophrenia showed increased use of pronouns, healthy controls had greater use of adverbs and adjectives, showing richer speech, content that was more richly detailed and described.

And here we used a modern natural language processing model, a deep-learning large language model called BERT, to look at how cohesive sentences were from one to the next. Basically, the y-axis is a measure of the distance between a particular participant sentence and whatever the interviewer said just beforehand. As you can see, the yellow line, for healthy controls, goes back and forth a bit, but as you move farther and farther from the interviewer's prompt, the distance actually decreases somewhat, perhaps even getting closer to the initial topic at hand. Whereas for people with schizophrenia, their sentences tend to deviate farther and farther away from the initial interviewer prompt. We saw this as potentially a reflection of something like tangentiality. Okay, and so then we poured all these features into a naive Bayes machine learning algorithm to predict schizophrenia diagnosis, and we found that with clinical features alone we were able to detect this with 68% accuracy, whereas with the NLP-derived features we got 87% accuracy. Combining clinical and NLP features did similarly, maybe even a little bit worse. So our conclusion from this pilot study was that natural-language-processing-derived features predicted schizophrenia diagnosis more accurately than gold-standard clinical rating scales, which led us to imagine that NLP may be more sensitive to subclinical disturbances in speech and language than clinician interpretations or ratings. And so that's it for me, and I'll let Dr. Baker come up.

All right, thank you. Hi everybody, my name is Justin Baker. I'm an assistant professor of psychiatry at Harvard Medical School, and I'm the scientific director of the Institute for Technology in Psychiatry at McLean Hospital. I do have some disclosures that are unrelated to this talk, but I just want to make sure those are reflected here. So I wanted to bring us back to the way we think about making sense of patients' mental status: a way to sense the mental status based on the many features we can derive from signals. In 2023, we have many more types of raw data that we can use to assess people's mental status than we did, say, 30 or 40 years ago, or even just 10 years ago, when we were focused more on (let's see if I can get a cursor) things like brain scans and blood, and trying to attach those to things from self-report. Nowadays we have many more types of sensors that we can use: data from video, from audio, from wearables, phones, et cetera.
But critically, it's not just the raw data that we're interested in. We need to design good feature detection algorithms for any type of raw data we're looking at. Today we're talking primarily about speech, but I wanted to highlight that we really think about this as trying to map from those raw data sources, through some kind of feature extraction, onto the latent features we really care about from a psychiatric perspective, whether that's speech function, executive function, circadian rhythms, affect, et cetera. It's the ability to map between these that we think is going to be critical as we move into 21st-century diagnostics. To zoom out even further, this is the way we think about all these things coming together in the future of mental health assessment and treatment: at the top, the more computational phenotyping strategies from all the data sources we can now access and that are relatively straightforward to capture, going through that feature extraction, but ultimately tying those latent psychiatric variables, whatever we're computing about these hidden variables, to user experiences such as apps and other things that can harness synthesis among the different variables and detect change over time within individuals. At the Institute for Technology in Psychiatry, we develop things across that whole spectrum, from designing new comprehensive phenotyping through consumer-focused design, to building tools for research and clinical practice as well.

So today I'm going to focus on a set of projects where we've been capturing and characterizing dyadic interactions, the bread and butter of psychiatric interactions between a patient and the doctor. We're going to use combined data from surveys, from video, as well as from electronic health records to do that. It's probably not surprising, but here's how we think about how dyadic interactions are captured today: it takes months or years to train people to do this, whether in a clinical or a research context. The administration and scoring, if you're talking about a SCID or something like it, take multiple hours. At the end of the day, it's still highly subjective, and it doesn't have very good intertemporal or inter-rater reliability. If you have the same person doing that SCID later and looking for change over time, it's just a very noisy system. And so we determined, for our own work, that in-person dyadic interviews are not ideally suited for capturing robust point estimates of someone's symptom severity across multiple dimensions over time within an individual, especially as we try to scale those observations to larger studies. Another way of putting this: doing this today is very human-resource intensive, very expensive, and ultimately unreliable. So how are we going to address this? What our group has been doing is trying to take what has been occurring over many hours and boil it down to the essential elements we can get. And rather than using a manual coding approach, we've turned to computer scientists such as this group at Carnegie Mellon, led by Louis-Philippe Morency, who developed systems for feature engineering of video and audio signals.
So here I'm showing an example of something called MultiSense, which takes data from cameras and microphones and turns those signals into usable features, like facial expression data, gaze, and pose, as well as the acoustics of how people speak. So we can combine those signals, but then the question was where to test them. We focused on a common scenario: if you're an inpatient psychiatrist, you know that daily clinical rounds is a place where, over the course of just a few minutes, a doctor can make an assessment with a few questions about someone's affective condition and their psychotic condition; they can bill for that; and they make important life-and-death clinical decisions based on it. So if a trained clinician can get all of that information in just a few minutes, surely we should be able to do something similar by recording those interactions, or something simulating those interactions, and taking not just the words being used, but all of the nonverbal signal as well. In the interest of time I won't go through it in detail, but we have a semi-structured interview we use to probe that: things like an open-ended question, "what brought you into the hospital," as well as follow-up questions which, like the chatbot, are a little more indirect. It's not always asking, "are you experiencing hallucinations"; it might ask, "is anyone bothering you in the hospital," right? So we designed those with clinicians' input, and what you're seeing on the bottom is what I think of as a sort of behavioral EEG. Over the course of a 10-minute encounter, you can see how both the participant's and the interviewer's facial expressions change. For instance, we intentionally put the participant by themselves for the first two minutes, the way you might observe a patient down the hallway before they're in the middle of rounds. That's the alone period, and you can see in this case the participant is making a lot of brow raises; that's the first row in the heat map, showing a lot of red there. So they're doing a lot of looking around the room. As soon as the doctor enters the room, you suddenly see some signal in the smile effectors, the eyes, the mouth. And over the course of the interview, you can see how the patient's face is moving. You can also see that the evaluator in this case isn't really smiling at all through the entire encounter, but they're doing a lot of brow raising and a lot of squinting to indicate interest. I mention this just to show that we can now take raw data from video and convert it into features that might actually have clinical utility, simply by being able to visualize quickly what happened during the encounter. But we can go beyond that and tie specific responses, not just over the entire interview but in response to those specific probes, to other outcome variables. So for instance, here, the standard deviation of this individual's brow furrowing goes down as a function of how much depression they have. And that's somewhat intuitive: you don't expect someone who's depressed to be frowning constantly, but you do expect them to have less affect than usual. The variance in their affect really does go down as a function of how depressed they were.
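As an illustration of how a session-level feature like "variability of brow furrowing" might be computed from frame-by-frame facial action unit output, here is a minimal sketch. It assumes an OpenFace-style CSV in which each row is a video frame and columns such as AU04_r hold action-unit intensities; the file name and column choices are hypothetical, and this is not the group's actual code.

```python
# Minimal sketch: collapse a frame-by-frame action-unit (AU) time series
# into session-level summary features (mean and SD of each AU intensity).
# Assumes an OpenFace-style CSV; file and column names are hypothetical.
import pandas as pd

frames = pd.read_csv("session_aus.csv")
AU_COLS = ["AU01_r", "AU04_r", "AU12_r"]  # brow raiser, brow furrower, smile

features = {}
for col in AU_COLS:
    features[f"{col}_mean"] = frames[col].mean()
    features[f"{col}_sd"] = frames[col].std()  # e.g., brow-furrow variability

print(features)
```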
And then for that alone period I mentioned, we saw quantitatively, across a population of, I think in this case, about 50 interviews, that patients who would ultimately get scored as having unusual thought content, those kinds of delusions, also showed more social behaviors, such as brow raising, when they were by themselves. And you can take this basic approach and use machine learning to map between whatever observed and predicted variables you want. This is just showing that we can take a combination of optimal features from these kinds of studies, look at specific contexts, and build models that do a pretty decent job of estimating, for instance, the negative symptom scale from the PANSS.

We really want to turn our focus to voice and language, so we're also very interested in voice as a psychiatric vital sign: something that, as in other branches of medicine, is fundamental to how you're doing and has normative ranges that will change with age and sex. Something called the vowel space ratio is just how different your vowels sound from one another. So ah, ooh, ee have different sounds within the formant space, but when you're depressed, we know that those vowels start to sound more similar to each other. You're not articulating as much: ah, ooh, ee sound less distinct than the first version. And we were able to show that this is another way you can estimate not just depression, which constricts the vowel space, but also things like mania and psychosis, which seem to expand the vowel space. You may have seen patients before who were over-articulating and had really intense speech. So again, this is a demonstration that we can use computational acoustic analysis to begin mapping between some of the more traditional clinical outcome variables and ones we can compute from really any kind of speech sample. Visually, the way I think about this is similar to what Dr. Tang showed briefly before. You can take large, high-dimensional variables; in this case, and you probably can't read it very well, these are all of the individual items from conventional clinical assessments, the MADRS assessment of depression, the YMRS assessment of mania, and the PANSS assessment of psychotic symptoms, and these are how those symptoms spread out on a symptom landscape. Now I can superimpose on top of this the same variables computed from the video of that encounter, to show how, conceptually, we want to begin mapping between the computed features and the ones that are part of the gold standard. Over time, as we get more data, we'll be able to push these groups further and do better and better prediction.

So I just want to shift now to a study we did on linguistic markers, turning back closer to the theme of this session. We have 99 sessions from inpatients at McLean Hospital, where I'm based. Each interview was scored for positive and negative symptoms, and then we looked at a number of different features from the speech samples. First, lexical word categories, using something called the Linguistic Inquiry and Word Count, or LIWC. This is similar to one of the categories Dr. Tang showed before: basically looking to see which categories of words show up most frequently.
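The LIWC lexicon itself is proprietary, but the mechanics of dictionary-based category counting are simple. Here is a toy sketch in that spirit, with invented stand-in word lists rather than the real LIWC categories:

```python
# Toy dictionary-based word-category scoring in the spirit of LIWC:
# the proportion of a transcript's words falling in each category.
# The categories and word lists are invented stand-ins for illustration.
import re

CATEGORIES = {
    "perceptual": {"see", "hear", "feel", "sound", "look"},
    "negative_emotion": {"sad", "hate", "afraid", "hurt", "worthless"},
    "achievement": {"win", "try", "goal", "earn", "success"},
}

def category_rates(text: str) -> dict:
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    return {cat: sum(w in lexicon for w in words) / n
            for cat, lexicon in CATEGORIES.items()}

print(category_rates("I feel sad when I hear them; nothing I try seems to work."))
```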
We also then looked at the coherence features she mentioned, as well as disfluency features: when you make errors, as in the example here, and have to restart your sentences in different ways. So we looked at those three different types of features. And then, importantly, because we were following individuals over time, we really wanted to distinguish markers that signal change within an individual from markers that distinguish individuals from one another. The first type would be within-person slopes, and the other type would be between-person slopes. And just to make sure we're all thinking the same way: you might assume that the variables that signal differences between people would be the same ones that signal change over time within a single individual, but that assumption is the trap known as Simpson's paradox, or the ecological fallacy. So in all of these studies, we make sure to separately model the between-person and within-person slopes.

This paper has been published, and you can find it online, but just to highlight the results: we were trying to develop two models, one for positive symptoms and one for negative symptoms in this sample. You can see here the distribution of positive symptoms across those 99 interviews. What we found, across all of the different lexical features (you can see we looked at positive emotion words, negative emotion words, cognitive processes, perceptual words, reflected over here, maybe a little hard to see in this plot), was that in the between-person slopes, it was the perceptual process words and affiliation words that showed significant increases in the patients with high PANSS scores. So more perceptual process words meant you were likely to have a higher PANSS positive score. And achievement words and reward words were less common in that group, so lower scores on those meant higher PANSS positive. But interestingly, in the within-person analysis, so these are now patients whose PANSS positive changes over time, we saw that negative emotion words were actually the better predictors of who was having higher PANSS positive at a given time. And the coherence, that perplexity measure similar to what Dr. Tang mentioned, was actually changing as a function of PANSS positive, normalizing as patients' PANSS positive came down.

Okay, so now looking at negative symptoms. These are again features of the lexicon and how people speak that were correlated with how many negative symptoms they had. As you can see, we had a range of negative symptoms, probably not as high; the percentiles here are from the original Kay papers, so we don't have anyone in a really high range. But nonetheless, we still had a few significant findings, where negative emotion words were correlating between persons: higher negative emotion words meant you were more likely to have high PANSS negative. Relativity words were lower in the patients with high PANSS negative. And the edits, the frequency of how many times you would edit, were also more frequent in the high-PANSS-negative patients. And then again, within person it was a little bit different.
So patients who had high PANSS negative showed high cognitive process words when their negative symptoms were high, but that normalized as they improved. And likewise these power words, which I'll show you a couple of examples of in just a second, were also more common in individuals while they had higher negative symptoms. I know it's hard to keep track of these different findings, so here I'm just showing visually some of the words associated with the two groups. On the left are words associated with high positive symptoms in red, and low positive symptoms in blue. On the right, again for the negative symptoms, are words correlated with high negative symptoms in red, and low negative symptoms in blue. And the point here is not the specific words used; it's really about the categories. The idea is that, just like in that earlier slide about latent constructs, it's not necessarily the specific raw data of the word they use, but the latent space of which categories they draw words from, that could give us some insight, not only into what may be going on with them, but also in cases where you might want to predict their scores later.

So, how much time do I have? A few minutes. Okay. I just want to turn in the last few minutes to a different study we've done, focusing less on dyadic interactions and more on individual remote monitoring. This is an intensive longitudinal case series in patients with bipolar disorder and schizophrenia. We collected over 100 person-years of data from this cohort of about 72 individuals, starting back in 2016. What this is showing on the right is that COVID happened right in the middle of our data collection, but we were still able to collect data across multiple types: accelerometer, phone, GPS, and (let's see if I can read it) watch, the wearable data we had. We also had clinic visits to capture speech and clinical status, a phone app that collected EMAs as well as voice audio diaries, and we collected some cognition data too.

I want to highlight that we think audio journals are a really exciting way to collect speech samples from individuals. So this is a 24-year-old male who had schizoaffective disorder, bipolar type. What we were very interested in is whether we could start to use signals, both from the phone and from the interviews themselves, to predict how disorganized someone is, focusing on the clinical construct of disorganization, with the idea that it's often a strong predictor of long-term function. What we did was capture, in this patient and other patients, these individual interviews, and we can quantify how they were scored on some of the classic disorganization measures from the PANSS: conceptual disorganization, abstract thinking, and poor attention. You can see that this individual does have changes over time in the disorganization subset of their PANSS scores. Here's an interview where he was being scored as moderately disorganized, and you can see the text from that interview. We noticed that there were changes in the disfluencies per sentence, as well as the words per sentence, that were highly correlated with conceptual disorganization. And I'll come back to this in just a moment.
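For a sense of what "disfluencies per sentence" can look like in code, here is a rough sketch. Real pipelines work from verbatim transcripts, often with human disfluency annotation; the regular-expression heuristics below (fillers, immediate word repeats, cut-off words marked with a trailing hyphen) are crude stand-ins for illustration, not the study's method.

```python
# Rough heuristic sketch of per-sentence disfluency features from a
# verbatim transcript: fillers, immediate word repeats, cut-off restarts.
# These regexes are illustrative stand-ins, not the published pipeline.
import re

FILLERS = r"\b(?:um+|uh+|er+|you know|i mean)\b"
REPEATS = r"\b(\w+)(?:\s+\1)+\b"   # e.g., "it it helps"
RESTARTS = r"\b\w+-\s"             # cut-off words, e.g., "the- the store"

def disfluency_features(transcript: str) -> dict:
    text = transcript.lower()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(sentences), 1)
    return {
        "fillers_per_sentence": len(re.findall(FILLERS, text)) / n,
        "repeats_per_sentence": len(re.findall(REPEATS, text)) / n,
        "restarts_per_sentence": len(re.findall(RESTARTS, text)) / n,
        "words_per_sentence": len(re.findall(r"\w+", text)) / n,
    }

print(disfluency_features("I was, um, I was going to the- the store. You know, it it helps."))
```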
Here I'm showing this individual's phone data, so activity derived from their phone: this is from their accelerometer, and then this is from their screen use and their accelerometer. Each column here is a day, and down here these are showing weeks of time; this person has been recorded over 90-plus weeks. You can see that their sleep pattern is quite abnormal. If I were to show you a healthy individual, you'd see a pretty consistent sleep-wake period. This person is showing some very erratic patterns of sleep that you can pick up just via their cell phone. What we were very interested to know is whether you could predict the disorganization score directly from this sort of irregularity in their sleep pattern, specifically in the phone usage. And we did see some correlation between the two features, at least in this participant. So what we did next was look at this across all the patients in our sample. Just to cut to the chase, because this has been published recently: quite a few language features were tightly correlated with conceptual disorganization, particularly restarts per sentence, repeats per sentence, and all disfluencies per sentence, those kinds of errors, as well as words per second, words per sentence, and total number of words. So the more words people used in a sentence, and overall how much they were speaking, correlated quite a bit with how disorganized they were. We did see that phone use and missing days of phone data were also correlated, but not as strongly. And in my last slide, I just want to highlight again that that analysis was focused on the audio data from the interviews themselves, but the audio journals are an area we've also been very interested in. This is where we found that sentiment scores are most tightly correlated with sentiment derived from ecological momentary assessment. In the interest of time, I'm going to wrap up here and say that we can do quite well at estimating someone's mood state just from these free, open-ended audio questions. So I think they're a really promising area for people who are getting into this type of work. With that, I will wrap up, and I'm happy to take more questions during the discussion. Thank you.

Okay, great. So I'll be talking about applications, barriers, and future directions for these computational speech analysis approaches in schizophrenia. My name's Michael Spilka. I'm a scientist at Winterlight Labs, and in terms of my disclosures, I'm a full-time employee of Winterlight Labs. So, when we're thinking about integrating the approaches we've been hearing about into clinical care, we really need to make sure they're going to be practical, easy to use, and accepted by both patients and clinicians. Thankfully, assessing speech is something that's typically low burden and has good ecological validity. Speaking is something we do all the time in daily life. It's not complicated, and it doesn't require a lot of instruction compared to some of the other tests and measures we ask our patients to complete.
And we've also seen rapid growth in the use of computer and smartphone applications for assessment and treatment in different areas of psychiatry, including cognitive and functional assessment, treatment adherence monitoring, cognitive behavioral therapy, cognitive remediation, and even virtual-reality-based therapies. So there's an opportunity to integrate speech assessment into these existing technologies going forward. Now, it's also important to determine whether the patients we treat are amenable to completing these assessments, especially if they'll be deployed remotely, for people to do at home on their cell phones. There was a survey study examining prospective attitudes toward speech recordings in a large clinical sample. Participants mainly had depressive or anxiety disorders, but there were also participants with bipolar disorder diagnoses and psychotic disorders. What they found was that 66 percent of respondents indicated no objections to having their speech recorded, and an additional 24 percent indicated they'd be interested if they were well informed about the process. Fifty-four percent indicated they would consider downloading a smartphone app to manually record their speech, as opposed to having it passively recorded; those who reported no interest cited privacy as their primary concern. And forty-nine percent would not be willing to download a smartphone app for the automatic or passive recording of speech. So these findings are interesting, but we can also ask: what about individuals with psychotic disorders specifically? Are there any special considerations there? This was a study that compared mobile mental health app use between individuals with psychotic disorders, a group of 72 outpatients, and a group of outpatients with anxiety and mood disorders. The study wasn't looking at speech assessment per se, but at mobile mental health app use. What they found were high rates of mobile phone ownership in both groups; however, there were low rates of downloading, and even lower rates of actually using, those mental health apps, hovering around 10 percent in both groups. Both groups indicated privacy and accuracy as their top concerns, and interestingly, the psychotic disorder group reported greater discomfort about app features involving data gathering and transmission. So when we think about the results of these two studies, I find this encouraging in the sense that patients are generally receptive to speech- or app-based assessment. Clearly, though, there are some barriers; notably, privacy is a key concern. And I think it will be important to examine whether these concerns interact with symptomatology in psychosis. Is it the case that patients with paranoia have greater privacy concerns? Or is there a risk that some features of this approach, like passive speech collection, might even exacerbate symptoms in vulnerable patients? We can also ask how mental health professionals feel about incorporating these assessments into clinical practice. For the sake of time, I'm going to leave these questions until the end, so you can think about this question as I go through the slides, and we'll bring the slide back up at the end. Okay.
So I do want to talk about this question of how feasible it is to incorporate computerized speech assessment into clinical care. We have some preliminary feasibility data coming out of a study run at the Centre for Addiction and Mental Health in Toronto, Canada. The goals of the study were to address the feasibility and acceptability of app-based, high-frequency remote speech assessment in individuals with first-episode psychosis, and to identify speech features that can predict relapse. I'll stress that this was a research study, but it was designed to mimic how this technology would be deployed in an outpatient setting. The participants were first-episode psychosis outpatients who had either a primary psychotic disorder or a primary mood disorder with psychosis. In terms of the assessments, participants completed weekly speech assessments for 52 weeks, administered through a mobile app on their phone; the app sent notifications prompting participants to complete the assessments each week. We had several different speech tasks, including a verbal journaling task, a paragraph reading task, and a picture description task that I'll show on the next slide. There were also clinical assessments conducted at baseline and every three months to track symptom severity over the course of the study, as well as a chart review to document relapse and admission events. And at the end of the study, participants will complete qualitative interviews and questionnaires about their experiences, to give us a sense of patient attitudes toward this technology over the course of the study. Here is an example of two of the tasks. On the left we have the picture description task, which shows a fairly complex scene; the participant is asked to describe everything they see in the picture. On the right we have an example of the journaling task. The prompt, I know the font is a little small, says: How have you been doing this week? What was good about it? What was not? Participants have to provide their responses, and additional prompts pop up if participants don't speak for long enough. So, we have some very preliminary interim results. We were able to enroll 49 participants into the study at baseline, and we've been really happy to see relatively high retention: 40 participants were still completing the speech assessments after 12 weeks. There were also minimal issues with the speech recordings themselves; only about 3% of the speech samples we collected could not be processed, either because there was too much noise in the audio signal or because of other technical glitches. And there have been minimal reported participant concerns, though of course we plan to look at this more systematically at the end of the study. So even though this is a research setting, I think there's preliminary evidence here that this high-frequency remote digital speech assessment approach is feasible in first-episode psychosis outpatients, and that's really exciting for the idea of incorporating it into clinical practice. So I'd like to shift now to talking about some of the clinical applications that this research has been focusing on.
And again, there's another question I'd like you to think about; in the interest of time I'll leave it to the end, because I want to make sure we can also answer some of the real questions you might have. But just think about what would be the most valuable application for you in your clinical practice. I'd like to walk through four of the main ones that research has focused on so far. Some of the initial work, as Dr. Tang presented, was really about classifying individuals with psychosis versus healthy controls based on these computational speech features, and a number of studies are now showing high accuracy in classifying individuals with psychosis versus healthy controls based purely on speech. But I would argue that, when we think about the types of questions we want to answer clinically, this might be less interesting, or less challenging, than the differential diagnostic questions we face in the clinic. So I think validating speech analysis to address some of those questions will be really valuable: for example, differentiating between low versus high negative symptom schizophrenia patients, or classifying schizophrenia versus bipolar disorder, and there are a couple of citations here of work that has used speech and natural language processing to achieve this. I think NLP will also have potential for other differential diagnostic questions, such as distinguishing a primary psychotic disorder from a secondary psychotic disorder, or identifying the individuals who are actually experiencing a prodromal psychosis versus other clinical presentations, like depression or adjustment disorder, where the picture can sometimes be a little blurred. Another application is using speech analysis to help in the assessment of symptom severity, and we're seeing a number of speech features now that show good convergent validity with clinical symptom ratings in schizophrenia spectrum disorders. If we think about positive symptoms like disorganized speech, we're seeing that these can be captured by linguistic speech features: for example, decreased semantic coherence, which was one of the features Dr. Tang mentioned, referring to the coherence in the flow of meaning that unfolds as the patient speaks; reductions in graph connectivity, referring to the connections between the words in speech; and greater unexpected word use in the context of what someone is talking about. When we turn to negative symptoms like poverty of speech, we're seeing other linguistic features able to characterize symptom severity, including a reduction in syntactic complexity, so decreased complexity of spoken sentences, as well as decreased content of speech. And other negative symptoms, like flat affect, can be captured by acoustic features, including the rate of speech, the number and duration of pauses, and reductions in the variability of the volume and pitch of the voice. So we think natural language processing tools will be able to provide a more comprehensive and quantitative assessment of clinically meaningful language disturbance in schizophrenia that we can use to improve our assessments. Another area of great clinical interest is using speech analysis to help predict illness onset.
And there's some reason to believe that natural language processing can be relevant here. There's research showing that clinician-rated language abnormalities are present years before the first psychotic episode in individuals at risk of transitioning to psychosis. And there's also research indicating that negative symptoms, including poverty of speech and poverty of content, again clinician-rated here, are able to predict a later psychosis diagnosis in youth at genetic high risk for schizophrenia. NLP is now starting to be used to look at this question, and we're seeing several linguistic qualities that can help differentiate clinical high risk youth who do versus do not transition to psychosis. Some of the more consistent features emerging here are, again, these semantic coherence and semantic content measures. So we think computational speech analysis can help provide more comprehensive detection of some of these subtler speech abnormalities that might have predictive clinical value when it comes to identifying individuals at greater risk of transitioning to psychosis. Another important application is in predicting treatment outcomes. We know that language symptoms like flat affect can predict both short- and long-term clinical course following an initial hospitalization, and, perhaps unsurprisingly, language symptoms also predict social functioning. When we think about applying natural language processing and computational linguistics in combination with machine learning, there's some good research showing that you can use these tools to predict which first-episode patients go on to receive a schizophrenia diagnosis at a six-month follow-up. And I think there's an opportunity to use these speech assessments, particularly with high-frequency remote monitoring, to start to look at deviations in participants' behavior that might reflect a change from their baseline, which could be used to predict relapse. Now, we also know there's a lot of heterogeneity in schizophrenia and related disorders; not every patient will have the same level of negative symptoms, or even of speech abnormalities. So speech assessment might help parse some of this clinical heterogeneity, which could support patient stratification efforts to inform treatment approaches.

Okay, so I'd like to shift gears a little and talk about some of the barriers to implementation, because despite the very promising research we've seen, the translation from the research lab to the clinic is still somewhat limited. I'd like to focus on a few of the limitations and barriers I think we need to consider and address going forward. It's really crucial to make sure that the computational speech measures we're using will generalize from the research samples they've been identified in to a broader sample of patients, especially the patients we see in clinical practice. And it's really no surprise that language is influenced by various factors; speech and language don't operate in a vacuum. They're part of our social and cultural context. So when we talk about computational speech measures as a digital biomarker, it's been proposed that maybe we should be talking about a socio-biomarker. I've listed on the slide here a few factors that we know can influence language.
And I'm sure you can imagine a few more. In the NLP research world, we tend to statistically adjust for some demographic variables, like age, sex, and education, but we're going to need to be a little more sophisticated about this going forward. I'll just mention that these issues are not unique to NLP speech markers by any means. We know that even clinician ratings can be influenced by characteristics of the patients being assessed; for example, a patient's education level can influence the thought disorder ratings a clinician makes. And even more traditional candidate biomarkers, like brain structure, have been shown to be influenced by social factors. So we're going to need to start looking at these socio-demographic factors and their influence on NLP measures in psychosis in more detail. The field is looking at some of this, and one question is: do the findings generalize across different languages and cultures? I want to mention one study here that looked at speech analysis in Mandarin-speaking youth at clinical high risk for psychosis, as well as 20 matched healthy controls. They collected speech from open-ended narrative interviews and focused on analyzing several key linguistic and acoustic speech features. One of the goals of the study was to use a machine learning model to classify the youth at clinical high risk versus the healthy controls, and they wanted to see whether this approach would work well when NLP was applied to the original transcripts in Mandarin versus transcripts that had been translated into English. What they found was very high classification accuracy in discriminating the clinical high risk youth from the healthy controls, both for the original Mandarin transcripts and for the transcripts translated into English. On top of that, some of the top speech features contributing to that classification accuracy were similar across these models, including decreased use of adjectives, decreased semantic coherence, which we've seen come up in previous examples, and increased speech pauses, which we know are associated with negative symptoms. So this is encouraging. However, this study had a monolingual sample, so we need to determine whether we see the same universality of speech features in cross-linguistic samples. I do want to mention another study that included a cross-linguistic sample: 187 patients with schizophrenia and 200 matched healthy controls, who spoke either Dutch, German, or Chinese. Speech was collected from a video description task in this case, and the study focused on semantic coherence, given that this is a key feature linked to formal thought disorder. What they found was that the strength of the association between semantic coherence and symptom severity tended to vary by linguistic group. They also looked at multiple measures of semantic coherence, and found that one measure showed consistent group differences between the patients and the healthy controls, while for the other semantic coherence measures, the magnitude of the group differences varied depending on the linguistic group.
What this suggests is that heterogeneity in linguistic factors and methodological factors might interact when it comes to some of these important speech features that we're linking to symptoms. And so it's really going to be important going forward to figure out which features are most robust across languages to carry forward into the clinic. Another consideration is how we best standardize our computational speech assessment and analysis approaches. Currently, different research studies have used different task designs that can vary in structure, length, and the type of content that's elicited, and it's not yet clear which task is the most sensitive to speech disturbance in schizophrenia and should be our focus. It might actually be the case that we don't need a task at all, and that speech obtained from clinical interviews is just as sensitive. This would be particularly appealing because instead of having to administer a separate task, you can focus on the speech that's already routinely collected during a clinical interview to help augment our traditional assessment approaches. What we do know is that working to harmonize some of these assessment and analysis approaches, and directly comparing and evaluating them, will help us figure out which are the most clinically useful going forward. It's also important for us to think about identifying and ruling out any potential confounds that might bias some of these novel speech metrics that we're using to answer clinically relevant questions in schizophrenia. So what do I mean by this? It's really about figuring out to what extent the speech we're focusing on is really sensitive to the symptom or illness we're interested in, versus other associated factors. We know that patients with schizophrenia that we see in the clinic tend to have a range of medical comorbidities, whether that's psychiatric comorbidities like depression or chronic medical conditions, and we should also think about the effects of medication and substance use on physical functioning. All of these might also have an influence on speech. And so working to systematically rule out some of these potential influences will really help us make sure that our speech-based markers are valid and clinically informative. We've started to look at this as a field, and I did want to spend a few minutes talking about an example. This was an analysis that looked at whether motor symptoms and antipsychotic medication dose influenced acoustic speech properties that are associated with negative symptom severity in schizophrenia. The analysis was done on data from a larger study led by Dr. Sunny Tang, a prospective longitudinal cohort study that aimed to predict psychosis-related treatment outcomes and symptom trajectories using speech and language markers. The participants in the study are adult inpatients with schizophrenia spectrum disorders, and for the current analysis we had a sample of 43 participants at baseline. Participants completed a number of different speech tasks on a tablet computer, including some of the open-ended narrative or journaling tasks, the picture description tasks that we saw earlier, some verbal fluency tasks, and reading tasks where the participant has to read a paragraph of text that's shown on the screen.
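As a concrete illustration of the timing-related side of the feature set described next, here is a sketch of how pause counts and phonation time might be derived from a task recording like these. It assumes the librosa library; the file name, the 30 dB energy threshold, and the 250 ms pause criterion are illustrative choices, not the study's parameters.

```python
# Sketch of deriving timing features (phonation time, pauses) from a recording.
# Assumes librosa; the path and thresholds are illustrative assumptions.

import librosa
import numpy as np

y, sr = librosa.load("picture_description.wav", sr=16000)  # hypothetical file

# Sample intervals whose energy is within 30 dB of the peak, i.e. voiced speech.
voiced = librosa.effects.split(y, top_db=30)

phonation_time = sum(end - start for start, end in voiced) / sr
total_time = len(y) / sr

# Gaps between consecutive voiced intervals; count gaps > 250 ms as pauses.
gaps = (voiced[1:, 0] - voiced[:-1, 1]) / sr
pauses = gaps[gaps > 0.25]

print(f"total duration:  {total_time:.1f} s")
print(f"phonation time:  {phonation_time:.1f} s")
print(f"pause count:     {len(pauses)}")
print(f"mean pause:      {pauses.mean():.2f} s" if len(pauses) else "no pauses")
```

Speaking rate would additionally need a word count from the transcript (words divided by phonation time), which is why these acoustic features are usually computed alongside an ASR or manual transcription step.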
For the clinical assessments in this analysis, we were interested in negative symptoms, which were assessed using the Scale for the Assessment of Negative Symptoms; motor symptoms, which were assessed using the Extrapyramidal Symptom Rating Scale, which allows for global impression scores of the different motor symptoms associated with extrapyramidal side effects; and antipsychotic medication dose, which we converted to chlorpromazine-equivalent units so that we could compare it across participants in the sample. The analysis focused on speech production features that are believed to be associated with negative symptoms, and I've listed the acoustic and timing-related features we looked at on the screen here. For motor symptoms, we focused on Parkinsonism and akathisia because these were present in a substantial number of participants, whereas the two other symptom clusters, dyskinesia and dystonia, were relatively absent in most of the sample. We then examined whether motor symptoms and antipsychotic medication dose correlated with the identified speech features. Turning to the results, we found associations between negative symptom severity and several features capturing qualities of speech like speaking rate, duration, and intensity, and these tended to be present across multiple speech tasks. On the right side you can see, well, it's probably a little bit small, I realize now, but I'm listing the different features and which negative symptoms were significantly correlated with them. There's an example at the bottom here of a negative relationship between the total negative symptom score and speaking rate: the faster the speaking rate, the lower the negative symptoms. Now, we also found that motor symptoms were associated with some negative symptoms, with Parkinsonism severity being associated with blunted affect and alogia, and akathisia severity being associated with blunted affect. And we're seeing an example here of the relationship between alogia and Parkinsonism. Chlorpromazine-equivalent dose was also associated with both Parkinsonism and negative symptoms. This is what we would expect: we know that there are speech features that can index negative symptoms, and these negative symptoms also tend to be related to extrapyramidal symptoms and antipsychotic medication dose. However, our speech features were not significantly associated with motor symptoms or antipsychotic medication dose, and we confirmed this absence of an association with Bayesian statistics. So what this tells us is that the speech correlates of negative symptoms we're identifying are not confounded by motor symptoms or antipsychotic medication use, which is really promising for using these speech metrics to index symptom severity. When we think about implementing any new technology in a clinical context, we really need to consider the potential ethical issues it might raise, and this is certainly the case when it comes to computational speech analysis. Notably, we need to think about privacy and confidentiality. These digital assessment modalities will need to be accompanied by oversight of how patient data are stored, transmitted, and accessed. And what we know on the research side is that clinical validation research really benefits from large, openly shared data sets.
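A brief statistical aside on the Bayesian check just mentioned: a non-significant p-value by itself can't distinguish "no association" from "not enough data," so a Bayes factor is often used to quantify evidence for the null. The sketch below assumes the pingouin package and uses simulated values in place of the study's data.

```python
# Sketch of quantifying evidence FOR a null association with a Bayes factor,
# in the spirit of the analysis described above. Assumes the pingouin package;
# the data are simulated stand-ins, not the study's measurements.

import numpy as np
import pingouin as pg

rng = np.random.default_rng(0)
n = 43  # baseline sample size mentioned in the talk
speaking_rate = rng.normal(3.5, 0.6, n)      # words/sec, illustrative values
cpz_equivalent = rng.normal(400, 150, n)     # mg/day, generated uncorrelated

result = pg.corr(speaking_rate, cpz_equivalent, method="pearson")
print(result[["n", "r", "p-val", "BF10"]])

bf10 = float(result["BF10"].iloc[0])
# BF10 < 1/3 is conventionally read as moderate evidence for the null
# (a genuine absence of association), not merely a failure to reject it.
print("evidence favors the null" if bf10 < 1 / 3 else "inconclusive or favors association")
```

A BF10 well below 1 supports the conclusion that the speech features and medication dose are genuinely unrelated, rather than the analysis simply being underpowered.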
These data-sharing efforts, though, will require careful attention to issues like consent to share the data and how best to anonymize it to maintain privacy and confidentiality. Speech recordings can contain inherently identifiable information, either through what the speaker is disclosing or through what we can identify by applying our natural language processing tools to the recordings themselves. Because of this, it might be that computerized speech assessment requires special monitoring and safeguards against privacy breaches compared to some of the other novel digital markers being explored. Another ethical issue is the potential risk of bias. The participant samples that we use to develop our machine learning models might contain sampling bias when it comes to sociodemographic factors, and this bias can propagate to the models themselves. What this means is that we need to be mindful of any model that imposes different outcomes on different groups of individuals at different rates, and we need to be aware of any hidden sources of bias out there, even in some of these more objective markers like digital speech metrics. Ultimately, identifying sources of bias, making our models more transparent, and building research databases made up of representative samples is going to be necessary to safeguard against bias, prevent potential harm, and promote equitable health care. Okay, so with the last few slides here, I want to turn to some future directions. We've seen some interesting applications, and I think there are a few future directions starting to be explored that are particularly exciting. The first of these is multimodal ambulatory assessment, also referred to as ecological momentary assessment. The idea, and Dr. Baker talked about this a little bit, is that rather than assessing patients infrequently in the clinic with extensive interviews, can we use technology to sample patients' behaviors and experiences in real time throughout the day? This approach has led to advances in what's referred to as digital phenotyping, which is really about characterizing within-person fluctuations in symptoms, which we can tie to changes in the environment and social context. The goal here is to get improved resolution and precision of our assessments across time and place. And I think incorporating speech into EMA can be really helpful: first, to get a more accurate representation of what speech is capturing in terms of clinical symptoms, and second, because speech itself can improve the clinical utility of EMA. One example I want to mention was a study that used EMA behavioral and self-report data to predict affective states in bipolar disorder. What they found was that if you combined acoustic speech features obtained from phone calls with the EMA data, you had better prediction of both manic and mixed affective states in bipolar disorder compared to relying on the EMA data alone. Another potential benefit of the low-burden, remote assessments we've been talking about is the possibility of obtaining results in real time to augment our clinical assessments: for example, getting a more comprehensive speech analysis of language disturbance that we can then incorporate, in real time, into our symptom ratings when we're assessing symptom severity.
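Returning briefly to the bias point above: one basic safeguard is to audit a trained classifier's error rates separately by sociodemographic group. The sketch below is a toy illustration of such an audit; the groups, scores, and decision threshold are all invented.

```python
# Toy sketch of a group-wise bias audit: compare false positive rates of a
# speech-based classifier across demographic groups. All values are invented;
# a real audit would use held-out clinical predictions.

from collections import defaultdict

# (group, true_label, model_score) triples standing in for real predictions.
predictions = [
    ("group_a", 0, 0.21), ("group_a", 0, 0.67), ("group_a", 1, 0.83),
    ("group_b", 0, 0.58), ("group_b", 0, 0.71), ("group_b", 1, 0.49),
]
THRESHOLD = 0.5  # hypothetical decision cutoff

counts = defaultdict(lambda: {"fp": 0, "neg": 0})
for group, label, score in predictions:
    if label == 0:                      # true negatives are the FPR denominator
        counts[group]["neg"] += 1
        if score >= THRESHOLD:          # flagged despite being a true negative
            counts[group]["fp"] += 1

for group, c in sorted(counts.items()):
    fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
    print(f"{group}: false positive rate = {fpr:.2f} ({c['fp']}/{c['neg']})")
```

Large gaps in false positive rate across groups would flag a model that over-pathologizes one group's speech, which is exactly the concern raised in the audience discussion later in this session.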
Another example would be incorporating language markers of psychosis risk alongside other, more traditional risk markers to help inform treatment decisions about the patient in front of us. And it might be that speech-based markers can be developed to identify unusual symptom presentations, or even discrepancies in symptom ratings that might warrant further investigation. For example, if a symptom rating seems to conflict with the value of a speech feature that we know indexes that symptom, it might prompt additional follow-up to figure out what's going on in this particular patient. I'll mention that for this real-time approach to work, we're going to need to keep improving our automated speech recognition capabilities to make sure that accuracy and reliability are really equivalent to the current gold standard, which is manually transcribing our speech recordings. I've got one more slide to go and then we'll end it. Okay. So a lot of the research we've discussed collects speech using speech-focused tasks, as we've seen with picture description or journaling. But we know that the field of passive digital phenotyping is growing rapidly, using technology found in our smartphones like geolocation, accelerometry, even call and text logs. There's an opportunity here to collect large amounts of data at minimal burden. And a value of this potential approach, again, is to have data that can help monitor symptoms and behaviors to identify deviations from a patient's baseline that might signal symptom exacerbation or the onset of a relapse. But as we saw, data privacy is a key patient concern, and this might be particularly relevant for these passive data collection technologies, especially in individuals with psychosis. So I think I'll stop there so that we have about 10 minutes for discussion, and I'll turn it over to you, Marlon. Thank you. So thank you, everyone, and I just really want to thank our panel today. We have about 10 minutes left, but let's give them a round of applause now in case we run over a bit. Okay. I have a few slides that I'll bring up in a second to generate some discussion. In the interim, though, I invite people to come up to the microphones, and those joining us virtually to add your questions or comments in the chat as well. My name is Marlon Danilewicz. I'm an assistant professor of psychiatry at the University of Toronto and associate medical director at Ontario Shores. My disclosure relevant to this is that I've consulted for Winterlight Labs. So why don't we take some questions as people are already here and make use of our time. The gentleman over here in the pink shirt, please. Yes. No, thank you very much. I was just thinking about patients with paranoia and the whole issue of consent, which you slightly alluded to in your last slide. Do you find that that is an issue? Do people get very paranoid about the speech monitoring? So, as I mentioned, we aren't typically doing passive data collection in that sense, which I think reduces some of those concerns. But the other piece is that typically, when we explain the process and make some space to address those concerns and answer those questions, most participants tend to be very open to this process. Thank you. Yeah.
I'll just comment that in some of our studies, we have been doing this passive data collection, and I have patients who are quite acutely paranoid and have lots of delusions of being persecuted, et cetera. I just straight up asked one of them, you know, why is it that you feel comfortable sharing all this data with me when you're constantly feeling tracked? And his response was simple, but also, I think, telling: well, are you saying I shouldn't trust you? So I think in that healthcare context, where there was a sort of built-in trust, he was willing to share anything with me, especially since in that case I wasn't his treater; I think he had issues with his treater. So in my experience, yes, there are people who are uncomfortable sharing, you name it, GPS, any of these kinds of measures (we don't do passive audio recording). But it doesn't quite break down the way you might think in terms of paranoia and non-paranoia. We have patients who are not paranoid but don't feel comfortable, and then we have patients who are paranoid who are providing a lot of data. All right. We'll take the next question. Thank you. Hey there. I'm a fourth-year medical student. I have a peer I'm working with who developed an app that monitors basic health aspects; he's been doing it with pain in sickle cell, and I've been trying to find ways to incorporate it into psychiatry. One of the things I was thinking of was, in patients who came in for a suicide attempt, following them afterward to check that they're doing okay, and things like that. And the app has things embedded, like responses. One of the things I was thinking, and this goes along with the ethical considerations here, is that a lack of response would almost be more indicative than other types of response. But to what extent do you decide, okay, we should send someone, we're worried about this patient, we want to check on them? If they just decided, oh, I don't really feel like using this anymore, then is it an invasion of their privacy that they decided not to be a part of it? I guess, where do you draw the line between where you reach in and where you step back? Yeah, I mean, I can comment; others may as well. I think the key here is that in a research context, it's got to be made really clear during the informed consent process what your policy for the study is going to be. And in longitudinal studies like the one I was mentioning and others mentioned, we need to rethink the informed consent process as something you don't just do right at the beginning of the study. Especially working with psychotic patients or patients who might be suicidal, you really need to make sure over the course of the study that if your policy is, look, no one is reviewing this stuff in real time, you need to go to the emergency room if you have an urgent issue, then, as our IRBs require, those kinds of disclaimers go into the app itself. So anytime there's a free-text kind of answer, not just simply asking them what's your level of depression or whatever, there's a disclaimer in there reminding them that if they do have an urgent issue, they should go to the emergency room.
On the question of missing data and how to interpret it: yes, in some cases it could be a sign that someone's doing worse, but as you implied, sometimes it's a sign that they're doing fine and they just don't want to use your app. So to me, there isn't one clear actionable thing. It's different in the case where I'm doing an interview with someone and they tell me that they're suicidal, and we do the usual thing there. But in these remote monitoring studies, we just need to be careful not to do anything outside of what the person was told during the consent process, and then try to stick with that as much as possible. But yeah, I think that's a great question. And I think we learn a lot from other areas of medicine, too, where we have biomarkers with cutoff values, and blood tests that are more urgent and trigger action. We've done some work at our hospital using different kinds of biometric data, coupled with our assertive care teams, to provide them input on when to intervene. It's interesting to see some of the feedback coming through in terms of where people see the applicability: a lot of consideration of outpatient teams, ACT teams. The data is changing as we're going along. Why don't we take another question here in person, and we'll see if more questions trickle in virtually. Go ahead. Thank you. So, wonderful symposium presentations. I'm very interested in the use of this information in terms of predicting relapse, but I also wonder whether it would be possible to do a study on the other side. Years ago, a linguist at Penn, William Labov, wrote a book called Therapeutic Discourse, which looked at linguistic markers of therapeutic improvement. And one of the difficulties in clinical work is that it sometimes takes a very long time to know whether someone's going to respond or not. Do you think this could possibly give us an early warning, not just of relapse, but of response? Oh, yeah. Sorry. Did you want? No. Yeah, that's a great point, and we have a pilot study that's looking at exactly that. We're collaborating with Kyle Minor and Paul Lysaker at Indiana University, where they developed MERIT, a metacognitive treatment for psychosis. It's based on the idea of improving people's narratives for themselves, their understanding of their role in the world, and the active roles in their lives. We had 24 sessions each from five patients undergoing this treatment, and we were able to apply natural language processing techniques. In particular, in terms of our hypothesis-driven approach, we wanted to look at the ways people were referring to themselves: whether or not they were talking about themselves in an active voice, so "I'm doing things to others," versus things being done to me. The other things we looked at were LIWC categories, which I think you used in some of your work, looking at words that refer to cognitive processes, as well as future orientation versus past and present. And we were able to see what you're talking about: objective NLP-based markers of what we think could reflect this metacognitive therapeutic target. Thanks. Great. Thanks. Great question.
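To make the hypothesis-driven part of that answer concrete, here is a minimal sketch of how first-person active versus passive constructions can be counted automatically. It assumes spaCy with its small English model, and it illustrates the general technique rather than the panel's actual pipeline; the sample sentences are invented.

```python
# Sketch of counting active first-person versus passive constructions,
# illustrating the kind of "agency" marker described above. Assumes spaCy
# and the en_core_web_sm model; sample text is invented.

import spacy

nlp = spacy.load("en_core_web_sm")

def agency_counts(text):
    doc = nlp(text)
    active = passive = 0
    for sent in doc.sents:
        # "nsubjpass" marks the subject of a passive verb ("I was told...").
        has_passive_subj = any(tok.dep_ == "nsubjpass" for tok in sent)
        # "nsubj" + first-person pronoun marks an active "I did X" clause.
        has_active_i = any(
            tok.dep_ == "nsubj" and tok.text.lower() == "i" for tok in sent
        )
        if has_passive_subj:
            passive += 1
        elif has_active_i:
            active += 1
    return {"active_first_person": active, "passive": passive}

sample = "I called my sister yesterday. I was told to wait outside."
print(agency_counts(sample))  # expected: one active, one passive sentence
```

Dictionary-based systems like LIWC work differently, matching words against validated category lexicons (cognitive process words, future-oriented words, and so on), but the two approaches are often combined in studies like the one described.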
Nice to see that we have a healthy degree of optimism and skepticism here amongst the audience today. We're going to take a question virtually. Michael? Yeah, there's a question from our audience attending virtually, and actually, I don't know the answer to this question myself: have you considered applying NLP to the adult attachment interview and looking at components of Grice's maxims, like quality, quantity, relation, and manner? I don't know if either of you have any thoughts on this. Certainly, that's not an idea I've considered before, but it's definitely food for thought, and I'll have to think about it a little more. Yeah, I haven't specifically done that, but for the person online, I think the message here would be that this is really quite straightforward research to do compared with, say, brain imaging. You literally can, if you have an area of interest, just begin recording the sessions that you might already be getting, and tools to do these kinds of analyses are readily available. You don't have to be a computer scientist to pull them off. I don't know if you have anything to add. Yeah, actually, we have a paper that just came out in Schizophrenia Bulletin where we discussed Grice's maxims, though it wasn't directly related to the NLP part. We were trying to understand: we talk about these speech disturbances, but if we did a brainstorm now, I bet this group could come up with at least 20 different terms, 20 different symptoms, that relate to speech disturbance, right? Tangentiality, circumstantiality, derailment, flight of ideas, poverty of speech, poverty of content, and I can go on and on. So our paper addressed what kinds of core clinical constructs we should even be modeling with NLP, and there we found some separation between incoherence versus inefficient speech that we thought could relate to some of Grice's maxims in terms of the pragmatics of speech. But yeah, I think that's great. Thank you. So, we have one or two questions left. I do want to thank everyone now for coming, because we are out of time, and those who need to leave, please do so. Hopefully we can stick around for another minute or two; I'm seeing some nods, so we'll take the last two questions here. Thank you. Thank you so much for this really interesting talk. I have a question about one of the potential pitfalls of this technology. I'm thinking about the fact that African American folks are misdiagnosed with psychotic disorders at higher rates than white folks, and imagining a situation in which racialized speech patterns could come to be marked as a risk factor for psychosis, or a marker of psychosis, and I'm just wondering what kind of work is being done to safeguard against that. Great question. I'm sure others have thoughts on this.
I mean, from my perspective, change within an individual over time, and focusing on symptom severity measures, is going to be critical for that, because if you're focusing at the group level, then those kinds of biases are bound to creep in. I made a point in my talk that it's often not the same features that distinguish individuals as the ones that show changes over time. And this is a type of data that's relatively easy to scale compared with other types of data like MRI scans. So if we as a community can make the effort to collect samples at large scales that cover multiple cultural contexts, and then really focus in on changes over time, so that we can detect when someone is more symptomatic versus less symptomatic, hopefully the models we come up with will be robust to some of the cultural issues; and where they're not, we'd at least have sufficient data within individual contexts to be able to make those distinctions. Yeah, I'll just say that I agree, and I think one of the approaches is to make sure that the models themselves are really transparent about what is actually contributing to the classification, and then to make sure we're testing models in varied groups of participants, to really try to guard against that. But it's a really great point that I agree with, and as I mentioned on that previous slide, these are not necessarily abstract examples, right? We've seen racial bias creep into things like face recognition technology used to predict criminality, and so it's something we think about a lot. Yeah, thank you for that great question. Thank you. Last question. Wonderful talk. Thank you. My question is, is there any application of NLP to the diagnosis of personality disorders? Yeah, great question. We did a study in young women who were in a borderline personality disorder residential treatment facility at McLean, where I'm based, and we were very much interested in this question. Similar to our other studies, we followed these young women over time as they were getting both their initial diagnostic interview with an expert clinician and then, over time, the BEST, which is a symptom severity rating. We were interested in whether you could identify, in a diagnostic interview, the moments where the expert clinician kind of clicked and decided to make the jump to the diagnosis, so focusing in that study on the moments that were discriminative for that expert clinician, and then, at the group level, looking at how symptoms of irritability and other things in those young women changed over time in relation to some of the linguistic measures. I think this is an area that's very ripe for further investigation, just because these conditions can be very difficult to diagnose, and given the high rates of suicidality and other things, these are very high-risk populations. Seeing people who are treatment-resistant coming in for residential treatment, it becomes really important to understand the people who are still at high risk just prior to leaving the program versus the ones who've shown improvement. And so, similar to some of what Dr.
Tang mentioned, we did see certain word choices and things coming in later in treatment as a function of how much they had improved, at least based on this BEST score. So, happy to talk more about it, but I think that's at least what our group is focused on. I'm not as familiar with other areas of personality disorders, but it's certainly a very interesting area. I'll just mention that when we think about personality disorders compared to other disorders like schizophrenia, in terms of the types of language disturbances, we'll probably be dealing with more subtle changes or distinguishing features, and that might make the work a little more challenging. But that's also one of the advantages of using these computational linguistic approaches, right? To have this very comprehensive, quantitative, and reliable assessment of language, because the lack of reliability that we sometimes see in clinical ratings is what can obscure some of the systematic patterns in speech that we're interested in. Justin, is your paper on borderline personality disorder published? No, we're still writing that up. Okay. But I'll also just mention that context here is so critical. It's not just the language sample itself, right? It's the context in which it's captured. What we're finding in borderline is that, whereas in mania the speech disturbances might be somewhat less context-dependent, where people have increased speech rate across multiple contexts, I think we're going to find that in some of these conditions the context, even sometimes the micro-context, like who you're talking to and what exactly they said just before the speech sample is captured, is what provides the more diagnostic or discriminative information. Yeah. Great. And I haven't seen any other studies. We actually have one of my coordinators who is interested in borderline personality disorder as well, but with a much smaller sample, I think; I haven't seen anything else. Well, thank you everyone for your great questions and participation today. It's been a pleasure to be here at the APA with this amazing panel, and thank you everyone again. All right.
Video Summary
The session focused on the application of natural language processing (NLP) and computational speech analysis in diagnosing and monitoring schizophrenia. The panel, moderated by Dr. Marlon Danilewicz, comprised esteemed experts Dr. Sunny Tang, Dr. Justin Baker, and Dr. Michael Spilka. They discussed the burgeoning research in NLP as an emergent biomarker for psychiatric disorders, specifically schizophrenia.

Dr. Sunny Tang highlighted the historical significance of speech as a symptom of psychiatric disorders, referencing Emil Kraepelin's early observations. Tang introduced the audience to computational speech analysis, emphasizing its role in sensitively capturing subclinical disturbances which traditional clinical ratings might miss. A pilot study she presented showed NLP features outperforming clinical ratings in diagnosing schizophrenia.

Dr. Justin Baker elaborated on the use of multi-modal data from speech, video, and wearables to capture mental states. He demonstrated how computational analyses of verbal and non-verbal cues in psychiatric interviews could predict symptom severity. His research showed promising results in predicting both positive and negative symptoms of psychosis through linguistic markers.

Dr. Michael Spilka focused on practical applications, hurdles, and future directions for integrating speech analysis in clinical care. He discussed a feasibility study in remote speech assessment for first-episode psychosis patients, revealing high retention and minimal technical issues, suggesting its potential in clinical practice.

The session explored the ethical implications and barriers like privacy concerns, model biases, and the need for comprehensive data across diverse populations. The presenters encouraged continued innovation and cautious integration of these digital tools to enhance psychiatric diagnostics and patient care.
Keywords
natural language processing
NLP
computational speech analysis
schizophrenia
psychiatric disorders
biomarker
clinical ratings
multi-modal data
mental states
speech assessment
ethical implications
privacy concerns
psychiatric diagnostics