Everyday Analytics: Using Public Data and Free Too ...
Video Transcription
Good afternoon, everyone. Thank you so much for being here, especially for the third or fourth presentation of the day for some of us. My name's Stan Mathis. I'm here with my friend, Dr. Oluwole Jegede, from Connecticut Mental Health Center in New Haven, Connecticut. You see two other names there on the presentation who were not able to make it with us today, but they're both fine. Don't worry, nothing too dramatic. Dr. Peter Kahn, a pulmonary fellow who actually worked with us on some analytic projects, had an obligation when this was moved to a different conference, and then our chair, Dr. Michael Sernyak, who's also the CEO of our hospital, very last minute was unable to join us today, but they were both strongly here in spirit. Dr. Peter Kahn, who actually was gonna do one part of the presentation today, was able to record his beforehand, and so you'll see him in video format in a few minutes, so don't despair too much. Our presentation today is entitled Everyday Analytics: as you can see, using public data and free tools to yield meaningful insights for your patients, your clinic, and hopefully beyond. I realized that there's a potential for a little bit of a misleading aspect of this presentation, so I wanna make sure we start off, oh, first of all, I have no disclosures of my own to declare; the other presenters will make their own disclosures. But I wanted to clarify what this presentation is not. As Mr. Clooney is getting dejected and leaving, this is not a data science class, this is not a stats class or a programming class. I wanna make that very clear up front. You will not be able to leave this presentation and do any of the analytics on your own independently. To try to encapsulate some small part of that into a 90-minute session would really be so trivial and unfulfilling, oh, we already have, oh, anyway. But that's not really the focus of this particular presentation.
What it is more is a call to arms, a pep talk or a challenge to this group, to clinicians, to physicians, to prescribers, to practitioners in the field that we are the ones that should be doing as much as possible a lot of this analytics work, and that's the case that I've made for myself to convince you of during the presentation. The objectives for learning, by the end of the presentation, learners will appreciate the power of their role as clinicians in health data analytics. We will generate one data question relevant to your clients or your clinic and be familiar with resources and approaches to begin this analysis. I wanna call out specifically objective number two because that's hopefully something that we can start brainstorming, y'all can independently start brainstorming throughout the presentation because we're gonna come back to it at the end in a more of a collaborative or kind of cloud mind process, think through some problems that might be relevant to you or your patients and how we might begin to analyze them. The outline is pretty basic, we'll do some background work, introduce some concepts, we'll go through three example projects that physicians, in this case all psychiatrists or pulmonologists have done and we'll lead into a small practical demonstration of a very integral part of analytics, especially around health data and geocoding. We'll wrap it up, spoiler alert, there's a surprise twist at the end and then we'll finally end with a shouting match where you and I will try to amicably and collaboratively figure out some problem and a little problem that will be relevant to you and your clinical work and how we might be, how we can start addressing it with some of the tools and some of the data sources that we explore during the presentation. Does anybody remember Speed? The movie, I guess I should clarify. 
So it came out 29 years ago and so I think we should take a beat, those of us who remember it and sit in this feeling of ennui, of nostalgia, like pathologic nostalgia and just agedness that sits in, sits in the gut. But, so pop quiz, just to start, to start, kind of start the juices flowing. You know, can I show of hands maybe, who knows how to extract data, maybe raw data from your electronic health records? Fantastic, so for those of you at home, I'm supposed to kind of translate the visual into non-visual, I think that we had four hands out of the couple of thousand people who are here in the audience. I'm really good at counting. So what do you, okay, what do you know about your clinical cohort? Do you, how much, how many of you know, for instance, kind of summary statistics about the socio-demographics, for instance, race, age or economics of those that your clinic sees or takes care of? Fewer hands, no hands. Or how about even diagnostic breakdown? You know, does anybody know the percentage of anxiety disorders, psychosis, you know, whatnot of the particular cohort that they take care of? One show of hands, there we go, very good. So it's a low percentage yield again. Do you know where your patients live? Does anybody know about approximately clusters or ranges or catchment areas that are defined by your patient cohort? Some of those are already defined by your treatment agreements. You know, you take care of a particular CBOC or whatever it is, or even the, or summary statistics about the socio-demographics of those neighborhoods where your clients live. Show of hands on that. I got a maybe up here in the front, so I've got fewer hands there. All right, so some of you right now, the ones that haven't left the door are asking, why me, who is he talking to, why me? Doesn't he know who the audience is here? And this is where I start to make the hard sell. This is where I really lean in and try to get you to buy the car. 
This is my case for clinicians as health data analysts. First, we have, and first and most importantly, we bring to the table something irreplaceable, which is clinical perspective. We are working there in the trenches. We know better than anybody firsthand what factors, what aspects, what externalities and internalities are impacting our patients and our patients' outcomes better than anybody else. And then contrasting that, say to someone, if we were to refer out to a statistician or some kind of PhD or somebody who does this kind of work for us, who would come with these a priori assumptions, these abstractions or assumptions about things, versus our a posteriori or information or hypotheses that are generated by our lived empirical evidence with working with our clients. And that's a huge distinction. And one of the biggest reasons I think that I'm pleading at this point for us to kind of roll up our sleeves and get a little bit more active in these kind of assessments. Also, data make fertile soil. Getting your hands dirty, exploratory data analysis is something that is irreplaceable and completely unpredictable. Again, we can come, we can make a lot of assumptions about the data we can extract from our EHR or we can collect for our own purposes. But until you're there collecting it raw, cleaning it yourself, looking at trends early on, in the middle and late into the process, you really have that kind of intimacy with your own data that you start to generate your own, all new different perspectives on it, all sorts of different hypotheses that you can then support or investigate with the data. And that is something that, again, completely unexpected and something that would be, you would not have access to unless you were willing to get and roll up your sleeves and get into it. Then also, who else? 
Depending on what setting where you're working, academic, private, community, otherwise, there might not be anybody else available to you to do this work and it might be up to you if motivated to actually get in there and do it. If there is, if there are other people available, oftentimes it's limited by time or money to support them. And when we're talking about health data and particularly protected health information, there's the added complication of HIPAA when it comes to sharing data, especially with people who are external to your organization and the complications that arise from that. There's an adage, and actually, there's an interesting online debate about where this comes from that I was not aware of, but I encourage you all to read up on, that if you wanna go fast, you go alone. If you wanna go far, you go together. Well, I've adapted it slightly to where if you wanna do data analytics slowly, you do it alone. If you actually wanna complete the projects you want, you actually do it together. And one aspect of that, I mean, is by identifying broadly resources available to you within your organization, be it the data itself, software resources, either free or that sometimes you have to pay for, clinical acumen, which we are the ones oftentimes bringing that to the table in this case, technical skills that sometimes are the hardest ones to find, and again, we might be the ones rolling up our sleeves and doing that. And I think just as important is identifying within your institution champions, supporters, people who see the benefit of this work, who give you the benefit of the doubt and maybe a little looseness to explore it and to present it, and also will give you feedback about the relevancy and applicability of the work. And just as important to that dialogue are critics. And I'm a big fan of a productive, critical dialogue. 
And those who have skepticism about the work, who have skepticism about your approach and are willing to listen to it and give that kind of feedback are incredibly valuable, especially as you're developing these projects de novo and rolling them out through large data sources and oftentimes through complicated organizational systems. So those who draw diagrams like this like to draw diagrams like this. And so people who like to describe data analysis break it down into this five-step process. I particularly like this representation because of its circularity, which really emphasizes the iterative nature of analytics work. You start off with your best guess. You find as to why you might need these kind of data analysis approaches. You begin collecting your data or sometimes it's been collected passively as part of your clinical work for months or years. Cleaning, analyzing, interpreting the data. And then starting back over, checking in to what has that done? Has it answered your questions? Has it clarified or muddied these questions? And do we need to start the process over again with new insights? Another reason I like this circular representation, not just because of the iterative aspect of it, because it also reminds me of, I think an important parallel, which is the scientific method itself. And specifically that this type of work, especially when you get into large data sets of unfiltered data, needs to be driven by hypothesis-driven analysis. It's too easy to go in and bend and skew and check and recheck when you have big data sets. And it's all the more important both for validity and for the statistical power of your analyses to maintain a hypothesis-driven approach. If we unfold the first diagram and look at the five steps again within the steps of data analysis, there are different software packages, the software resources that you're gonna be using at different phases of the analysis. 
We see the electronic health record is gonna be integral to collecting and preliminary cleaning of data, and then also applying or implementing any interventions that are predicated upon the insights from the analysis. Whereas the analytic software, and we'll talk a little bit more about options for both, will be more in the last stages, three, four, and five. You're cleaning, analyzing, and applying, with the analytics at their core. These are, as of April 2023, the top 10 EHRs, which I think is an interesting table in itself, insomuch as, A, it tells a story of how two or three are really starting to corner the market, which is not really that surprising given how the economics of it work. The second point being how many are still existing even in small market shares, and still kind of fighting the fight against some of the behemoths. But also just as interesting is that number five is none. You know, the fifth most common EHR at this point is clinics and hospitals who have bucked the trend or the mandate, and actually don't have an EHR. And Dr. Jegede and I actually work at a hospital that does not have a full EHR, and so we can commiserate with those folks. The fourth point about this table I think is important to know: the big systems like Cerner or Epic that have a lot of uptake and a lot of use nationally are gonna have a broader support network for how to do some of these analytics. And in fact, both of those packages have some data cleaning, data extraction, and analytic tools built into them, which is powerful. But if you're not working with one of those, you'll have to kind of familiarize yourself with your options and how to extract raw data from the EHR. And that's just gonna be part of the process you have to steel yourself for. As far as the analytic software packages to use, they are myriad as well.
They're, you know, I put good old Excel up there just so it wouldn't be totally discounted. If one thing it has going for it, it's on almost everybody's desktop already. Some of us have some, you know, at least I've opened up an Excel spreadsheet before and seen the cells and can at least navigate it. If you get past that, there is quite a bit of analytic power you can do with some of the formula that you can write for it. There's a lot of support for that kind of work online, Googling and the Stack Overflow stuff. But ultimately, you'll hit a point where Excel is no longer the tool that you're able to use. You'll need to buck up and try one of the big boys. I have a very not slight, actually a very large preference for the statistical programming language of R. I do almost all my work in it across everything from geostat stuff that we are gonna present on today up to machine learning and artificial intelligence work. It's fantastic for that. It's absolutely free. It runs on just about anything. It has a little bit of a learning curve, but it has very broad support online. And I have not taken a single R course in my life. I've taught everything, taught myself everything about it via just online Googling and bulletin board systems and support networks of how to do these things and online tutorials, which we'll discuss in a little bit. So, and I have no computer science background. I took one course in Java in 2001. So really, it was just, it was all through online support. There's also options like Python, the more commercial options like Tableau, SAS, SPSS. Some of these you might already have experience with. You might have done research and just do basic statistical stuff. Or you might have been turned on to it by your PhD friends who, again, kind of get indoctrinated into these particular commercial packages when they're doing their training. And they all have their merits. 
If you're starting from scratch and don't have a reason to choose a particular one over the other one, I strongly suggest R or Python. They'll be around for a very long time. They have very broad support, and they're both free as in, free, they're free. So this is the point. We've completed our introduction. This is the point where we're gonna go through three cases. The commonalities of these three cases are that they were all done on South Central Connecticut data, either Connecticut Mental Health Center data or Winchester Health Pulmonary Clinic, a pulmonary clinic affiliated with Yale Medical School. They're all 100% done by physicians who had observed clinical questions in their practice and sought answers to those questions and did not have any external resources to explore those answers, or even when they sought those external resources, were not finding them, and decided to, well, damn it, I'm gonna do it myself. And so reached out to friends, got a team together, and proceeded with these analyses themselves. The first case, I think, is Dr. Jegede's, and I'll let him introduce himself.

Thank you, Dr. Mathis, and thank you all for coming. We appreciate you taking your time to do this with us today. So I'm gonna be talking about Neighborhood Mental Health Service Utilization and how we found answers from clinically-derived geoanalytic approaches. My name is Oluwole Jegede, and I'm an Assistant Professor of Psychiatry and Addiction Psychiatry at Yale and at Connecticut Mental Health Center in New Haven, Connecticut. I have no conflicts of interest to disclose. So in my role as a community psychiatrist, I have had to have these competing roles: as a community clinician, as an educator, as a researcher, and also as an administrator. I'm the Director for the MAT Clinic at the CMHC, and in my role, I have to teach medical students, residents, and also addiction psychiatry fellows.
I say this because this has been the basis for me generating some of my questions, and we'll talk about some of the questions today. First of all, I started as an addiction psychiatry fellow at Yale, and one of the questions I asked myself was how, I was working at a foundation, a methadone clinic, and the first thing I saw when I compared just the people I was seeing in my clinic, I was wondering how that differed from where I was from in Brooklyn, New York. Over at Brooklyn, the methadone clinics were really like African Americans, and when I came to Connecticut, it was all Caucasian, so white people, and I was just wondering, me and my colleagues, why is it that it looked like our foundation methadone program sort of defied national demographics? So this was going on in my mind, and once I finished my fellowship, I started as an assistant professor at Yale, and I moved over to the CMHC, and then I had similar questions, and the questions being, how does the CMHC, how does my clinic reflect the New Haven community? Are we really at a community clinic that we claim to be? How are we doing the work we're supposed to be doing, reaching out to the community? So this was how my questions sort of developed, and in identifying my questions, I wanted to identify the neighborhoods in the community, neighborhood characteristics, locations, the racial composition of the community, and then just the distribution of students, the distribution of patients, and also, and by proxy, the distribution of severe mental illness in the community. In addition to that, I was also interested in social determinants of health. In the New Haven community, how, what are the community-level indicators? When patients come to our clinic, we're looking at an individual level, but now, at this point, I was more interested in what are the community-level health indicators, and also the impact of these indicators on health outcomes. 
So I went through my decisional algorithm, which I've already told you about, my clinical question, and the next step was to think about information sources. Where are some of the places, or who to call on, or in what formats would my questions be answered? And I looked at two sources, one of which is DMHAS, that's the Department of Mental Health and Addiction Services, which I'll talk about in a little bit, but this is where we worked, me and Stan. And we also looked at the level of the community: were there free resources, where could we get data about the community? And then we looked at DataHaven, which is an organization that keeps this kind of data using the ACS, the American Community Survey. When we combined these resources, we were able to get clinically applicable results and develop hypotheses for further studies. So I'll just talk about the Connecticut Mental Health Center real quick. The CMHC is an enduring collaboration between the Connecticut State DMHAS and the Department of Psychiatry at the Yale School of Medicine. The CMHC is an educational training site for psychiatry, primary care, psychology, nursing, social work, and child placement students, and the center provides recovery-oriented community mental health services for over 4,000 patients each year. DataHaven is also, like I was talking about earlier, an organization that keeps this kind of data. So the data is free, it's available online, and all we needed to do was just to access it. So this is one of the sources that we were able to look to when we were trying to answer the questions, like the questions that I generated earlier. Now here are some of the variables that we looked at.
One of them was the neighborhood population, the percentage by race and ethnicity, poverty rate, life expectancy, annual primary care checkup rate, annual dental care, percentage uninsured, smoking rate, coronary artery disease, asthma rate, diabetes rate, hypertension rate, and then we also generated, using some geocoding, car travel to our clinic, and then bus travel to our clinic as well from all the neighborhoods in New Haven. So moving quickly to some of the things we found, using the combination of data that we found, the free data in the community, accessible to everyone, this was what we were able to find as far as the number of clients that we had at CMHC. Like those of them that were using CMHC and living in the New Haven community, we were able to find the number of people. We broke this down by the communities, and this was just using the map of New Haven, how we saw the percentage of the adult population who were our clients. So just looking at this, you can tell exactly where the majority of our clients were coming from. In addition to that, we also found the racial composition. Any questions? Okay, the racial composition of the neighborhoods. I want to point your attention here to Newhallville, which is like stuck in the middle. You would see we had the largest composition of African Americans in that community. And if you also look over at East Rock and East Shore, you see the largest community of white people. And if you look at Fair Haven, you see we have the largest composition of Latinx people. So this shows you, just looking at this, we were able to tell the racial composition by community of our clients at the Connecticut Mental Health Center. And using the data we got from DataHaven, we looked at these variables and we were able to see the percentage of CMHC users, even their life expectancy.
The medical checkup, dental visit, the number of those who had health insurance. And then we developed an illness index based on the combination of medical concerns. And we made an index out of one, and we could find how sick our population was. And then if you look at the last two columns, we also found the car travel and bus travel, you know, from each community to our clinics. And you would see already that the bus travel went as high as 40 minutes to get to the clinic. And this was very important. So we were able to understand how difficult it was for our clients, our patients, to get to the clinics. And then if you look at the ones I have in red, yeah, so Fair Haven, like I said, had the highest number of Latinx people, Newhallville of African American people. And then if you look at these indices, you could ask, why is it so different in this community? So just looking at this, we were able to ask ourselves some searing questions about the communities that we serve. Okay, so this is the chart for the travel time. Like I talked about earlier, we also broke this down by the communities. And then we could also see that we confirmed something called Jarvis's Law, which is an old 19th century law about how the distance from the communities to the clinic determines the utilization of your clinics. And we were able to confirm this in New Haven, which is like a mid-sized city. This law has only been shown in larger cities, but we were able to show in Connecticut that even the distance from the patients' homes, from the neighborhoods to the clinic, was also very, very important. And I think Peter Kahn's data also shows that, and you'll get that in the next presentation. Okay, so these are neighborhood correlates that we also got from our data.
We found very strong multivariable correlates: being Black or African American, being poor, or shorter life expectancy. And this is not really surprising to us. We're psychiatrists, so we know that people with severe mental illness also have comorbid medical conditions. So this wasn't very surprising to us. We found a negative correlation there. We found a negative correlation also with annual dental checkup, which is really a proxy for many things, really, including sociodemographic status, including poor income or low income. And we found a very strong negative correlation for annual dental checkup. If you also look down the list, coronary artery disease, asthma, diabetes, blood pressure, all were very strongly correlated with using our CMHC at the neighborhood level. In summary, neighborhoods have peculiar circumstances, peculiar compositions and resources. And as Stan said, common and readily available data sources and methods can help us clinicians better understand the needs of our communities, the communities where we actually serve, where our clinics are located. And someone asked me, so what is the end of all this? What is the point? And that is shown in this third bullet. Our results show there was an opportunity for us to engage and enhance dialogue with the community and with our community leaders. I acknowledge Stan for doing the hard work with this, with the data geocoding, and Bob Rosenheck, who's a professor of psychiatry and epidemiology. And you can scan that little code if you want to ask more questions about data and how we went about sourcing our data, or any other questions. Thank you very much.

All right. Yes, if you follow that code, that scan, you'll see a very handsome picture of Dr. Jegede on his website. So I suggest that if anybody's interested, or collects those kind of things. Next is, oops, I had the wrong one. Go ahead.
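The neighborhood-level comparison in this first case (clients as a share of the adult population, set against travel time to the clinic) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the neighborhood labels, counts, and bus times below are invented, the field names are hypothetical, and a plain Pearson correlation stands in for whatever multivariable method the presenters actually used.

```python
from math import sqrt

# Hypothetical neighborhood-level records. Every number here is invented
# for illustration and is NOT taken from the talk or from DataHaven.
neighborhoods = [
    {"name": "A", "adults": 10000, "clients": 300, "bus_minutes": 10},
    {"name": "B", "adults": 8000, "clients": 160, "bus_minutes": 20},
    {"name": "C", "adults": 12000, "clients": 120, "bus_minutes": 30},
    {"name": "D", "adults": 5000, "clients": 25, "bus_minutes": 40},
]


def utilization_rate(row):
    """Clinic clients as a percentage of the neighborhood's adult population."""
    return 100.0 * row["clients"] / row["adults"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rates = [utilization_rate(row) for row in neighborhoods]
times = [row["bus_minutes"] for row in neighborhoods]

# A negative value means utilization falls as travel time rises,
# which is the pattern Jarvis's Law describes.
r = pearson(times, rates)
print(f"correlation between bus time and utilization: {r:.2f}")
```

With real data, each row would come from joining clinic counts to public neighborhood indicators, and a handful of neighborhoods is far too few to support a firm conclusion from a single coefficient.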
Next is case number two. This is, again, from Dr. Kahn, who was unable to join us. He'll do his own introductions. And I think it's self-explanatory, but I was intimately involved in this project and can answer any questions about it. And you can ask questions of Dr. Jegede now, or we can wait till the end. I think we'll have plenty of time for questions and collaboration during our shouting match at the end of the presentation. Let's see here.

Hi, everyone. Thanks so much for having me. I'm Peter Kahn, one of the pulmonary critical care fellows at Yale. Today, I'd like to present my portion of the session, entitled Impact of a Medical Clinic Relocation on Travel Time. My portion of today's session considers the question of what happened when our pulmonary clinic, previously in New Haven, established a new location in North Haven, Connecticut. And we, as a result of this move, sought to understand the impact on our patients and on our clinic. So by way of brief background, and as a refresher as to what we do in pulmonary medicine, we treat those with a wide variety of conditions involving the respiratory system. So we treat those with airway disease, with neuromuscular disorders that relate to the respiratory system, with infections that relate to the respiratory system, cancer, pleural disease, and various other types of lung diseases that may arise in a patient's lifetime. Thinking specifically though about our clinic and the conditions that we treat, I provided a brief list here of those conditions that we commonly see, to help provide a better sense and context for the type of patients that we are seeing.
So just briefly going through the list together then, we see patients who have cystic fibrosis, we see patients across the entire spectrum of COPD, ranging from those who are newly diagnosed, including those who may require a transplant at some point in their journey, all the way to end stage COPD who really require palliative support in addition to pulmonary support as well. We also see those who have various forms of chest infection. We have a comprehensive pulmonary medicine program. We have a robust post COVID program for those who have symptoms after prior COVID infections. We have a very robust pulmonary hypertension program, which sees those with high pressures in the blood vessels of the lung and in the heart as well. We see those who have various sleep disorders and they are in need of treatment. We have a robust thoracic oncology program, tuberculosis, a center for airway disease for those who have diagnoses of asthma and related conditions and a very large interstitial lung disease program, which for those who may not be readily familiar with interstitial lung disease is a condition that impacts the lungs and can be related to various different occupational lung diseases, autoimmune diseases, and other forms of conditions that can lead to scarring of the lung resulting in breathlessness and other related challenges. And so now really moving to the heart of the matter and discussing the move that I had mentioned at the outset of the presentation. So our clinic sees on average on two and a half thousand patients per year. And in 2021, our clinic moved from New Haven to a new location in North Haven, approximately nine miles away. While this doesn't seem like a particularly significant drive and indeed in a private car, one could get there from the old location to the new location in approximately 11 minutes. 
And for those who may not have access to private cars, we sought to understand the impact of this move on their access to clinic and the impact that that might therefore have on their care. So I'd like to share with you now the main finding of our paper, which is available and I'll put up on a PowerPoint slide shortly. But our main findings were that if you, let's start from the bottom image and then move to the top image. Thinking about those who access our clinic via car, in the end of the day, those who were in the Northern portion of the state of Connecticut had a shorter commute. And those who were in the Southern portion of the state of Connecticut tended to have a longer commute. But at the end of the day, when averaging out these two distances, we found that there was no significant net impact on a population level in accessing our clinic. And as I mentioned, that actually makes quite good sense if the first location is in one more Southern location and the second location is a more Northern location. Those who are in more Northern locations have easy and ready access to the clinic, whereas those in the Southern location have now additional time added when considering their trip to the clinic. But again, in a private car, this doesn't necessarily add significant distance when we look at a whole of a population. And for each individual person, it may add only short amounts of time. Considering now the upper portion of the graphic, the one titled Travel Time Changes Public Transit, what we found was the ultimate move from the more Southern location represented in this panel by a triangle to the more Northern location in North Haven represented by a square did result in more significant access challenges when looking at the entire population that we serve. And that was primarily because of the locations for pickup and drop off of the public transit routes and the ways that our patients were distributed throughout the state. 
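The before-and-after comparison described here can be sketched as a per-patient difference in travel time, averaged separately for each mode. This is a minimal sketch under stated assumptions: the patient records, field names, and minutes are all invented, and a simple mean difference stands in for whatever statistical comparison the paper actually used.

```python
from statistics import mean

# Hypothetical per-patient travel times in minutes to the old and new
# clinic locations. All values are invented for illustration only.
patients = [
    {"id": 1, "car_old": 10, "car_new": 18, "transit_old": 25, "transit_new": 70},
    {"id": 2, "car_old": 30, "car_new": 22, "transit_old": 90, "transit_new": 75},
    {"id": 3, "car_old": 12, "car_new": 19, "transit_old": 30, "transit_new": 80},
    {"id": 4, "car_old": 35, "car_new": 28, "transit_old": 95, "transit_new": 85},
]


def mean_change(rows, mode):
    """Average change in travel time (new minus old) for one travel mode."""
    return mean(row[f"{mode}_new"] - row[f"{mode}_old"] for row in rows)


for mode in ("car", "transit"):
    # A positive value means the move made the average trip longer.
    print(mode, mean_change(patients, mode))
```

In this made-up data the car changes cancel out across patients while the transit changes do not, which mirrors the shape of the finding described in the talk. A population average can hide large individual increases, so looking at the per-patient differences is worthwhile too.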
So I'd like to point out that there's a significant clustering of patients in the Southern portion of the state near the old New Haven location that have significant travel time and distance by public transit to North Haven, whereas those who are in the more Northern parts of the state, represented by the green dots, are now along more favorable transportation routes and have actually had a significant decrease in the time it takes them to make it to the clinic. And so I think fundamentally the approach that we demonstrate in our paper here, thinking about one particular clinic, is that by using these open source methods and by asking these questions, we as a team were able to answer the question of what impact moving a clinic from one location to another has on our patient population. If this work in particular was of interest, I've included a QR code here if you'd like to use it to download our paper. And I'm also happy to share the paper directly with anyone who might be interested. Thanks so much once again for the opportunity to share our work with you today. I hope very much that this will be helpful to those who are in attendance. Please, please feel free to reach out to me. My email is below and I've also included a LinkedIn QR code if that's easier or more comfortable for folks. Very much hoping that this can be the beginning of an ongoing conversation on this topic. And I look forward to hearing from you. Thanks once again.
All right, and thank you, Peter. I didn't know Peter from Adam until he reached out to a GIS listserv I was on at the university that I followed for other reasons. And he actually had a very interesting question. So a lot of my patients, I work at CMHC, I'm an ACT team medical director. A lot of my patients, very low socioeconomic strata, use the kind of free clinics and primary care clinics at Yale. And Yale had just switched their PCP clinics as well, their primary care clinics as well.
And I was very interested in the impact on my patients, who tended to live centrally, as Peter's did with this Winchester Clinic. And so that was the nidus of our collaboration there. And it's led into a couple of different projects, but it's an excellent example of how just kind of shared interests, resources, kind of finding each other in the dark can lead to these projects being developed, again, from clinical understanding and clinical questions. Oh, don't wanna play that again. Sorry about that. Let's see here. Third case, third and final case. And a little bit truncated, it's a smaller one because I'm gonna present it. Slightly different one, a little bit different in its tack, I think, but still some similarities. This is the state of Connecticut, which we've seen a couple of times now. These are the 169 towns of Connecticut. Towns are like small counties. If you're not familiar with New England, they kind of don't use counties so much for governance. They use these smaller subdivisions called towns. Vinod Srihari, one of my mentors, a great first episode psychiatrist and researcher at CMHC, founded this Specialized Treatment Early in Psychosis program, the STEP program, which is doing fantastic work in South Central Connecticut around New Haven and the surrounding towns. The STEP program's aim specifically is to capture and care for all first episode psychosis cases that happen in that 10-town catchment area. And in doing so, with that kind of population health approach, it is very interested in finding, estimating, and identifying where yield is good for recruitment, where yield is poor, and where improvement can still be made, but it did not have a lot of tools established to do that in a quantitative manner.
And so while I was working for a short time on this clinical service, Vinod shared with me some of the data around where our clients were, and it was actually one of my first data analytics projects that was driven by clinical investigation, and now it's about four and a half years old, five years old. So what was the approach? So this strange little doohickey, as you see, is actually the 10-town catchment area for STEP, and some might call it a heat map. Technically, this is called an intensity function, but basically, it's a way of representing where cases are and where they aren't in some way that describes very, very specifically the frequency or intensity of those cases happening in that particular location. So you see on the units of your x-axis here, that's actually the number of cases that occur in a one kilometer squared subdivision of that space, and these are all the, at that time, about 189 cases that had been enrolled in the STEP program. So these are the 189 first episode psychosis cases that had enrolled in the STEP program within our 10-town catchment area, and an astute observer might ask, well, how does that compare to where people live in that area, right? I mean, the population isn't equally distributed in the United States. There might be a river or a desert or a mountain or something weird, or a Walmart or something that takes up a lot of space. Interstates take up a lot of space, and you don't live there. So we wanted to somehow correct for an underlying population intensity or distribution, which is the next little doohickey on this representation. This is an intensity function of the controls, in this case, not just the population of that area per square mile, but actually the age-corrected population. So we got, again, from the American Community Survey, which I think is being mentioned for the second or third time during this talk, age-binned populations at very small areal units of this particular catchment area.
So we were able to say, for instance, this is how many people between the ages of 16 and 35 in this case, which happens to be the enrollment age criteria for the STEP program and also the age window that's highest yield for first episode psychosis incidents. What is the intensity function for those? And in comparing those two, well, first of all, I guess the first, it's worth noting that sure enough, there's a lot of kind of blur your eyes similarity, and there's a lot of intensity right there in the middle, and although there seems to be some difference to the shape of that doohickey in the middle, and some areas of maybe, I don't know how well this projects, but, and I can't, I don't know if I can draw, there might be an area up here around that's a little bit higher intensity than this otherwise dark area here on the case enrollment map. Through some pretty straightforward, basically just kind of a surface comparison, you can actually generate a likelihood surface, or a relative risk surface between these two, and what it does is identifies you areas of statistical significance, where you have a likelihood of it being a random association less than 5%, that these areas in red are having, are presenting fewer cases to our STEP program than would be suggested by the underlying population intensity of the middle graph. And I thought this was a pretty interesting observation. What do we do with this kind of, so basically, how do you interpret this? You interpret it as basically underrepresented areas of our catchment area, where there are a lot of folks of our age group that we're interested in, but we're not getting a lot of first episode psychosis cases. Now it could be that those cases are not happening there, or it could be that we're not capturing there, and we should bolster our recruitment efforts in that area. 
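The surface comparison described here, a case intensity divided by an age-matched population intensity, can be sketched as the log ratio of two kernel density estimates. The sketch below is a deliberately small, standard-library version under stated assumptions: the coordinates are invented, the bandwidth is arbitrary, and the function names are hypothetical. It does not reproduce the significance contours mentioned in the talk, which typically require an additional step such as a Monte Carlo relabeling test.

```python
import math


def kernel_intensity(points, x, y, bandwidth):
    """Gaussian-kernel estimate of events per unit area at location (x, y)."""
    norm = 2 * math.pi * bandwidth ** 2
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2)) / norm
    return total


def log_relative_risk(cases, controls, x, y, bandwidth=1.0):
    """Log ratio of the case density to the control density at (x, y).

    Each intensity is divided by its point count so the two surfaces are
    comparable; values below zero mean fewer cases than the population
    at risk would suggest.
    """
    case_density = kernel_intensity(cases, x, y, bandwidth) / len(cases)
    control_density = kernel_intensity(controls, x, y, bandwidth) / len(controls)
    return math.log(case_density / control_density)


# Invented coordinates in kilometers. Cases cluster near the origin, while
# the population at risk (controls) also has a second cluster near x = 5.
cases = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0)]
controls = [(0, 0), (0.5, 0.5), (0, -0.5), (5, 0), (5, 0.5), (4.5, 0)]

near_cases = log_relative_risk(cases, controls, 0, 0)
second_cluster = log_relative_risk(cases, controls, 5, 0)
print(f"log relative risk at (0, 0): {near_cases:.2f}")
print(f"log relative risk at (5, 0): {second_cluster:.2f}")
```

The location near x = 5 comes out strongly negative: people of the right age live there, but no cases were enrolled, which is the "underrepresented area" reading the speaker gives to the red regions.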
And that recruitment and community outreach, to school settings, to religious settings, and to primary care providers, was a big aspect of the STEP program to, again, capture as many of those cases as possible. So the next step in this project was to take that surface where we've identified some underrepresented, or underrecruited, areas for the STEP program, and cross-reference it with, say, primary care providers in the area. So this is a data set, all these little dots, the gray ones and the green ones together, are primary care providers in the 10-town catchment area. This is data that was scraped for free from the National Provider Identifier database, the NPI database. Any of us could do that, and it's kind of surprising how much information is available, including a lot of physicians that have home addresses in there. These are not their home addresses, these are professional addresses. But once you have those locations, it's also separable by taxonomy, so I can sub-select specifically for primary care providers, pediatricians, whatever, psychiatrists. So once you've sub-selected for primary care providers, you have their clinic locations, then you can subset geographically for providers whose clinic locations are in these underrepresented areas, and these might be, rationally, people that you'd wanna reach out to for more recruitment in trying to catch some of those cases that presumably are going missing for the STEP program. Okay, so those are our three practical cases. Again, the commonalities there: hypotheses or questions generated by our clinical work that were going unanswered as we cried them into the wind.
We decided to try them ourselves, and through a community effort or a team effort, found software resources, data resources, internal resources to organization, when they weren't there, we went out and we bolstered our own technical skills to wrap out the needed requirements for the project, and got them done, and all of them yielded interesting stuff. All of them, I guess, almost all of them have been published, but all of them have yielded, yield, have wielded, have brought around very interesting and meaningful answers that we can then fold back into our clinical work and start addressing them very practically. So, this would be a good time for any, if anybody had any questions, technical or otherwise, about the ones we've gone over, because we're gonna jump into kind of a little bit of a practical session for the next 10 or 15 minutes before that, or after that. Oh, yes? How long have you been thinking about these, like, ideas before you actually started to do your research on them? Great question, the other question was how long we'd been thinking about the projects before we actually started doing the research on them, and I'm just repeating it for the viewers at home. I think it varies per project. The project that Peter and I worked on is something that I thought about a month beforehand, and then he reached out soon after that, and we were working on it pretty quickly. Similarly with the STEP project that I did pretty much independently with the data help from Vinod, and the kind of the championship help from Vinod, Dr. Srihari, it was a pretty quick process, and I'll let Dr. Jaggi to kind of comment on his project. Yeah, so thanks for the question. My project was basically started as soon as I started fellowship, which was, you know, like a jarring realization for me from where I was before, and from what the data showed as far as methadone, people on methadone. 
New Haven was a little bit different, and where I was working at the time was supposed to be the largest, not supposed to be, was the largest methadone program in the state, if not at least in New Haven for sure. And so that actually started me thinking about why. Was it that we're not connected with the community? Was it that there were no other people with opioid use disorders? What was going on? So, and then as soon as I had the opportunity, I met Stan, and I met Bob Rosenheck, you know, it just kicked off from there. Yes. Oh, sorry. Doctor? So was this something in your training or in your interest that prepared you for this kind of thinking? I can answer for myself, absolutely not. I had no, I think I mentioned, I had one Java programming course in 2001. I did have some protected time during my fellowship in public psychiatry to pursue this a little bit more formally, and audited, for instance, kind of a data analysis class, or a spatial data analysis class at the graduate school. And, but most of it was really just driven by wanting to answer these questions, realizing that very few folks were doing this kind of work in any way that I had access to, and quenching my curiosity by doing it myself. Hi. I'm Connor Pollett. I'm a fourth year resident at SUNY Downstate, and a rising clinical informatics fellow at NYU. Oh, nice. So this is near and dear to my heart, and I appreciate that you're doing this work, and that it's being represented in the field of psychiatry, because I feel like it's traditionally come from emergency medicine, from pathology, from more public health-oriented specialties. And so my question is, how do we integrate this level of training into our education? When I was a medical student, we had just started courses where students could learn Python, and R, and SAS, and we had Tableau licenses, so it was very expensive, very nice to have that. And then I got to residency, and then there's nothing. Well, you have to look for it. 
But I feel like psychiatry's very late to the game with this. Other specialties have had this mindset for a little bit longer than us. And so how do we catch up and integrate this into the way that we practice, and the way that we design and run our systems? Can I clarify your question? Is it about how to train toward this direction, or how to do it after training? I thought it was a training question. A training question. Yeah, yeah, because that one, actually, I'm a little bit equipped to answer. The other one is a little bit too big for me. I'll try my best. So one model that I think works really well is actually one that we use. So I'm also the Assistant Program Director for the Public Psychiatry Fellowship that we have at Yale, also a graduate of the fellowship. And the model we have there is a capstone project-based. So at the end of their year of fellowship, we have both fourth-year residents as well as fifth-year instructor-level folks who are coming in for these fellowships, so kind of a little bit of a mix there. But the expectation is at the end of that year, folks that want to concentrate in public psychiatry work, that they have a project that they present, both internally and externally, and have some kind of academic product. About half of those projects over the last five years have involved some kind of analytics work that I've been, in my role as the Associate Program Director, is kind of a research mentor for. And so oftentimes, that is helping folks identify a clinical question, and we encourage it to develop organically from their work, and then devise a structure of a program to answer that question meaningfully within the confines of the academic year. For some folks, that's just continuing the work they're already doing. Some folks, it's much more elaborate. It includes reaching out and getting additional training, meeting with me more regularly to kind of help with the go-through. Sometimes I'm working coding with them. 
Sometimes I'm just advising from an analytics standpoint or a data standpoint. But the key ingredient, I think, is the support that having kind of an identified person with, in my case, a small amount of supported effort to be in that role, and I can see that being rolled out residency-wide as well as beyond just a fellowship, but also having the expectation of that project being done within the year. Because these are kind of topics, because of their ancillarity, I think can be put off to the point that they just never get done. And also, some of these questions are so universal that the urgency isn't as pressing as we'd hope as well. So that's one model that I think is quite successful, at least in our fellowship model. Yeah, and if I can add to that is, as far as your faculty, I was also part of the Public Psychiatry Fellowship. And I think with a background in public health as well, that helped. And having sort of a cohort of people to talk to who were already doing this work and just having that sense that whatever you see, your clinic is not by happenstance. It's something that you needed to look at, and having people to help you through it really helped. Hi, quick question. I was curious, do you guys have ACE data on the population you studied in Connecticut that you described, Dr. Jagadeh? And also, what are some examples of ways you guys are planning to sort of apply this work in terms of clinical work, in terms of your future directions? Yeah, thank you for that question. Can I clarify your first question again? Like, did you guys collect data on ACEs in that population you studied? Yes and no. Some of the questions we have can, in fact, be thought of as in the ACEs, such as childhood poverty, for example, childhood trauma. Some of the limitation of this data is it's already collected. So we don't have any sense in manipulation or change in the data. It's already there. So as far as ACEs, this is something that I can think of right now. 
But more broadly, we have data that you could be proxy for social determinants of health in general. Another thing is we have the data only on 18 and older population. So that's the first question. The second question is how we've applied it. So for this particular project, it's been how we've engaged with the community. We've actually met with the community leaders with these results at hand. And to show them this is the data and this is why we need to encourage some things in the community. Because we had hard facts and hard data. And we found that they were more able to better appreciate the problems, the clinical problems in the community. And also in our university, being one of the things we talk about in the university is being anti-racist. We also presented this data to the anti-racism community, sort of encouraging more participation in the community that we serve. To tell them we only had about, I think 1.5 or maybe 1.2% of the community, of the population, attending our clinics. And we are like the center community clinic in New Haven to have only 1.6%, that is probably not the best. So we have to find ways to encourage people to use our clinic more and to improve the health indices of our community, like the dental, high blood pressure, other health indices, and also improve the travel distance. To the community, to the clinic. This might be beyond the scope of the presentation, actually, but your question really spurred something very interesting. Which is, you mentioned that you met with the community. And one of the things that we know is data requires storytelling, which I haven't heard so far, actually. I'm actually curious how you were able to present the data to the stakeholders. And my point is, again, it's oftentimes, at least that's what is being said, it's a story that allows our brains to really digest data. So I'm just curious. It's a good question, but I'll also piggyback to what I said earlier. 
One of the community, meeting with the community would include the Yale community, for example, which is a very intimate part of that community for several decades, several hundreds, whatever, years. I don't know how long. So if we present ourselves as anti-racist, for example, and our community clinic, our community hospital being a part of that community, then it is important for us to know what is the data of our neighborhoods. So we presented this to the academic community to tell them, this is the data. What can you do to improve this? And there are no simple answers, but then the onus is on us to find ways to improve what we have as our results here. I don't know if that answers your question. Well, it answered the first part, actually, that you actually had an audience that was interested, actually, when you were presenting it to the Yale folks. But I'm curious, actually, the other people, actually, it's not very easy to just present this information and start to talk about data, which was my point, actually, that that really requires storytelling. The other question, actually, this is very curiosity, is the relationship between seeing the dentist, dental care, and in a methadone program, it's extremely important because of the stereotype about methadone rotting people's teeth, actually. So what you're really seeing is that there's not a lot of access to dental care in folks who access methadone programs, actually. So that was a very curious and very practical finding, actually, which is whether or not you're able to have these people see dentists, actually. So that's a very- We'll start with the dental care, and thank you for the perspective, very, very interesting. So the dental care has been a conundrum, and it is well-known as, in fact, a real proxy for socioeconomic status. But I was presenting, sort of telling a story, to use your phrase, of how I got into some of this work. So these are very, two disconnected conversations. 
The one with the methadone clinic was how I started my thought. It was not related to the community mental health center. So the data we present is actually for the community mental health center, and not related to the methadone clinic. But it's a curious, I guess, relationship, and a very interesting one. Also, I'll just mention it again, that for emphasis here is the fact that we identified community leaders to meet with, and present to them this data. And the data is basically what you do with it. To present to them, this is the health status of our community. This is what the data says. And it's an ongoing conversation, by the way. Ongoing conversation and ongoing collaboration with them to improve how we interact with the community, for them to know that this is part of, this is theirs. The community mental health center is for them to get better. We are not there as Yale people looking down on them. No, we are collaborators on the same spectrum. I have a question in the opposite direction. Like, I'm not a physician, I'm a mathematician. For me, the road to health has been very strange and casual. However, okay, first about, I have a comment about the storytelling thing. There are many platforms, like Basecase, that can be used to displace this data, like without code, like the problem with PowerPoint is that you cannot create a good storytelling with it. And also, like R, using code, Basecase is very good in terms of storytelling with data. And my question was about, how can other disciplines approach health to try to create more multidisciplinary efforts to try to show this data and to try to have a good impact on the public health? Like, for me, it was totally a very strange path from mathematics to economics to health economics, and now I'm here. But how do you think that other disciplines can approach physicians to try to create this kind of collaborative efforts? 
Just to clarify the question, you're asking about kind of how other disciplines, as in economics and, I mean, other places in the, like academia, how they would collaborate more facilely with physicians? Oh, ask. I think there's not a lot of overhead or barrier to these kind of collaborations. You might have to ask a few times and be ready to hear a few no's, but I think you'd be surprised. Well, and I, of course, have a bias in my perspective from an academic setting, but I think there are a lot of folks who are anxious to kind of make connections across disciplines, across areas of focus, and get this kind of work, get these kind of studies done. And in the actual clinical practice, to try to approach practitioners so they can start to gather data and to try to study it from a different, because in the academia, it's very easy to use. Sure, sure, yeah, that's the bias of mine. I don't know, this is probably the right room to ask. We might be the wrong people to ask, but I think this is the right room to ask. There's a guy right here who wants to answer this question for you. If I could just share my thoughts. I guess I would, you know, the ask part is good, but also to tell them what you're. Thank you. it won't have to have them identify the people that you want to collaborate with. All right, yeah. Fantastic. I think the hard work really is asking the right questions. I think that's really the hard work, and that is the work for the clinician. That's the work for you, that's the work for me. It's basically looking at our population and asking the right questions. And of course, if you need help, we'll always need help, and you have to collaborate. We have to be ready for constructive criticism, to go back to your phrase, and be willing to rephrase the question, reframe the question, and eventually get a clinically meaningful answer from these methods. Maybe just one more question, then we'll finish. We're only about halfway done the presentations. 
Just a quick question. Sure. Do you guys have a list of the databases that you've used to get these data? We do. And do you have any tips, particularly to something like Data Haven, but specific to wherever I am? Sure. Just you wait. Yes. It's coming right up. Okay, perfect. Okay, great question. Thank you very much. Okay, so these, first of all, I don't want to discourage at all interaction. That's fantastic. All those questions were wonderful. I really appreciate it. Hopefully we'll save some of them for the end as well. Quickly, though, I wanted, there's one kind of key aspect to all this. I know, I promised this wasn't a coding class or a stats class, but there's this kind of key conversion that really we needed to talk honestly about and think about some of the thorns and challenges to this particular thing. It's called geocoding. So what geocoding is, show of hands, anybody familiar with this term? That's pretty good. Good for y'all, good for y'all. Basically, very simple. It's converting geographical information to a very specific location, oftentimes represented as a longitude and latitude. So that's just the Cartesian coordinates of where you are on the globe. And most conventionally, it's an address. So we've all used like Google Maps or something like that to look up where Neiman Marcus is because they have a sale on socks or something. Type in an address, it gives you a longitude and latitude. Or maybe it doesn't. Maybe it gives you driving directions or something like that. But inherently, behind the scenes, it's doing all this kind of computation and this analysis based upon longitude and latitude. Once you have a location, you can do so much stuff there. So we're used to this. Like for instance, if we type in 34 Park Street, New Haven, Connecticut, which is where CMHC is located, this kind of thing comes in, boom, there we are. We have a longitude and latitude. And now we know where we are. 
We've converted kind of human speak into a middle speak that machines, math, and humans all understand to a certain extent, if you kind of know where you are on the globe. Now, how do you do that? You use something called a geocoding service. And so for instance, here is an example of using the United States Census Bureau's geocoding service where you feed in, in this case, an Excel spreadsheet or a comma-separated values set of addresses. These are not real addresses. These are AI generated addresses actually, just to reassure you. We'll talk about the danger of sharing addresses in just a second. And then you ask it to evaluate and convert them to longitude and latitude. And it does a decent job. We see that for about five of them, it was able to convert it. Now again, the reason it failed on the other ones is because they're actually made up addresses. And so it did the best it could and I support its value and effort. But it's taken that address in a very easy to evaluate form and it spit out what we really want, which is a longitude and latitude. Why is that so important? Because at that point, once you have yourself located, or the address of a patient, you know, their home address located, that's when you can start to cross reference it with all sorts of stuff. So for instance, a lot of the data products we use are provided by the US Census and they collect everything, not just population, not just racial breakdown, but economic data, what kind of households there are, how many cars are available to those households, what percentage of households were living in the same house 12 months ago. I mean, you name it, the ACS, the American Community Survey, has some proxy for that particular measure. It's really kind of quite astounding, but you're not able to access that data until you know in what census subdivision, what type of areal unit, it sits.
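The batch workflow described here, a file of addresses in and longitude and latitude pairs out, can be sketched as two small helpers: one that writes the upload file and one that reads the result file. This is a sketch under stated assumptions, not a reference client. The column layouts (an ID, street, city, state, ZIP input with no header row, and a result row whose third field is a match flag and whose sixth field is a quoted "longitude,latitude" pair) are recalled from the Census Bureau batch geocoder and should be checked against its current documentation. The addresses and coordinates are invented, and no network call is made.

```python
import csv
import io


def build_batch_csv(records):
    """Write (id, street, city, state, zip) tuples as a header-less CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


def parse_batch_results(text):
    """Map each record id to (longitude, latitude), or None when unmatched.

    Assumed row layout: id, input address, match flag, match type,
    matched address, "lon,lat", and further fields that are ignored here.
    """
    results = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        record_id = row[0]
        if len(row) > 5 and row[2] == "Match":
            lon, lat = (float(value) for value in row[5].split(","))
            results[record_id] = (lon, lat)
        else:
            results[record_id] = None
    return results


# Fictional addresses, in the same spirit as the made-up ones in the talk.
upload = build_batch_csv([
    ("1", "1 Example St", "Sampletown", "CT", "06000"),
    ("2", "99 Nowhere Rd", "Sampletown", "CT", "06000"),
])

# An invented response in the assumed layout; the coordinates are placeholders.
sample_response = (
    '"1","1 Example St, Sampletown, CT, 06000","Match","Exact",'
    '"1 EXAMPLE ST, SAMPLETOWN, CT, 06000","-72.9,41.3","000000","L"\n'
    '"2","99 Nowhere Rd, Sampletown, CT, 06000","No_Match"\n'
)
coordinates = parse_batch_results(sample_response)
```

Keeping unmatched records as `None` rather than dropping them makes it easy to count how many addresses failed, which matters when deciding whether the geocoded sample still represents the clinic population.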
And so that's what this is, that's the important part about geocoding: we can not only count things, like how many dots happen in a particular area or whatever, but we can cross reference it to all that ancillary data that we have from Data Haven, from ACS, from whatever, because once we know what census tract we're in, then we know the population, the unemployment rate, the poverty rate, these kind of things. And that's how we are able to do the analytics that Dr. Jagania had been presenting. These geocoding services are myriad. I mean, just an astounding amount of them, and I won't go over all of them, but we will discuss them a little bit in big strokes. There are cloud-based geocoders that a lot of us are familiar with, like for instance, Google Maps or the Census Bureau we talked about. All of these can take big batch geocodes, so you can feed in hundreds or thousands of addresses and it spits out hundreds or thousands of longitude and latitude pairs or tuples. Then there are HIPAA-compliant cloud-based geocoders. Uh-oh, why does there need to be a HIPAA-compliant label on that geocoder? Well, I don't know, we'll talk about it. But there are: Esri, who makes ArcGIS, which a lot of us are familiar with, is the biggest GIS software guy on the planet, company on the planet, actually founded by a psychiatrist, interestingly enough. Geocodio, Microsoft, CARTO, these are all folks that you may or may not be familiar with, but they brag about, advertise, and sell HIPAA-compliant cloud-based geocoders, which means that they make certain promises about how they will or won't store the data, and they store it in super-encrypted HIPAA-compliant containers online, but they require a BAA, a business associate agreement, which, if you've ever read the HIPAA statutes, is required for data sharing outside of your agency.
Any covered entity who wants to share between agencies needs a BAA; you might have encountered that in multi-site research studies or something like that. These are contracts written up by lawyers on your side and lawyers on their side, and everybody signs it, and it's quite a big deal. I've actually never done a BAA to do any of this stuff, but if you wanna feel very satisfied with yourself and do HIPAA-compliant cloud-based geocoding, that's the step you have to take. The alternative is to do on-prem geocoding, actually doing it on your own computer. As you might expect, it's maybe a little bit more technically challenging to install, and you might need a little bit more hardware of your own to install it. Commercial vendors like Esri, with ArcGIS, and Geocodio sell their own on-premises geocoders. There are also open-source or free on-premises geocoders that I've used and have a lot of experience with, including Nominatim and Pelias, which are very good, very, very fast. Some of the considerations about how to choose these kind of things, if you wanna convert patient addresses into longitude and latitude to allow yourself some subsequent analysis: the big one is data security, which we talked about, and we'll talk a little bit more about, actually. Cost: some of them are free, some of them are expensive. The commercial ones are not free, as you might expect. The open-source ones are almost all free, but there's oftentimes a trade-off in either security or in speed. The online ones, the cloud-based ones, are considerably slower than on-premises ones, by a factor of about 1,000. So if you need to geocode 30,000 addresses, it's something where you might wanna really consider trying to figure out an on-premises alternative. And then the technical requirements, both hardware-wise, software-wise, and skillset-wise, to implement these things are not inconsiderable and need to be part of your whole project planning.
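To make the speed trade-off concrete, here is a back-of-the-envelope calculation for the 30,000-address example. The factor of 1,000 comes from the talk; the cloud rate of one address per second is purely an assumption chosen to make the arithmetic easy, since real rates vary by service, plan, and rate limits.

```python
def geocoding_hours(n_addresses, addresses_per_second):
    """Hours needed to geocode n_addresses at a steady rate."""
    return n_addresses / addresses_per_second / 3600


N_ADDRESSES = 30_000
CLOUD_RATE = 1.0         # addresses per second; an assumed, illustrative figure
ONPREM_SPEEDUP = 1_000   # "a factor of about 1,000", as stated in the talk

cloud_hours = geocoding_hours(N_ADDRESSES, CLOUD_RATE)
onprem_seconds = geocoding_hours(N_ADDRESSES, CLOUD_RATE * ONPREM_SPEEDUP) * 3600

print(f"cloud: about {cloud_hours:.1f} hours")
print(f"on-premises: about {onprem_seconds:.0f} seconds")
```

Under that assumed rate the cloud job takes most of a working day and the on-premises job takes about half a minute, which is why the installation effort can pay for itself on larger address lists.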
That said, here's my HIPAA hippo, which was generated by Stable Diffusion. I mean, AI is here to stay, man, this is pretty. I asked, could you please make me a picture of a hippo dressed up as a doctor holding a clipboard? And it generated that, and so here it is. So here's where we are. This is where we are in the world, right? I have to say this: the following section of the presentation is information only and does not constitute legal advice. Really, these are conversations you need to have with your counsel, your head of information security, whoever it is you feel is making these kinds of decisions, because we're getting into a little bit of a gray area, and I want to be very upfront about that. What is HIPAA? The HIPAA Privacy Rule, if you remember, protects, quote unquote, and in bold, individually identifiable health information and its transmission in all its various forms. So basically, HIPAA covers PHI, protected health information, plus an individual identifier. What is PHI? I mean, this is all from Health and Human Services, this is from the HIPAA code. Information about the individual's past, present, or future physical or mental health or condition. Okay, that makes sense. Here's the sticky one: provision of healthcare to the individual. And then the stuff about billing, which I never deal with, but it's basically the parallel version for billing, because you have to be able to share information with billers and insurance agencies and stuff like that. And it's information that identifies the individual, or for which there is a reasonable basis to believe they can be identified from the PHI. And what are the identifiers? I mean, we all take our HIPAA training every year, most of us do, I'm sure. None of these, I think, are surprising. I'm always charmed by the over-89 thing. I don't know, I just think of my grandmother being so exceptional that she gets her own call-out. She's 97, so. 
Now, geography, though. Most of us probably have not paid a lot of attention to the geography identifier. It makes sense that obviously an individual home address would be a pretty good identifier, right? What's the threshold that HIPAA sets up? Well, it's actually pretty interesting. It's not even the zip code of that address. The first three digits of the zip code are the smallest areal unit that is sufficiently anonymous to be a non-identifying geographical reference. And the idea is that that's where the US Census and HIPAA get very, very confident that at least 20,000 people are covered within this designation, and the level of anonymity ensured by that is sufficient. What does that look like? This is the map of the United States if it's divvied up just by the first three digits of your zip code. You'll see in high-density areas, like maybe the Northeast, the size of these areal units is relatively small, it might even approximate the size of counties. But you get out into the West, and like poor Nevada, where these areas are huge, I mean, half the size of New England, the summary statistics around an areal unit become less and less powerful when the area is that big and the population is that spread out. Now, this level of specificity might work for your particular project, at which point you wouldn't have to worry about the anonymity of the data very much at all. You could simply lop off the last two digits of the zip code, go with where you are, and feel pretty confident about it. Oh, where was I going with that? So, we went over the geocoders. I came back to this for a reason, and I'm blanking on it, so I'm gonna push through. I will say, oh, oh, I remember, right. 
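As a rough illustration of the zip code truncation just described, here is a minimal sketch. It assumes the usual reading of the HIPAA Safe Harbor rule: a three-digit prefix is kept only when the combined area holds more than 20,000 people, and is otherwise replaced with "000". The function name `zip3` and the population figures are hypothetical; a real project would look up prefix populations from Census data and confirm the rule with counsel.

```python
def zip3(zip_code, prefix_population, threshold=20000):
    """Reduce a ZIP code to its first three digits, or "000" for sparse areas.

    zip_code must be a string so leading zeros (e.g. "06511") are preserved.
    prefix_population maps three-digit prefixes to total population
    (hypothetical values here, not real Census figures).
    """
    digits = "".join(ch for ch in str(zip_code) if ch.isdigit())
    if len(digits) < 5:
        raise ValueError(f"not a usable ZIP code: {zip_code!r}")
    prefix = digits[:3]
    # Prefixes missing from the lookup are treated as sparse, to be cautious.
    if prefix_population.get(prefix, 0) > threshold:
        return prefix
    return "000"


# Hypothetical populations for two three-digit ZIP prefixes.
populations = {"065": 300000, "890": 15000}

print(zip3("06511", populations))       # dense area: prefix is kept
print(zip3("89001", populations))       # sparse area: masked
print(zip3("06511-1234", populations))  # ZIP+4 input is handled the same way
```

Treating an unknown prefix as sparse is a deliberately conservative choice: it masks more than strictly necessary rather than risk releasing a small area.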
I will say that the fact that there are these kinds of qualifiers, like HIPAA-compliant geocoding, I think speaks to the demand for this level of security. On the other hand, if you go to PubMed and you search for basically any project that's using geospatially referenced data, they're almost all using cloud-based geocoders, like Google Maps, for now, and the pendulum seems to be swinging back and forth as this debate rages. I will say, rather sarcastically or jadedly, that some of the loudest voices in this debate are Esri, him or herself, whatever gender we wanna give it. If you look online for this debate, almost every person who's speaking up most vociferously against using cloud-based geocoders, because they see them as HIPAA violations, is working for Esri, so that's kind of interesting. And so I'm not gonna make this decision for you. Like I said, that's a decision you need to make with your counsel, your chief of IT security, whoever it is. But I will say, as recently as nine years ago, the CDC flat-out said, in a presentation they gave on the topic, that as long as you were de-identifying the home addresses and passing them to a geocoder, they did not see that as a violation. It's a debate that is unresolved, and it's something you need to come to terms with yourself and within the organization you're working in. So hopefully that didn't spin your head too much. Wrapping up, though, we have put together, and this is for the young lady who asked about the online resources. Oh, yes, there you are back there. So if you want to just go to StanMathis.net slash APA, I put together a wonderfully rudimentary HTML page. Actually, I didn't write it at all, ChatGPT wrote it for me. But I have a list of some of our really big resources. 
I've left off the geocoding references because, again, I don't want to come off as suggesting one or another, to be honest, for reasons I hope you can sense from the vagaries I have around it. But there are lots of good data resources. The U.S. Census TIGER files are incredibly important when we're thinking about geography. The U.S. Census is the one, you know, the census is in the Constitution, right? In 1792, I mean, they said we have to take a census every ten years, and it has to do with voting rights and all that stuff. But because of that, we have really great data for hundreds of years now. It's really well-funded and really well-run. And it also is kind of the touchstone for almost all this analytics work in the United States, of course. Once you get outside the United States, things change considerably. Ask anybody who's tried to do any analytics work that spans the United States and Canada, which I've done and failed at. The American Community Survey, as we've mentioned before, is a great, again, U.S. Census product about a lot of socioeconomic and demographic factors at a very granular level within the United States. And then Social Explorer, which is actually a commercial product, but they have a lot of free information on it. I actually use Social Explorer a lot to understand what the American Community Survey is actually saying. The American Community Survey uses a lot of governmentese in their tables, and Social Explorer explains what ACS is saying in a way that's very helpful. Meta-visualization stuff: there are a couple of products from the American Academy of Family Physicians, AAFP, and the Robert Graham Center, UDS Mapper and HealthLandscape, which are really good at just doing a first wash: I need to figure out what my catchment area looks like. I want to know what my part of the state looks like over a lot of different socioeconomic factors. 
I want to look at a map really easily by just doing a couple of clicks and zooming in. And that's really, really easy to do with both of those. Another one is specific to a particular metric, which is the SVI Interactive Map. This is the Social Vulnerability Index, which is an interesting proxy measure for 10 different socioeconomic factors that are the most highly influential on social determinant outcomes. And that's a nice map resource for them as well. Again, all these are available at StanMathis.net slash APA. And then some trainings. You know, like I said before, this is not a training, but I taught myself, and we've taught ourselves, through various online trainings. And here are some really, really, really good ones. Particularly the Harvard Geocoding Project, the R Consortium on YouTube, and then this wonderful offline geocoding tutorial, which is really interesting, particularly for anybody who has the chutzpah to set up an on-prem, open-source geocoding server. It's a wonderful walkthrough of how to do that. If you really want to say, I have no money, I have some time, and I want to be 100 percent certain that I'm being safe with my patients' data, that last geocoding tutorial is really gold. Now, the addendum. This is the twist that I alluded to in the beginning. We started putting this presentation together, what, like two or three months ago or something, right? So I had a much more exhaustive resource list and some basic introductory tutorials on how to do maybe the basics in R, stuff like that. And you know, this is how we used to teach ourselves things. You'd Google it, you'd go to Stack Overflow. That's how I taught myself to do everything. I still use it quite a bit. And every programmer still does, it's kind of the external brain. And then this happened. And I hope some of you know, the world is a different place than it was four months ago. I mean, this, or whatever, November of 2022. 
I mean, we won't understand for years the impact that this has, but everything has changed. And it's beautifully helpful with exactly what we want to do today. So for instance, go to chat.openai.com, go, play, it's free. I pay for the premium edition, which is a princely $20 a month. It's worth it. I use it every single day. It makes life better. It answers questions for you. Yes, it hallucinates. Yes, it's evolving. Don't put any private information, especially not any patient information, on it. But if you want to figure out how to do something with your computer, ask it and it will tell you. For instance, I asked it, why is R the best programming language for people without a lot of programming background who are interested in data analysis and statistics? And it gave me this beautiful essay on why it is, right? I asked it, what are some good resources for spatial data on diabetes rates in Connecticut? And it gave me this beautiful list with links that work and go to the exact data I was looking for. I asked it, I'm sorry, this is kind of small, write me some R code that opens a CSV file as a data frame, converts all the rows of the data frame to numeric values, and then draws a plot with the X axis being the first column and the Y axis being the second column, and it wrote me the code. And you can copy and paste that code into R and it will run. I mean, we don't even have the words to explain how this is going to change things. I mean, computer science departments, I won't do that, never mind, I'll bring it back a little bit. But I mean, people will stop programming eventually, okay, so, all right, hot shots, here we go. This is the idea. So remember from the objectives at the beginning, you know, we're going to generate one data question relevant to your client or your clinic, and then we're going to have a shouting match about how to possibly start addressing it. Anybody have anything that came up during the talk? 
We have only about four or five minutes, actually. Anybody have any questions? Either you ask me or we can ask the genius. What do we have? Who wants to go? Yes, do you mind going to the mic, please? I want to start a registry, and I want to find out what has worked for people who have lost 100 pounds or 30 pounds and kept it off for years. So what are the things that have worked? Because doctors, and psychiatrists too, have said food addiction is not real. Ashley Gearhardt's going to talk on Tuesday, Nora Volkow, come visit. But what has worked? What has worked? There are thousands of people for whom the weight has gone, and they're doing things that work. So why doesn't our society know what those things are that work, when there are so many people who are suffering from obesity and the intergenerational aspects of ongoing obesity over generations, from the time of conception and the epigenetic changes then? So how can we get this information from all those people for whom it is working, to share with all those people for whom it is not working yet? Sure. That sounds like a great question. Will that be a shouting match? We can try to shout that one out. That's going to be a little challenging. I mean, I think in that particular one, it's a data acquisition question, right? How would you go about identifying those cases of folks who have done it and kept it off? The 12-step programs have been successful in various degrees. GreySheeters Anonymous and Food Addicts in Recovery are a couple that have been successful in large populations. So it sounds like you have some identified community allies already, folks who might be worth collaborating with. Those are settings that are probably not used to collecting protected health information, so it might be challenging. Very much so. But I think I would definitely begin with my community allies in that approach. 
And then there are medical professionals that have gone to keto and low-carb eating and have been surprised at the outcomes. Yeah. Yeah. Intermittent fasting, I've seen videos on TikTok about it. Yeah. This is certainly well beyond my expertise, obviously, but I mean, I think it's the general idea of the way to think about it: you've identified an issue that you want to focus on. Now you're in the resource identification phase, really identifying allies, identifying what skills, what software packages or hardware you might need, what kind of questions you want to ask about it, what ancillary data would be helpful for you. But you need your cohort first, it sounds like, and finding it through your community allies would be the most useful first step, I imagine. Thank you. Yeah, sure, sounds like a very interesting project. Anybody else? Oh, we got one up front, all right. Fantastic. Hey, Stan and Olu. I used to work with these guys. So I have my new job in L.A. at a big primary care health system, outpatient care. I practice in West L.A., and we are a clinic that serves the underserved in West L.A. But I'm seeing a lot of folks who do not come from the socioeconomic strata that I expected to see. Some people from middle class, upper middle class backgrounds, working professionals on medical assistance. So one question I have is, is there something strange going on in this part of Los Angeles, where a clinic that is primarily meant to serve a particular demographic is, at least for mental health services, being accessed disproportionately by folks who are not necessarily the most underserved in the community? If that makes sense. It seems like a good question for this particular method. Yeah, I think it's actually similar to some of the analytics that Dr. Jagadee did. I don't know if you had any ideas about that. It's a good question, Keith. 
I think a good way to start would be, of course, asking the question and looking at just your data, what you have currently as far as your client base, you know? And the neighborhoods, like who are those coming to the clinic from the neighborhoods? Just answering that question. I don't know if you can identify data to answer that question. Like the neighborhoods you serve. We have addresses, we have an EMR, so we have addresses. I imagine a lot of this data about median income and so forth would be available. The census data just contains a lot of data. I think it underlies one of the key assumptions to be wary of with areal-based data: you're often looking at the mean here, so the mean household income of a particular area, which of course has folks on either side of it. And I think one approach you might think about is, do you have a geographically defined catchment area for your clinic? I believe so. Okay. I haven't seen it on a map, actually, but I'm pretty sure you have. I think two approaches might be interesting for you. One is, are you getting folks with a uniform distribution from the areas that you should, I mean from your complete catchment area, or is there roughly the same distribution of patients from those areas? And then the flip side of that coin is, are the folks you're seeing representative of the areas from which they come? That would be harder to do because oftentimes we don't ask for income data from our clients. We infer it from other proxy measures. But race, age, and some other metrics you can get to, to see if they're representative of where they're coming from. Those are the two sides of the coin that I think give a bigger picture both of exactly what's represented in the clinic and of how representative those in the clinic are of where they're coming from. 
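The first of those two checks, whether patients are drawn from each part of the catchment area in proportion to its population, can be sketched as a simple ratio of observed share to expected share. This is only an illustration under stated assumptions: the area names and all counts are invented, and the function name `representation_ratios` is hypothetical. A ratio near 1.0 means an area is represented about as expected, above 1.0 over-represented, below 1.0 under-represented.

```python
def representation_ratios(patient_counts, populations):
    """Compare each area's share of clinic patients to its share of population.

    patient_counts: area -> number of clinic patients living there.
    populations:    area -> total residents (e.g. from census data).
    Returns area -> (observed patient share) / (expected population share).
    """
    total_patients = sum(patient_counts.get(area, 0) for area in populations)
    total_population = sum(populations.values())
    if total_patients == 0 or total_population == 0:
        raise ValueError("need at least one patient and a non-zero population")

    ratios = {}
    for area, population in populations.items():
        if population == 0:
            continue  # an empty area has no expected share to compare against
        observed_share = patient_counts.get(area, 0) / total_patients
        expected_share = population / total_population
        ratios[area] = observed_share / expected_share
    return ratios


# Invented catchment area with three neighborhoods.
populations = {"North": 5000, "Central": 3000, "South": 2000}
patients = {"North": 80, "Central": 15, "South": 5}

ratios = representation_ratios(patients, populations)
for area, ratio in ratios.items():
    print(f"{area}: {ratio:.2f}")
```

The same ratio could be computed over demographic groups instead of neighborhoods to approach the second check, subject to the caveat above that income usually has to be inferred from proxies.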
And both of those can be done, but first of all it's figuring out where your clients are, geocoding-wise, and then wrapping around with ancillary data about the communities and neighborhoods from which they come, with ACS data or other census data. And are there other community centers where some of the people in the neighborhood may also be going, other than your clinic, perhaps? How would you think about that in the data analysis? Because there are, right? In New Haven, there's not all that much, right? That's not a big challenge in terms of how we... There are others, like Cornell Scott-Hill. When we were looking at his data, someone reminded us that we're not the only game in town. There are other mental health centers that also have a similar population. But it looks like what you're describing is the opposite of what we're used to at CMHC, right? It's like the opposite in terms of composition. It's a new experience for me. It's one that I don't feel would happen at a general health center. Yeah. I think it's similar. I'm going to do one more question before our time's up, which it already is, actually. But I think it's similar to the observation that Dr. Jagadee was describing when he got to his methadone clinic. It's not the people he expected to see. How do I go about thinking about this? I look at what the catchment area looks like as a whole, and then whether my clients are representing that catchment area, and particularly the sub-parts of the catchment area and the neighborhoods where their stated home address is. I think that would be a good first branch. I think so. Sorry to rush you on that one. Yes, sir, for our final question. I think our field seems to go in sort of phases. I mean, we tend to spend a lot of time on trying to understand mechanisms, hypotheses. We've gone through, at least in my generation, a neurotransmitter phase. Now we're in a big data phase. 
We're very great at identifying questions, hypotheses. When it comes to solving these problems or coming up with solutions for these problems, that's where we seem to come to a screeching halt. I was reading a passage by Thomas Insel in his book Healing. He says, after he spent so many years as the NIMH director, and he spent $20 billion, somebody came up from the audience and shouted at him and said, my son has severe mental illness. He's been to so many emergency rooms. My house is on fire, and you're telling me about the chemistry of the paint. To me, I think, should our responsibility, or should a part of our training, also be not just coming up with these phenomenal mechanisms to generate questions or hypotheses, but having a second part to it? Okay, now we have found a geographical location where somebody is not coming to our clinic. What can we do? Let's create a practicum around that. That's just one reflection that I have. It's a beautiful point. There's that classic quote that in the United States, your zip code has more bearing on your health outcomes than your genetic code, because it has a lot to do with the development history of the United States. It's not specific to the United States, but I think what's fundamental to your observation, though, and maybe this will be our last point, is that it's about empowering trainees. It's about trainees. It's about empowering and renormalizing the focus and the scope of practice and scope of formulation. Up until recently, I also taught structural competency and social determinants of health to our residents. The biggest hurdle, but I think the most important one, was to get buy-in from them on this odd idea that psychotherapy and psychopharmacology were very, very important tools for a pretty significant wedge of all these various factors that are contributing to mental illness, both acutely and chronically, and at both the individual scale and the population scale, if you think about all your various axes. 
Once that buy-in happens, then it's about developing, over a generation of training and continuing education, interventions that work, that are evidence-based and evaluated for what works and doesn't work, and that are specific and kind of provincial to the particular area of practice. That's something that a lot of physicians or clinicians or practitioners of various titles still do not feel is part of their job, or that kind of language is actually saying, I don't feel comfortable with that broad of a formulation. I prefer these smaller models of the brain, of neurotransmitters, of however you want to slice it in that direction. I came to this work, again, naively as an amateur, simply because I was dissatisfied with the more conventional approaches and realized that, especially, the clients I take care of, who are the sickest and the poorest and the most challenged and the most chronic in their illness, were simply not responding to these simpler definitions, these simpler interventions. I need to think bigger about all the factors. That's the ACT model. That's what I do all day anyway. It came very naturally that these kinds of analyses would fit into that broader formulation of our role in giving care. I appreciate that point. I appreciate everybody who talked today, who thought today, who maybe took a little bit of this and takes it home to their home programs, their home practice areas. This time I will speak for Dr. Jagadee. We're both 100% open to any follow-up questions or contact. I think all the contact information is online, through the email or whatever. I'd be very happy to continue these conversations with anybody who's interested in this kind of work. Thanks a lot. Thank you.
Video Summary
The presentation, led by Stan Mathis and Dr. Oluwole Jagadee, explores everyday analytics: using public data and free tools to derive meaningful insights for patient care, clinic operations, and beyond. The presenters emphasize that the session is not a technical data science or programming class but a motivational talk encouraging clinicians to engage in data analytics about the populations they serve.

Dr. Oluwole Jagadee presents a case study on understanding neighborhood mental health service utilization in New Haven, Connecticut. He explores demographic factors and their impact on healthcare access, advocating for stronger community dialogue and outreach with the aid of readily available data resources.

The session also showcases a study by Dr. Peter Kahn on the impact of relocating a pulmonary clinic on patient travel times, highlighting the access challenges faced by patients who rely on public transport.

Lastly, a project led by Stan Mathis examines the geographic distribution of first-episode psychosis cases, using spatial data to identify underserved areas in treatment catchment regions.

Both presenters reflect on the empowering nature of addressing clinical questions through data analytics, encouraging practitioners to extend their analytical capabilities to complex, location-based healthcare challenges. They emphasize the need to understand geographic and socioeconomic factors in order to improve healthcare outcomes, and they recommend resources, data repositories, and training opportunities for those interested in building analytics skills in clinical settings.
Keywords
everyday analytics
public data
patient care
clinic operations
data insights
mental health
healthcare access
community outreach
patient travel times
spatial data
first-episode psychosis
geographic distribution
healthcare outcomes