[MUSIC] Hi there. Ian Duncan has written an excellent book titled Healthcare Risk Adjustment and Predictive Modeling. This book has a great chapter titled Clinical Identification Algorithms that helped me formulate this lesson. If anyone is interested in administrative healthcare data, I highly recommend this book. To start, I will review definitions of clinical identification algorithms. These are sets of rules that, when applied to a claims or encounter data set, identify the conditions that are present in a population. Medical conditions are associated with medical expenditures, so correctly identifying conditions is often important. Yet this is not as easy as one might think. Diagnosis information is generally an easy concept to measure, but other data, such as laboratory or pharmacy data, can be much more challenging to extract and to process. At the end of this lesson, you will be able to define clinical identification algorithms, identify how data are transformed by algorithm rules, and articulate why some data types are more or less reliable than others when constructing the algorithms. Okay, let's get started. Usually the most important starting point for thinking about clinical identification algorithms is to ask questions about the best ways to identify and measure medical conditions given uncertainty about data quality. The following list of questions is a great place to start. What is the source of the diagnosis? For example, are the diagnoses coming from labs or from medical charts? What claim or encounter types should be considered? For example, should claims and encounters be considered from inpatient records, outpatient surgeries, and radiology? How many diagnoses should be considered? Next, in what time span should we be looking for diagnoses? For example, should the analyst look back one year or two years for evidence of a diabetes diagnosis? Finally, what procedures and prescription drugs might reliably identify conditions?
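To make the diabetes look-back example concrete, here is a minimal sketch of what such a rule might look like in Python. The claim records, ICD-10 code prefixes, and one-year lookback window are all hypothetical illustrations, not a published algorithm.

```python
from datetime import date

# Hypothetical claim records: (member_id, service_date, diagnosis_codes)
claims = [
    ("A1", date(2023, 3, 10), ["E11.9"]),   # type 2 diabetes code
    ("A1", date(2021, 1, 5),  ["E11.9"]),   # outside the lookback window
    ("B2", date(2023, 6, 2),  ["I10"]),     # hypertension only
]

def identify_condition(claims, code_prefixes, as_of, lookback_days=365):
    """Flag members with at least one qualifying diagnosis code
    within the lookback window ending at `as_of`."""
    flagged = set()
    for member_id, service_date, codes in claims:
        if (as_of - service_date).days > lookback_days:
            continue  # claim is too old to count as evidence
        if any(c.startswith(p) for c in codes for p in code_prefixes):
            flagged.add(member_id)
    return flagged

# E10/E11 are the ICD-10 chapters for type 1 and type 2 diabetes
diabetics = identify_condition(claims, ["E10", "E11"], as_of=date(2023, 12, 31))
print(diabetics)  # {'A1'}
```

Changing `lookback_days` to 730 would answer the "one year or two years" question differently: the 2021 claim would then also count as evidence.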
The answers to these questions will depend on the specific data being analyzed and the aspects of data quality being considered. For example, pharmacy data are often more reliable than encounter data coming from physician groups, so it might be best to prioritize prescription data. When using diagnosis data found on non-clinical data sets, an analyst must think about whether diagnoses might be more likely to be coded on specific categories of claims or encounters. For example, inpatient claims are more likely to have higher data quality, especially related to conditions, as compared to physician claims. This can be related to a number of reasons. For example, hospitals often have professional coding departments and specialty software to ensure better coding. Diagnosis codes are generally more valid when associated with an evaluation and management, also known as E&M, procedure code. This is because the physician in this case is usually treating, rather than conducting exploratory tests. Laboratory claims are less reliable, since the diagnosis could be part of a rule-out diagnosis rather than something related to treating a real and validated medical condition. The frequency of diagnosis codes found among particular individuals can also be used as a way to ensure that an individual really does have that medical condition. In other words, an analyst or data scientist might want to assume that the more times a diagnosis occurs among the claims and encounters for a specific member, the greater the probability that the member or patient actually has that medical condition. Of course, requiring numerous diagnoses per member can increase the false negative rate: people with the condition might not get identified. Pharmacy data are often a good source to flag or identify medical conditions. First, pharmacy data are often quickly processed and are usually relatively high in quality.
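The frequency rule described above can be sketched in a few lines. The diagnosis events and the threshold of two occurrences are hypothetical; the point is to show how raising `min_count` trades false positives for false negatives.

```python
from collections import Counter

# Hypothetical diagnosis events: (member_id, diagnosis_code)
events = [
    ("A1", "E11.9"), ("A1", "E11.9"),   # two diabetes codes: more confident
    ("B2", "E11.9"),                    # one code: could be a rule-out
]

def flag_by_frequency(events, code_prefix, min_count=2):
    """Flag members whose qualifying diagnosis appears at least
    `min_count` times across their claims and encounters."""
    counts = Counter(m for m, code in events if code.startswith(code_prefix))
    return {m for m, n in counts.items() if n >= min_count}

print(flag_by_frequency(events, "E11", min_count=2))  # {'A1'}
print(flag_by_frequency(events, "E11", min_count=1))  # {'A1', 'B2'}
```

With `min_count=2`, member B2 is dropped: safer against rule-out codes, but a true diabetic with only one coded visit would be missed, which is exactly the false-negative risk the lesson warns about.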
Second, drugs such as insulin are often associated with specific conditions such as diabetes. Of course, it is important to use caution when assuming that a drug might identify a particular condition. For example, many drugs are used to treat a variety of conditions. Related to this, off-label uses of drugs can reduce the specificity of drugs when trying to identify medical conditions. When creating clinical identification algorithms, a researcher needs to consider the balance, or trade-offs, between sensitivity and specificity. First, let's review these concepts, which you likely covered in your statistics courses. Sensitivity is the true positive rate. For example, this is the percentage of the population with diabetes that is correctly identified. Specificity is the true negative rate. This is the percentage of healthy individuals that are correctly identified as healthy. Analysts must balance the trade-off between these. If one tries to increase sensitivity, the risk of false positives increases. Attempting to maximize specificity might result in missing some true positive cases. Creators of clinical identification algorithms have to decide whether it is acceptable, or preferred, to risk some false positives to make sure that all the people with the disease are identified. Of course, it depends on the purpose of the algorithm. Clinical care is much different from finance, and thus the sensitivity-specificity trade-off will differ between these analytic domains. In past lessons we have discussed the importance of quality measurement. For example, you were introduced to some general research about the quality of healthcare within the United States. Some recent institutional structures that are organizing and promoting quality within the US include the National Quality Forum, which endorses reliable and valid measures, and the National Quality Strategy. You have also been introduced to some general categories of measures. These include structure, process, and outcome measures.
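Sensitivity and specificity, as defined above, are easy to compute once an algorithm's flags are compared against a trusted reference, for example a chart review. The member sets below are made up for illustration.

```python
def sensitivity_specificity(population, truth, flagged):
    """population: all members scored by the algorithm.
    truth: members who really have the condition (e.g., per chart review).
    flagged: members the algorithm identified."""
    tp = len(truth & flagged)            # sick and correctly flagged
    fn = len(truth - flagged)            # sick but missed
    negatives = population - truth
    tn = len(negatives - flagged)        # healthy and correctly not flagged
    fp = len(negatives & flagged)        # healthy but wrongly flagged
    sensitivity = tp / (tp + fn)         # true positive rate
    specificity = tn / (tn + fp)         # true negative rate
    return sensitivity, specificity

population = {"A", "B", "C", "D"}
truth = {"A", "B"}      # members who truly have diabetes
flagged = {"A", "C"}    # what the algorithm identified
print(sensitivity_specificity(population, truth, flagged))  # (0.5, 0.5)
```

Here the algorithm caught A but missed B (sensitivity 0.5) and wrongly flagged C while correctly clearing D (specificity 0.5); loosening the rule to also flag B would raise sensitivity but risk more members like C.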
Finally, quality measures are available from both commercial and open source groupers. Overall, a huge variety of measures have been created, and health services researchers have evaluated many of them. Now, let's review in detail the logic of quality measures. To understand what quality measurement means, it's helpful to study the components of the algorithm. Some common aspects include: what is the type of data? Is it clinical, administrative, or survey, for example? What data inputs are required? How much pre-processing of the data is required to meet the standards? Next, the numerator. This is usually the event in question, such as a procedure, or some type of adverse event such as a patient safety issue. The denominator is the eligible population that is at risk to experience the event. Next, exclusion rules are often important. These rules identify specific types of individuals or events to be excluded. Risk adjustment is a process to adjust measures based on the case mix or severity of illness among the patients or members. It is usually done for outcome measures and not for process measures. It is also important to understand value sets, which specify codes such as ICD, and computer code that can be used to implement the algorithms, and, finally, what documentation exists to describe all of the elements that I just listed. Data preparation is a critical task for running performance measures, since the input data will almost never be in the proper format for the algorithms. Luckily, standardized measures usually have clear definitions of the data fields and elements that are required for each measure. Using these instructions, it is often necessary to clean and transform the raw data into acceptable input data sets for the algorithms. For example, fields often need to be renamed, some data elements need to be recoded, and the analyst needs to deal with missing or poor-quality data.
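The numerator, denominator, and exclusion logic described above can be sketched as follows. The measure here (eligible diabetics who received an HbA1c test, with a hospice exclusion) is a hypothetical example modeled loosely on common process measures, not any specific endorsed measure.

```python
# Hypothetical member records for one measurement year
members = [
    {"id": "A", "has_diabetes": True,  "in_hospice": False, "hba1c_tested": True},
    {"id": "B", "has_diabetes": True,  "in_hospice": False, "hba1c_tested": False},
    {"id": "C", "has_diabetes": True,  "in_hospice": True,  "hba1c_tested": False},
    {"id": "D", "has_diabetes": False, "in_hospice": False, "hba1c_tested": False},
]

def measure_rate(members):
    # Denominator: the eligible population at risk for the event,
    # after applying exclusion rules (here, hospice enrollment).
    denominator = [m for m in members
                   if m["has_diabetes"] and not m["in_hospice"]]
    # Numerator: denominator members who experienced the event.
    numerator = [m for m in denominator if m["hba1c_tested"]]
    return len(numerator) / len(denominator)

print(measure_rate(members))  # 0.5
```

Members C and D never enter the denominator, one by exclusion rule and one by eligibility, so only A and B are scored; since only A was tested, the rate is 0.5. As a process measure, this rate would typically not be risk adjusted.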
Sometimes imputation is possible, in which the entire data set is used to estimate values for missing data. And finally, probably the most important topic in this section is the analysis of data quality. To make statements about the relative quality of healthcare providers, such as hospitals or individual doctors, an analyst must know whether they are measuring healthcare quality, which can be considered the signal, or data quality, which can sometimes be the noise. Most of the work in this area of quality measurement involves assessment and validation of input data, so that the organization can be confident that it is measuring healthcare quality rather than the quality of the data. Much of my career has been spent conducting data quality assessments to make sure that quality ratings about hospitals or doctors can be defended. Hospitals and doctors who score low will often be quick to ask questions about the reliability of the ratings, so it is imperative that analysts have answers. That concludes our look at clinical identification algorithms. In our next lesson we will briefly review some quality measures that are commonly used among healthcare organizations. See you soon.
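As a small illustration of the imputation idea mentioned above, here is one of the simplest possible approaches, mean imputation, where the observed values in the data set are used to estimate the missing ones. Real quality-measurement pipelines often use more sophisticated methods; this is only a sketch of the concept.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# e.g., a lab result column with one missing entry
print(impute_mean([10.0, None, 30.0]))  # [10.0, 20.0, 30.0]
```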