This paper describes a case study on collecting digital footprint data for health data mining. The case study involved 20 subjects residing in Finland who were instructed to collect data from registries that they judged useful for understanding their current or past health or health behaviour. Eleven subjects were active, sending a total of 100 data requests to 49 distinct organizations. Our results indicate that there are still practical challenges in collecting actionable digital footprint data. Our subjects received a total of 75 replies (reply rate of 75.0%) and 61 datasets (reception rate of 61.0%). Of the received datasets, 44 (72.1%) were delivered on paper, 4 (6.6%) in Portable Document Format (PDF) and 13 (21.3%) in structured digital form. The time between sending an information request and receiving a reply was 26.4 days on average.
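The reported rates follow directly from the counts in the abstract. A minimal sketch of that arithmetic is given here; the variable and function names are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract above.
requests_sent = 100
replies_received = 75
datasets_received = 61
formats = {"paper": 44, "pdf": 4, "structured": 13}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Reply and reception rates are both relative to the number of requests sent.
reply_rate = pct(replies_received, requests_sent)
reception_rate = pct(datasets_received, requests_sent)

# Format shares are relative to the number of datasets actually received.
format_shares = {name: pct(n, datasets_received) for name, n in formats.items()}

print(reply_rate, reception_rate, format_shares)
```

Note that the two headline rates use the 100 requests as the denominator, whereas the format percentages use the 61 received datasets.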
We describe a new method for identifying confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We use Self-Organizing Maps as the initial step to reduce the search space, and then apply Association Mining to find association rules. We demonstrate that this procedure has a number of advantages over traditional Association Mining: it handles numerical variables without a priori binning and generates variable groups that act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical questionnaire data and continuous variables generated from biological measurements, and included missing values. From this data set, we generated a number of interesting association rules, each of which relates an observation to a specific consequence and reports the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical-disease associations to find associations between different phenotypes, such as prostate cancer and breast cancer.
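To make the idea of an association rule with an attached p-value concrete, the following is a minimal sketch of the Association Mining step only. It does not implement the Self-Organizing Map stage, and it is not the authors' procedure: the choice of a one-sided Fisher exact test, the function name `rule_stats` and the item names in the toy data are all assumptions made for illustration.

```python
from math import comb


def rule_stats(records, antecedent, consequent):
    """Support, confidence and a one-sided Fisher exact p-value
    for the rule antecedent -> consequent over a list of item sets."""
    n = len(records)
    a = sum(1 for r in records if antecedent in r)   # records with the antecedent
    c = sum(1 for r in records if consequent in r)   # records with the consequent
    both = sum(1 for r in records if antecedent in r and consequent in r)

    support = both / n
    confidence = both / a if a else 0.0

    # Hypergeometric upper tail: probability of seeing at least `both`
    # co-occurrences if antecedent and consequent were independent.
    tail = sum(comb(c, k) * comb(n - c, a - k) for k in range(both, min(a, c) + 1))
    p_value = tail / comb(n, a)
    return support, confidence, p_value


# Hypothetical toy records; the item names are invented for this example.
records = (
    [{"smoker", "low_count"}] * 4
    + [{"smoker"}]
    + [{"low_count"}]
    + [set()] * 4
)
support, confidence, p_value = rule_stats(records, "smoker", "low_count")
print(support, confidence, p_value)
```

In the hybrid method described above, a step of this kind would be applied within the variable groups produced by the Self-Organizing Map rather than across the full data set.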
This article contrasts two case definitions for myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). We compared the empiric CFS case definition (Reeves et al., 2005) and the Canadian ME/CFS clinical case definition (Carruthers et al., 2003) in a sample of individuals with and without CFS. Data mining with decision trees was used to determine which survey items most accurately classified patients with CFS. The empiric criteria identified about 79% of patients with CFS and the Canadian criteria identified 87% of patients. Items identified by the Canadian criteria had more construct validity. The implications of these findings are discussed.
Adverse drug events (ADEs) are a public health issue. The objective of this work is to data-mine electronic health records in order to automatically identify ADEs and generate alert rules to prevent those ADEs. The first step of data mining is to transform native, complex data into a set of binary variables that can be used as causes and effects. The second step is to identify cause-to-effect relationships using statistical methods. After mining 10,500 hospitalizations from Denmark and France, we automatically obtained 250 rules, of which 75 have been validated so far. The article details the data aggregation and presents an example of a decision tree that yields several rules in the field of vitamin K antagonists.
Understanding the impact of treatment policies on patient outcomes is essential in improving all aspects of patient care. The BC Cancer Agency is a provincial program that provides cancer care on a population basis for 4.5 million residents. The Lung and Head & Neck Tumour Groups planned to create a generic yet comprehensive software infrastructure that could be used by all Tumour Groups: the Outcomes and Surveillance Integration System (OaSIS). The primary goal was the development of an integrated database that will amalgamate existing provincial data warehouses of varying datasets and provide the infrastructure to support additional routes of data entry, including clinicians from multiple disciplines, quality of life and survivorship data from patients, and three-dimensional dosimetric information archived from the radiotherapy planning and delivery systems. The aim is to capture any data point related to patient characteristics, disease factors, treatment details and survivorship, from the point of diagnosis onwards. Through existing and novel data-mining techniques, OaSIS will support unique population-based research activities by promoting collaborative interactions between the research centre, clinical activities at the cancer treatment centres and other institutions. This will also facilitate initiatives to improve patient outcomes, decision support in achieving operational efficiencies and an environment that supports knowledge generation.
Here, we report first results on the development of computational health information technology for monitoring chronic non-communicable disease (NCD) risks in Russia, based on data from the large-scale ongoing population survey in Health Centers (HCs). The technology involves algorithms for automated raw data processing and generation of a joint database, tools for data standardization and visualization, risk assessment, and other components. Data on the physical status of Russians, including height, weight, and BMI, are provided and compared with Belgian (1835), Swiss (2002), and US (1988-1994) reference datasets. The age-standardized prevalence of obesity in 5- to 85-year-old Russians according to the conventional WHO criteria was found to be high (18.9% in males and 26.7% in females) and varied significantly across federal subjects of Russia, suggesting the importance of a Russian NCD risk monitoring system for planning preventive and therapeutic measures and evaluating their effectiveness.
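As a rough illustration of how obesity prevalence can be derived from height and weight, the sketch below computes BMI and applies the WHO adult cut-off of 30 kg/m². It is a simplification under stated assumptions, not the survey's method: it covers adults only (the survey also includes children, for whom age-specific criteria apply), it performs no age standardization, and the record layout and sample values are invented.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def adult_obesity_prevalence(people, min_age=20, cutoff=30.0):
    """Crude (not age-standardized) share of adults with BMI >= cutoff."""
    adults = [p for p in people if p["age"] >= min_age]
    obese = sum(1 for p in adults if bmi(p["weight_kg"], p["height_m"]) >= cutoff)
    return obese / len(adults)


# Hypothetical sample records; values are invented for the example.
sample = [
    {"age": 45, "weight_kg": 95.0, "height_m": 1.75},
    {"age": 30, "weight_kg": 70.0, "height_m": 1.80},
    {"age": 60, "weight_kg": 82.0, "height_m": 1.60},
    {"age": 25, "weight_kg": 60.0, "height_m": 1.65},
    {"age": 10, "weight_kg": 35.0, "height_m": 1.40},  # child, excluded by min_age
]
prevalence = adult_obesity_prevalence(sample)
print(prevalence)
```

An age-standardized figure such as the one reported above would additionally weight age-specific prevalences by a reference population's age distribution.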
Drugs have tremendous potential to cure and relieve disease, but the risk of unintended effects is always present. Healthcare providers increasingly record data in electronic patient records (EPRs), in which we aim to identify possible adverse events (AEs) and, specifically, possible adverse drug events (ADEs).
Based on the undesirable effects section from the summary of product characteristics (SPC) of 7446 drugs, we have built a Danish ADE dictionary. Starting from this dictionary, we have developed a pipeline for identifying possible ADEs in unstructured clinical narrative text. We use a named entity recognition (NER) tagger to identify dictionary matches in the text and post-coordination rules to construct ADE compound terms. Finally, we apply post-processing rules and filters to handle, for example, negations and sentences about subjects other than the patient. Moreover, this method allows synonyms to be identified and anatomical location descriptions to be merged, so that effects in the same location can be grouped appropriately.
The method identified 1 970 731 (35 477 unique) possible ADEs in a large corpus of 6011 psychiatric hospital patient records. Validation was performed through manual inspection of possible ADEs, resulting in precision of 89% and recall of 75%.
The presented dictionary-building method could be used to construct other ADE dictionaries. The complication of compound words in Germanic languages was addressed. Additionally, collapsing synonyms and anatomical locations improves the method.
The developed dictionary and method can be used to identify possible ADEs in Danish clinical narratives.
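The dictionary-matching and negation-filtering steps described above can be sketched in a few lines. This is a toy illustration only, not the authors' pipeline: it uses plain regular expressions instead of an NER tagger, English instead of Danish, and a hypothetical three-term dictionary and negation cue list.

```python
import re

# Hypothetical toy dictionary and negation cues, invented for this example.
ADE_TERMS = {"nausea", "dizziness", "dry mouth"}
NEGATION_CUES = {"no", "not", "denies", "without"}


def find_possible_ades(text):
    """Return dictionary terms found in sentences that contain no negation cue."""
    hits = []
    # Split into sentences at whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        if tokens & NEGATION_CUES:
            continue  # crude sentence-level negation filter
        for term in sorted(ADE_TERMS):
            if re.search(r"\b" + re.escape(term) + r"\b", sentence):
                hits.append(term)
    return hits


note = "Patient reports nausea and dry mouth. She denies dizziness."
found = find_possible_ades(note)
print(found)
```

A sentence-level filter like this discards every match in a negated sentence, which is coarser than the rule-based post-processing the abstract describes; it is meant only to show where such a filter sits in the flow.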
Fewer than half of individuals with a mental disorder seek formal care in a given year. Much research has been conducted on the factors that influence service use in this population, but the methods generally used cannot easily identify the complex interactions that are thought to exist. In this paper, we examine predictors of subsequent service use among respondents to a population health survey who met criteria for a past-year mood, anxiety or substance-related disorder.
To determine service use, we use an administrative database including all physician consultations in the period of interest. To identify predictors, we use classification tree (CART) analysis, a data mining technique with the ability to identify unsuspected interactions. We compare results to those from logistic regression models.
We identified 1213 individuals with a past-year disorder. In the year after the survey, 24% (n=312) of these had a mental health-related physician consultation. Logistic regression revealed that age, sex and marital status predicted service use. CART analysis yielded a set of rules based on age, sex, marital status and income adequacy, with marital status playing a role among men and income adequacy being important among women. CART analysis proved moderately effective overall, with agreement of 60%, sensitivity of 82% and specificity of 53%.
Results highlight the potential of data-mining techniques to uncover complex interactions, and offer support to the view that the intersection of multiple statuses influences health and behaviour in ways that are difficult to identify with conventional statistics. The disadvantages of these methods are also discussed.
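The three performance figures reported for the CART analysis are linked: overall agreement is the average of sensitivity and specificity weighted by the share of service users. The sketch below checks this using the unweighted counts from the abstract; treating n=312 of 1213 as the share of users is an assumption, since the reported 24% may reflect survey weighting.

```python
# Figures as reported in the abstract above.
n_total = 1213       # respondents with a past-year disorder
n_users = 312        # respondents with a subsequent consultation
sensitivity = 0.82   # share of users correctly classified as users
specificity = 0.53   # share of non-users correctly classified as non-users

# Assumption: unweighted share of service users in the sample.
share_users = n_users / n_total

# Agreement = correctly classified users + correctly classified non-users,
# expressed as a share of the whole sample.
expected_agreement = sensitivity * share_users + specificity * (1 - share_users)
print(round(expected_agreement, 3))
```

The result is close to the reported agreement of 60%, and it shows why agreement sits nearer to specificity than to sensitivity: non-users make up roughly three quarters of the sample.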
A trigger is a powerful tool for identifying adverse events in order to measure the level of harm caused in patient care. Studies with epilepsy patients have illustrated that using triggers as a methodology with data mining may increase patient well-being. The purpose of this study is to test the functionality and validity of previously defined triggers for describing the well-being of epilepsy patients. In both medical and nursing data, the triggers described patients' well-being comprehensively. The narratives showed that the triggers overlapped. The preliminary results encourage us to develop reminders for documenting the well-being of epilepsy patients. These would provide healthcare professionals with further and more detailed information when necessary.