Health administrative data can be a valuable tool for disease surveillance and research. Few studies have rigorously evaluated the accuracy of administrative databases for identifying rheumatoid arthritis (RA) patients. Our aim was to validate administrative data algorithms to identify RA patients in Ontario, Canada.
We performed a retrospective review of a random sample of 450 patients from 18 rheumatology clinics. Using rheumatologist-reported diagnosis as the reference standard, we tested and validated different combinations of physician billing, hospitalization, and pharmacy data.
One hundred forty-nine rheumatology patients were classified as having RA and 301 were classified as not having RA based on our reference standard definition (study RA prevalence 33%). Overall, algorithms that included physician billings had excellent sensitivity (range 94-100%). Specificity and positive predictive value (PPV) were modest to excellent and increased when algorithms included multiple physician claims or specialist claims. The addition of RA medications did not significantly improve algorithm performance. The algorithm of "(1 hospitalization RA code ever) OR (3 physician RA diagnosis codes [claims] with ≥1 by a specialist in a 2-year period)" had a sensitivity of 97%, specificity of 85%, PPV of 76%, and negative predictive value of 98%. Most RA patients (84%) had an RA diagnosis code present in the administrative data within ±1 year of a rheumatologist's documented diagnosis date.
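The case definition and the four accuracy measures can be made concrete with a short Python sketch. The `Claim` record, its field names, and the reading of "2-year period" as 730 days are assumptions for illustration, not the Ontario data dictionary or the study's code. The 2×2 counts are likewise hypothetical: the abstract does not report them, but 145/4/45/256 is one split consistent with 149 RA and 301 non-RA patients that reproduces the reported percentages.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Claim:
    """Hypothetical claim record; field names are illustrative only."""
    service_date: date
    source: str          # "physician" or "hospital"
    is_ra_code: bool     # diagnosis code is an RA code
    by_specialist: bool  # billed by a specialist (physician claims)


WINDOW = timedelta(days=730)  # assumption: "2-year period" taken as 730 days


def meets_ra_definition(claims: list[Claim]) -> bool:
    """(>=1 hospitalization RA code ever) OR (>=3 physician RA claims,
    >=1 by a specialist, all within one 2-year window)."""
    ra = [c for c in claims if c.is_ra_code]
    if any(c.source == "hospital" for c in ra):
        return True
    phys = sorted((c for c in ra if c.source == "physician"),
                  key=lambda c: c.service_date)
    # Open a window at each physician claim and look ahead up to WINDOW.
    for i, start in enumerate(phys):
        window = [c for c in phys[i:]
                  if c.service_date - start.service_date <= WINDOW]
        if len(window) >= 3 and any(c.by_specialist for c in window):
            return True
    return False


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard validation metrics against the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


claims = [
    Claim(date(2010, 1, 5), "physician", True, False),
    Claim(date(2010, 6, 1), "physician", True, True),
    Claim(date(2011, 3, 9), "physician", True, False),
]
print(meets_ra_definition(claims))  # True

# Illustrative counts, NOT reported in the abstract: chosen so that
# tp + fn = 149 RA patients and fp + tn = 301 non-RA patients.
metrics = accuracy(tp=145, fp=45, fn=4, tn=256)
print({k: round(100 * v) for k, v in metrics.items()})
# {'sensitivity': 97, 'specificity': 85, 'ppv': 76, 'npv': 98}
```

Sliding a window from each physician claim is enough: any qualifying trio of claims lies within 730 days of its earliest member, so the window opened at that claim contains it.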
We demonstrated that administrative data can be used to identify RA patients with a high degree of accuracy. RA diagnosis date and disease duration are fairly well estimated from administrative data in jurisdictions with universal health care insurance.
The structure and modules of the computer information-analytical system "Electronic atlas of Russia", which maps the epidemiology of socially significant infectious diseases, are presented. The atlas contains systematic information on the emergence and spread of socially significant infectious diseases (anthroponoses, zoonoses, and sapronoses) in the population of the Russian Federation. Of particular interest are detailed electronic maps of the country's territory filled with prognostic and analytical information, created using advances in the mathematical and computer modeling of epidemics and outbreaks of viral and bacterial infections. The atlas makes it possible to objectively evaluate the pattern of infection spread, to forecast the development of epidemics and outbreaks while taking into account the implementation of control measures (vaccination, prophylaxis, diagnostics, and therapy), and to evaluate the economic effectiveness of these measures.
The utilisation of data mining methods has become common in many fields. In occupational accident analysis, however, these methods are still rarely exploited. This study applies data mining methods (decision trees and association rules) to the Finnish national occupational accidents and diseases statistics database to analyse factors related to slipping, stumbling, and falling (SSF) accidents at work from 2006 to 2007. SSF accidents constitute a large proportion (22%) of all accidents at work in Finland. In addition, they are more likely than other workplace accidents to result in longer periods of incapacity for work. The most important factor influencing whether or not an accident at work is related to SSF is the specific physical activity of movement. The risk of SSF accidents at work also seems to depend on the occupation and the age of the worker. The results were in line with previous research, so the application of data mining methods was considered successful, although the results did not reveal anything unexpected. Nevertheless, because of their capability to illustrate large datasets and the relationships between variables easily, data mining methods were seen as a useful supplementary method for analysing occupational accident data.
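To make the association-rule terminology concrete, the Python sketch below scores single-item rules of the form `item -> SSF` by support (share of records containing both sides), confidence (share of antecedent records that also contain SSF), and lift (confidence divided by the overall SSF share). The records and thresholds are invented for illustration and are not from the Finnish statistics database; the abstract does not describe the study's software or settings, and the decision-tree analysis is not shown.

```python
# Toy transactions for illustration only, NOT records from the Finnish database.
records = [
    {"activity=movement", "age>=55", "SSF"},
    {"activity=movement", "age<55", "SSF"},
    {"activity=movement", "age<55"},
    {"activity=handling", "age<55"},
    {"activity=handling", "age>=55"},
    {"activity=movement", "age>=55", "SSF"},
]


def support(itemset, records):
    """Fraction of records containing every item in itemset."""
    return sum(itemset <= r for r in records) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_ab = support(antecedent | consequent, records)
    s_a = support(antecedent, records)
    s_b = support(consequent, records)
    confidence = s_ab / s_a
    return s_ab, confidence, confidence / s_b


MIN_SUPPORT, MIN_CONFIDENCE = 0.2, 0.6  # hypothetical thresholds
items = set().union(*records) - {"SSF"}
rules = []
for item in sorted(items):
    s, c, lift = rule_metrics({item}, {"SSF"}, records)
    if s >= MIN_SUPPORT and c >= MIN_CONFIDENCE:
        rules.append((item, s, c, lift))
        print(f"{item} -> SSF: support={s:.2f}, confidence={c:.2f}, lift={lift:.2f}")
```

A lift above 1 means the antecedent raises the chance of an SSF accident relative to the baseline; full Apriori-style mining would also grow multi-item antecedents.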
The main goal of this paper is to develop a spell-checker module for clinical text in Russian. The described approach combines string distance measure algorithms with machine learning techniques based on embedding methods. Our overall precision is 0.86, lexical precision is 0.975, and error precision is 0.74. We developed the spell checker as part of a medical text-mining tool that addresses the problems of misspelling, negation, experiencer, and temporality detection.
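The abstract does not say which string distance is used. As an illustration only, the Python sketch below assumes Levenshtein edit distance and returns dictionary words within two edits of a token, nearest first and then by frequency. The embedding-based component, which would rerank candidates using context, is not shown; `suggest`, the vocabulary, and its frequencies are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word: str, vocabulary: dict[str, int], max_dist: int = 2) -> list[str]:
    """Dictionary words within max_dist edits: nearest first, then most frequent."""
    scored = sorted((levenshtein(word, w), -freq, w) for w, freq in vocabulary.items())
    return [w for dist, _, w in scored if dist <= max_dist]


vocabulary = {"давление": 120, "диабет": 80, "дыхание": 60}  # toy word frequencies
print(suggest("давлене", vocabulary))  # ['давление']
```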
Candidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations to quantitative traits of NAFLD-related phenotypes.
By integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression, a protein-protein interaction network was constructed, and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs), which captured all common variation in these genes, were genotyped in 10,196 Danes and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS).
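One simple way to picture the network step: treat interactions as an undirected graph, drop proteins not expressed in liver, and read off isolated subnetworks as connected components. This Python sketch assumes that reading; the protein identifiers, interactions, and expression set are hypothetical placeholders, and the study's actual procedure for isolating its interactome may differ.

```python
from collections import deque


def connected_components(edges):
    """Split an undirected interaction network into connected components (BFS)."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical proteins and interactions, not data from the study.
interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"),
                ("P6", "P7"), ("P7", "P8"), ("P8", "P9")]
liver_expressed = {"P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"}

# Keep only interactions where both partners are expressed in liver.
liver_edges = [(a, b) for a, b in interactions
               if a in liver_expressed and b in liver_expressed]
for component in sorted(connected_components(liver_edges), key=len):
    print(sorted(component))
```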
In total, 273 genes were included in the protein-protein interaction analysis, and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominally statistically significant associations (P
Laboratory for Infectious Diseases and Screening (LIS), Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), 3720 BA Bilthoven, The Netherlands.
BACKGROUND: Bordetella pertussis is the causative agent of human whooping cough (pertussis), which is particularly severe in infants. Despite worldwide vaccination, whooping cough remains a public health problem. A significant increase in the incidence of whooping cough has been observed in many countries since the 1990s. Several reasons for the re-emergence of this highly contagious disease have been suggested. A particularly intriguing possibility is based on evidence indicating that pathogen adaptation may play a role in this process. To gain insight into the genomic make-up of B. pertussis over the last 60 years, we used an oligonucleotide DNA microarray to compare the genomic contents of a collection of 171 B. pertussis isolates from different countries. RESULTS: The CGH microarray analysis estimated the core genome of B. pertussis to consist of 3,281 CDSs that are conserved among all B. pertussis strains and represent 84.8% of all CDSs found in the 171 strains. A total of 64 regions of difference, each consisting of one or more contiguous CDSs, were identified among the variable genes. CGH data also revealed that the genome size of B. pertussis strains has decreased progressively over the past 60 years. Phylogenetic analysis of the microarray data generated a minimum spanning tree that depicted the phylogenetic structure of the strains. B. pertussis strains with the same gene content were found in several different countries; however, no geographic specificity of the strains was observed. Gene content was found to correlate highly with the ptxP type of the strains. CONCLUSIONS: An overview of the genomic contents of a large collection of isolates from different countries allowed us to derive a core genome and a phylogenetic structure of B. pertussis. Our results show that B. pertussis is a dynamic organism that continues to evolve.
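The core-genome and region-of-difference definitions can be illustrated on a toy presence/absence matrix. In this Python sketch a core CDS is one present in every strain, and a region of difference is a run of contiguous CDSs absent from at least one strain; the strain names, CDS identifiers, and calls are hypothetical, and real CGH data would first need presence/absence calling. On the reported figures, 3,281 core CDSs forming 84.8% of all CDSs implies roughly 3,870 CDSs observed across the 171 strains.

```python
# Toy presence (1) / absence (0) calls, CDSs listed in genome order.
cds_ids = ["cds1", "cds2", "cds3", "cds4", "cds5", "cds6", "cds7", "cds8"]
presence = {
    "strain_A": [1, 1, 1, 1, 1, 1, 1, 1],
    "strain_B": [1, 1, 0, 0, 1, 1, 1, 1],
    "strain_C": [1, 1, 1, 1, 1, 0, 1, 1],
}


def core_cds(presence, cds_ids):
    """CDSs present in every strain."""
    return [cid for k, cid in enumerate(cds_ids)
            if all(row[k] for row in presence.values())]


def regions_of_difference(presence, cds_ids):
    """Runs of contiguous CDSs that are absent from at least one strain."""
    regions, run = [], []
    for k, cid in enumerate(cds_ids):
        if all(row[k] for row in presence.values()):
            if run:
                regions.append(run)
                run = []
        else:
            run.append(cid)
    if run:
        regions.append(run)
    return regions


core = core_cds(presence, cds_ids)
observed = [cid for k, cid in enumerate(cds_ids)
            if any(row[k] for row in presence.values())]
print(core, f"{len(core) / len(observed):.1%}")     # 5 of 8 CDSs -> 62.5%
print(regions_of_difference(presence, cds_ids))      # [['cds3', 'cds4'], ['cds6']]
print({s: sum(row) for s, row in presence.items()})  # CDS count per strain
```

The per-strain CDS count in the last line is the kind of gene-content measure that, tracked over isolation years, would show the genome-size decrease the abstract reports.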
The goal of this study was to evaluate associations between meteorological conditions and the number of emergency cases for five distinct groups of dispatch causes reported to the SOS dispatch centre in Uppsala, Sweden. The centre is responsible for alerting 17 ambulances across the whole of Uppsala County, an area of 8,209 km² with around 320,000 inhabitants, who represent the target patient group. The medical data for this study come from the dispatch database for 2009, while the meteorological data were provided by the yearly weather report of the Uppsala University Department of Earth Sciences. Medical and meteorological data were merged into a unified data space in which each point represents a day, described by its weather parameters and the daily count (cardinality) of a dispatch cause group. After the data spaces had undergone variance adjustment and principal component analysis, the DBSCAN data mining algorithm was applied to each of the five dispatch cause groups. As a result, several point clusters were discovered in each of the examined data spaces, indicating distinctive conditions regarding the weather and the daily count of the dispatch cause, as well as associations between the two. The most interesting finding is that a specific type of winter weather formed a cluster only around days with a high count of breathing difficulties, while one of the summer weather clusters showed a similar association with days with a low number of cases. The findings were confirmed by confidence level estimation based on the signal-to-noise ratio of the observed data points.
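The clustering step can be sketched in Python as follows. DBSCAN is implemented from scratch; "variance adjustment" is assumed to mean z-score standardization, and the PCA step is omitted because the toy data have only two features. The daily values, `eps`, and `min_pts` are hypothetical and not taken from the study.

```python
import math


def standardize(points):
    """Z-score each column (our stand-in for the abstract's "variance adjustment")."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(p, means, sds)) for p in points]


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster index 0, 1, ... or -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1          # noise for now; may become a border point
            continue
        cluster += 1                # i is a core point: start a new cluster
        labels[i] = cluster
        seeds, k = list(neighbours), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is also core: keep expanding
                seeds.extend(j_neighbours)
    return labels


# Hypothetical days: (mean temperature in °C, daily count of one dispatch cause).
days = [(-10, 30), (-9, 31), (-11, 29), (-10, 32),
        (20, 5), (21, 6), (19, 4), (22, 5),
        (5, 60)]
print(dbscan(standardize(days), eps=0.5, min_pts=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Points labelled -1 are noise. With real data, several weather variables plus the daily count would form each day's point, and `eps` would need tuning, for example from a k-distance plot.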
This paper describes a case study of collecting digital footprint data for the purpose of health data mining. The case study involved 20 subjects residing in Finland who were instructed to collect data from registries that they judged useful for understanding their current or past health or health behaviour. Eleven subjects were active, sending 100 data requests to 49 distinct organizations in total. Our results indicate that there are still practical challenges in collecting actionable digital footprint data. Our subjects received a total of 75 replies (reply rate 75.0%) and 61 datasets (reception rate 61%). Of the received datasets, 44 (72.1%) were delivered on paper, 4 (6.6%) in Portable Document Format (PDF), and 13 (21.3%) in structured digital form. The mean time between sending an information request and receiving a reply was 26.4 days.
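The percentages in this paragraph use two different denominators: reply and reception rates are relative to the 100 requests sent, whereas the format shares are relative to the 61 datasets received. A quick Python check using only the reported counts:

```python
requests, replies, datasets = 100, 75, 61
formats = {"paper": 44, "PDF": 4, "structured digital": 13}
assert sum(formats.values()) == datasets  # format counts cover every received dataset

reply_rate = 100 * replies / requests      # denominator: requests sent
reception_rate = 100 * datasets / requests
# Denominator for format shares: datasets actually received.
format_share = {f: round(100 * n / datasets, 1) for f, n in formats.items()}

print(f"reply rate {reply_rate:.1f}%, reception rate {reception_rate:.1f}%")
print(format_share)  # {'paper': 72.1, 'PDF': 6.6, 'structured digital': 21.3}
```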
We describe a new method for identifying confident associations within large clinical data sets. The method is a hybrid of two existing methods: Self-Organizing Maps and Association Mining. We use Self-Organizing Maps as an initial step to reduce the search space, and then apply Association Mining to find association rules. We demonstrate that this procedure has several advantages over traditional Association Mining: it can handle numerical variables without a priori binning, and it can generate variable groups that act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we successfully generated a number of interesting association rules, each relating an observation to a specific consequence together with the p-value for that finding. Additionally, we demonstrate that the method can be applied to non-clinical data containing chemical-disease associations to find associations between different phenotypes, such as prostate cancer and breast cancer.
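A minimal Python sketch of one possible reading of the hybrid: train a small 1-D Self-Organizing Map on continuous measurements, treat each sample's best-matching node as a discrete item, and score rules `node=k -> outcome` by confidence and a one-sided Fisher exact p-value. The grid shape, decay schedules, node-as-item encoding, and choice of test are all assumptions, as are the data; the abstract does not specify the authors' implementation.

```python
import math
import random


def bmu(weights, x):
    """Index of the best-matching (closest) SOM node for sample x."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def train_som(data, n_nodes=3, epochs=50, lr0=0.5, sigma0=0.5, seed=0):
    """Train a 1-D Self-Organizing Map (n_nodes >= 2) with a Gaussian neighbourhood."""
    dims = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dims)]
    hi = [max(p[d] for p in data) for d in range(dims)]
    # Deterministic start: nodes evenly spaced between per-feature min and max.
    weights = [[lo[d] + (hi[d] - lo[d]) * k / (n_nodes - 1) for d in range(dims)]
               for k in range(n_nodes)]
    rng, order = random.Random(seed), list(range(len(data)))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                   # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 1e-3)  # shrinking neighbourhood
        rng.shuffle(order)
        for i in order:
            x, b = data[i], bmu(weights, data[i])
            for k, w in enumerate(weights):
                h = math.exp(-((k - b) ** 2) / (2 * sigma ** 2))
                for d in range(dims):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


def upper_tail_p(a, n1, k_total, n):
    """One-sided Fisher exact p-value: P(X >= a) for a hypergeometric X."""
    top = min(n1, k_total)
    return sum(math.comb(k_total, x) * math.comb(n - k_total, n1 - x)
               for x in range(a, top + 1)) / math.comb(n, n1)


def node_rules(weights, data, outcome):
    """Treat SOM node membership as an item; score rules 'node=k -> outcome'."""
    nodes = [bmu(weights, x) for x in data]
    n, k_total = len(data), sum(outcome)
    rules = []
    for k in sorted(set(nodes)):
        n1 = nodes.count(k)
        a = sum(1 for nd, y in zip(nodes, outcome) if nd == k and y)
        rules.append((k, a / n1, upper_tail_p(a, n1, k_total, n)))
    return rules


# Hypothetical continuous measurements and a binary outcome; no binning needed.
cluster_a = [(0.1 * i, 0.2 * (i % 3)) for i in range(6)]
cluster_b = [(10 + 0.1 * i, 10 - 0.2 * (i % 3)) for i in range(6)]
data = cluster_a + cluster_b
outcome = [0, 0, 0, 0, 0, 1] + [1, 1, 1, 1, 1, 0]

weights = train_som(data)
for node, confidence, p in node_rules(weights, data, outcome):
    print(f"node={node} -> outcome: confidence={confidence:.2f}, p={p:.3f}")
```

Because rules are mined over node labels rather than hand-chosen bins, the continuous inputs never need to be discretized in advance, which is the advantage the abstract describes.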