The main goal of this paper is to develop a spell checker module for clinical text in Russian. The described approach combines string distance algorithms with machine learning embedding techniques. Our overall precision is 0.86, lexical precision is 0.975, and error precision is 0.74. We develop the spell checker as part of a medical text mining tool that addresses the problems of misspelling, negation, experiencer, and temporality detection.
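The string distance component can be illustrated with a minimal sketch. The abstract does not say which distance measure is used, so Levenshtein edit distance is assumed here; `rank_candidates`, the `max_distance` threshold, and the three-word dictionary are hypothetical, and the embedding-based part of the approach is not shown.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein edit distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def rank_candidates(word, dictionary, max_distance=2):
    """Return dictionary entries within max_distance edits, closest first.

    Hypothetical helper: the threshold and the ranking rule are assumptions.
    """
    scored = sorted((levenshtein(word, entry), entry) for entry in dictionary)
    return [entry for distance, entry in scored if distance <= max_distance]


# Illustrative dictionary, not taken from the paper.
suggestions = rank_candidates("пневманя", ["пневмония", "анемия", "ангина"])
```

In a combined system, a candidate list like this would typically be re-ranked using context, for example with word embeddings, before a correction is chosen.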
Predicting binary events such as newborns with large birthweight is important for obstetricians in their attempt to reduce both maternal and fetal morbidity and mortality. Such predictions have been a challenge in obstetric practice, where longitudinal ultrasound measurements taken at multiple gestational times during pregnancy may be useful for predicting various poor pregnancy outcomes. The focus of this article is on developing a flexible class of joint models for the multivariate longitudinal ultrasound measurements that can be used for predicting a binary event at birth. A skewed multivariate random effects model is proposed for the ultrasound measurements, and the skewed generalized t-link is assumed for the link function relating the binary event and the underlying longitudinal processes. We consider a shared random effect to link the two processes together. Markov chain Monte Carlo sampling is used to carry out Bayesian posterior computation. Several variations of the proposed model are considered and compared via the deviance information criterion, the logarithm of pseudomarginal likelihood, and with a training-test set prediction paradigm. The proposed methodology is illustrated with data from the NICHD Successive Small-for-Gestational-Age Births study, a large prospective fetal growth cohort conducted in Norway and Sweden.
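As a rough illustration of one of the comparison criteria, the deviance information criterion can be computed from posterior draws as the mean deviance plus the effective number of parameters. The sketch below assumes a single scalar parameter and a normal likelihood with known variance; it is not the joint model described above, and `dic` and `normal_log_likelihood` are hypothetical names.

```python
import math
from statistics import fmean


def dic(log_likelihood, posterior_draws):
    """Return (DIC, p_D) for a scalar parameter given posterior draws.

    DIC = mean deviance + p_D, where
    p_D = mean deviance - deviance at the posterior mean.
    """
    deviances = [-2.0 * log_likelihood(theta) for theta in posterior_draws]
    mean_deviance = fmean(deviances)
    deviance_at_mean = -2.0 * log_likelihood(fmean(posterior_draws))
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d, p_d


def normal_log_likelihood(data, sigma=1.0):
    """Build a log-likelihood in the mean mu for i.i.d. normal data (assumed model)."""
    def log_lik(mu):
        return sum(
            -0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - mu) ** 2 / (2.0 * sigma ** 2)
            for y in data
        )
    return log_lik


# Toy usage: the draws would normally come from an MCMC sampler.
toy_dic, toy_p_d = dic(normal_log_likelihood([0.0]), [-1.0, 1.0])
```

Lower DIC values indicate a better trade-off between fit and complexity when comparing model variants fitted to the same data.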
The proper estimate of the risk of recurrences in early-stage oral tongue squamous cell carcinoma (OTSCC) is mandatory for individual treatment-decision making. However, this remains a challenge even for experienced multidisciplinary centers.
We compared the performance of four machine learning (ML) algorithms for predicting the risk of locoregional recurrences in patients with OTSCC. These algorithms were Support Vector Machine (SVM), Naive Bayes (NB), Boosted Decision Tree (BDT), and Decision Forest (DF).
The study cohort comprised 311 cases from the five University Hospitals in Finland and A.C. Camargo Cancer Center, São Paulo, Brazil. For comparison of the algorithms, we used the harmonic mean of precision and recall called F1 score, specificity, and accuracy values. These algorithms and their corresponding permutation feature importance (PFI) with the input parameters were externally tested on 59 new cases. Furthermore, we compared the performance of the algorithm that showed the highest prediction accuracy with the prognostic significance of depth of invasion (DOI).
The results showed that the average specificity of all the algorithms was 71%. The SVM showed an accuracy of 68% and F1 score of 0.63, NB an accuracy of 70% and F1 score of 0.64, BDT an accuracy of 81% and F1 score of 0.78, and DF an accuracy of 78% and F1 score of 0.70. Additionally, these algorithms outperformed the DOI-based approach, which gave an accuracy of 63%. With PFI analysis, there was no significant difference in the overall accuracies of three of the algorithms; PFI-BDT accuracy increased to 83.1%, PFI-DF increased to 80%, PFI-SVM decreased to 64.4%, while PFI-NB accuracy increased significantly to 81.4%.
Our findings show that the best classification accuracy was achieved with the boosted decision tree algorithm. Additionally, these algorithms outperformed the DOI-based approach. Furthermore, with only the few parameters identified in the PFI analysis, the ML techniques still showed the ability to predict locoregional recurrence. The application of a boosted decision tree machine learning algorithm can stratify OTSCC patients and thus aid in their individual treatment planning.
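The comparison metrics named above (F1 score as the harmonic mean of precision and recall, specificity, and accuracy) can all be derived from confusion-matrix counts. The sketch below uses made-up counts for illustration only; they are not taken from the study, and `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, recall, F1, specificity, and accuracy from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Illustrative counts only (recurrence = positive class).
metrics = classification_metrics(tp=8, fp=2, tn=6, fn=4)
```

Reporting specificity alongside F1 matters here because F1 ignores true negatives, so it says nothing about how well non-recurring cases are recognized.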
Sensitive data is normally required to develop rule-based models or to train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, which presents important problems for patient privacy. In this study, we add two non-sensitive public datasets to EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02% with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful in cases where the data is sensitive and the language is low-resource.
Early dumping syndrome after gastric bypass surgery, in which rapid delivery of hyperosmolar nutrients into the bowel causes intense symptoms, is often described as a complication. Twelve patients, mean age 47 years, were interviewed approximately 9 years post-operation. The interviews were audiotaped and transcribed verbatim, followed by an inductive content analysis to reveal patients' experience of the dumping syndrome. The core category 'Dumping syndrome is a positive consequence of Roux-en-Y gastric bypass surgery and a tool to control food intake' was identified based on the following four sub-categories: (i) 'The multidimensional emergence and effects of dumping syndrome', (ii) 'Dumping syndrome as something positive although unpleasant', (iii) 'Developing coping mechanisms and ingenious strategies' and (iv) 'My own fault if I expose myself to dumping syndrome'. From the patients' perspective, dumping syndrome gives control over food intake; although the symptoms were unpleasant, patients considered dumping syndrome as a positive protection against over-consumption. Hence, healthcare professionals should not present dumping syndrome as a complication but rather as an aid to control eating behaviour and excessive food intake.
Pre-deployment identification of soldiers at risk for long-term posttraumatic stress psychopathology after homecoming is important to guide decisions about deployment. Early post-deployment identification can direct early interventions to those in need and thereby prevent the development of chronic psychopathology. Both hold significant public health benefits given the large numbers of deployed soldiers, but have so far not been achieved. Here, we aim to assess the potential for pre- and early post-deployment prediction of resilience or posttraumatic stress development in soldiers by application of machine learning (ML) methods.
ML feature selection and prediction algorithms were applied to a prospective cohort of 561 Danish soldiers deployed to Afghanistan in 2009 to identify unique risk indicators and forecast long-term posttraumatic stress responses.
Robust pre- and early post-deployment risk indicators were identified and included individual PTSD symptoms as well as total level of PTSD symptoms, previous trauma and treatment, negative emotions, and thought suppression. The predictive performance of these risk indicators combined was assessed by cross-validation. Together, these indicators forecasted long-term posttraumatic stress responses with high accuracy (pre-deployment: AUC = 0.84 (95% CI = 0.81-0.87), post-deployment: AUC = 0.88 (95% CI = 0.85-0.91)).
This study utilized a previously collected data set and was therefore not designed to exhaust the potential of ML methods. Further, the study relied solely on self-reported measures.
Pre-deployment and early post-deployment identification of risk for long-term posttraumatic psychopathology are feasible and could greatly reduce the public health costs of war.
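The AUC values reported above can be read as the probability that a randomly chosen soldier who developed posttraumatic stress receives a higher predicted risk than a randomly chosen soldier who did not. A minimal sketch of that pairwise definition follows; the labels and scores are toy values, `roc_auc` is a hypothetical helper, and in a cross-validated setting the scores would be assumed to come from held-out folds.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 1 = developed long-term symptoms, 0 = did not.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred; a rank-based formulation would be preferable for much larger samples.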
This article describes the results of a study of echocardiographic (ECHO) test data for 4P medicine applied to cardiovascular patients. Data from more than 145,000 echocardiographic tests were analyzed. One objective of the study is to identify patterns and relationships in patient characteristics for more accurate appointment of procedures based on the history of the disease and the individual characteristics of the patient. This is achieved by using classification models based on machine learning methods. Early detection of disease risks and "accurate" appointment of diagnostic procedures make a significant contribution to value-based medicine. Moreover, it was also possible to identify the classes and characteristics of patients for whom repeated diagnostic procedures are well founded. Calculation of personal risks from empirical retrospective data helps to detect the disease in its early stages. Identifying patients at high risk of disease complications allows physicians to make the right decisions about timely treatment, which can significantly improve the quality of treatment, help to avoid disease complications, optimize costs, and improve the quality of medical care.
A trigger is a powerful tool for identifying adverse events to measure the level of any kind of harm caused in patient care. Studies with epilepsy patients have illustrated that using triggers as a methodology with data mining may increase patient well-being. The purpose of this study is to test the functionality and validity of previously defined triggers for describing the status of epilepsy patients' well-being. In both medical and nursing data, the triggers described patients' well-being comprehensively. The narratives showed that the triggers overlapped. The preliminary results encourage us to develop reminders for documenting the well-being of epilepsy patients. These provide healthcare professionals with further and more detailed information when necessary.
For the 2014 i2b2/UTHealth de-identification challenge, we introduced a new non-parametric Bayesian hidden Markov model using a Dirichlet process (HMM-DP). The model is intended to reduce task-specific feature engineering and to generalize well to new data. In the challenge we developed a variational method to learn the model and an efficient approximation algorithm for prediction. To accommodate out-of-vocabulary words, we designed a number of feature functions to model such words. The results show that the model is capable of understanding local context cues to make correct predictions without manual feature engineering and performs as accurately as state-of-the-art conditional random field models in a number of categories. To incorporate long-range and cross-document context cues, we developed a skip-chain conditional random field model to align the results produced by HMM-DP, which further improved the performance.
Although the airway microbiota is a highly dynamic ecology, the role of longitudinal changes in airway microbiota during early childhood in asthma development is unclear. We aimed to investigate the association of longitudinal changes in early nasal microbiota with the risk of developing asthma.
In this prospective, population-based birth cohort study, we followed children from birth to age 7 years. The nasal microbiota was tested by using 16S ribosomal RNA gene sequencing at ages 2, 13, and 24 months. We applied an unsupervised machine learning approach to identify longitudinal nasal microbiota profiles during age 2 to 13 months (the primary exposure) and during age 2 to 24 months (the secondary exposure) and examined the association of these profiles with the risk of physician-diagnosed asthma at age 7 years.
Of the analytic cohort of 704 children, 57 (8%) later developed asthma. We identified 4 distinct longitudinal nasal microbiota profiles during age 2 to 13 months. In the multivariable analysis, compared with the persistent Moraxella dominance profile during age 2 to 13 months, the persistent Moraxella sparsity profile was associated with a significantly higher risk of asthma (adjusted odds ratio, 2.74; 95% confidence interval, 1.20-6.27). Similar associations were observed between the longitudinal changes in nasal microbiota during age 2 to 24 months and risk of asthma.
Children with an altered longitudinal pattern in the nasal microbiota during early childhood had a high risk of developing asthma. Our data guide the development of primary prevention strategies (eg, early identification of children at high risk and modification of microbiota) for childhood asthma. These observations present a new avenue for risk modification for asthma (eg, microbiota modification).
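To make the reported effect measure concrete, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. This is a simplification: the study reports an adjusted odds ratio from a multivariable analysis, which would require a regression model. The counts are invented for illustration and `odds_ratio_with_ci` is a hypothetical helper.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Return (odds ratio, lower, upper) for a 2x2 table.

    a: exposed cases        b: exposed non-cases
    c: unexposed cases      d: unexposed non-cases
    The interval is a Wald interval on the log odds ratio (z=1.96 for 95%).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * standard_error)
    upper = math.exp(log_or + z * standard_error)
    return odds_ratio, lower, upper


# Invented counts: exposure = one microbiota profile, outcome = asthma.
toy_or, toy_lower, toy_upper = odds_ratio_with_ci(a=10, b=20, c=5, d=40)
```

An interval that excludes 1, as in the reported 1.20-6.27, is what makes an association statistically significant at the 5% level.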