METHODS: To address this problem, we propose an end-to-end framework based on BERT for named entity recognition (NER) and relation extraction (RE) in electronic medical records. First, the framework integrates the NER and RE tasks into a single model that is processed end to end, which avoids the limitations and error propagation of the multiple independent steps used in traditional methods. Second, by pre-training and fine-tuning the BERT model on large-scale electronic medical record data, we give the model rich semantic representations adapted to the medical domain and its tasks. Finally, through multi-task learning, the model exploits the correlation and complementarity between the NER and RE tasks, which improves its generalization ability and performance across datasets.
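As an illustration of the multi-task idea described above, the following minimal sketch combines a NER loss and an RE loss into one training objective. It is not the framework's actual objective: the weighted-sum form, the function names `cross_entropy` and `joint_loss`, and the `ner_weight` parameter are all assumptions made for the example.

```python
import math


def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold class indices.

    probs: list of per-example probability vectors (each sums to 1).
    gold:  list of gold class indices, one per example.
    """
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)


def joint_loss(ner_probs, ner_gold, re_probs, re_gold, ner_weight=0.5):
    """Weighted sum of the NER and RE losses.

    ner_weight is a hypothetical hyperparameter; the source does not
    state how the two task losses are balanced.
    """
    ner_loss = cross_entropy(ner_probs, ner_gold)
    re_loss = cross_entropy(re_probs, re_gold)
    return ner_weight * ner_loss + (1.0 - ner_weight) * re_loss


# One token with two possible entity tags, one entity pair with two relation labels.
loss = joint_loss([[0.5, 0.5]], [0], [[0.25, 0.75]], [1])
print(round(loss, 4))
```

Because both task losses feed a single scalar, gradients from each task would update the shared encoder, which is one common way the two tasks can inform each other.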
RESULTS AND DISCUSSION: We evaluated the framework on four electronic medical record datasets. In the NER task, the model significantly outperformed other methods across the datasets. In the RE task, the EMLB model also held an advantage across the datasets; its performance improved significantly in the multi-task learning mode, and the ETE and MTL modules performed well in terms of combined precision and recall. Our research provides an innovative solution for medical image and signal data.
METHODS: Using measures of discrimination and calibration, we tested the performance of the NL-IHRS (n=100 475) and FC-IHRS (n=107 863) for predicting incident CVD in a community-based, prospective study across seven geographic regions: South Asia, China, Southeast Asia, Middle East, Europe/North America, South America and Africa. CVD was defined as the composite of cardiovascular death, myocardial infarction, stroke, heart failure or coronary revascularisation.
RESULTS: Mean age of the study population was 50.53 (SD 9.79) years and mean follow-up was 4.89 (SD 2.24) years. The NL-IHRS had moderate to good discrimination for incident CVD across geographic regions (concordance statistic (C-statistic) ranging from 0.64 to 0.74), although recalibration was necessary in all regions, which improved its performance in the overall cohort (increase in C-statistic from 0.69 to 0.72, p<0.001). Regional recalibration was also necessary for the FC-IHRS, which also improved its overall discrimination (increase in C-statistic from 0.71 to 0.74, p<0.001). In 85 078 participants with complete data for both scores, discrimination was only modestly better with the FC-IHRS compared with the NL-IHRS (0.74 vs 0.73, p<0.001).
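For readers unfamiliar with the concordance statistic reported above, the sketch below computes it for a binary outcome: the fraction of (event, non-event) pairs in which the participant with the event received the higher risk score. This is a simplification and an assumption of the example; it ignores follow-up time and censoring, so it is not necessarily the estimator used in the study. The function name `c_statistic` and the toy data are hypothetical.

```python
def c_statistic(risk_scores, events):
    """Concordance statistic for a binary outcome.

    risk_scores: predicted risk per participant (higher = more risk).
    events:      1 if the participant had the event, 0 otherwise.
    Tied scores within a pair count as half-concordant.
    """
    cases = [s for s, e in zip(risk_scores, events) if e == 1]
    controls = [s for s, e in zip(risk_scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy data: two participants with events, two without.
scores = [0.9, 0.4, 0.6, 0.8]
events = [1, 0, 1, 0]
print(c_statistic(scores, events))  # 3 of 4 pairs are concordant -> 0.75
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which is why the reported range of 0.64 to 0.74 is described as moderate to good discrimination.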
CONCLUSIONS: External validations of the NL-IHRS and FC-IHRS suggest that regionally recalibrated versions of both can be useful for estimating CVD risk across a diverse range of community-based populations. CVD prediction using a non-laboratory score can provide similar accuracy to laboratory-based methods.