DEEP LEARNING ALGORITHM FOR CONNECTING SCIENTIFIC RECORDS AND SOCIAL PLATFORM

  • Budi Santoso, Universitas Sains dan Teknologi Komputer
  • Agustinus Budi Santoso, Universitas Sains dan Teknologi Komputer
  • Eko Siswanto, Universitas Sains dan Teknologi Komputer
Keywords: Social Platform, Natural Language Processing, Deep Learning, Tensor Processing Unit, Electronic Medical Record

Abstract

In the healthcare industry, professionals generate large amounts of unstructured data. The complexity of this data and the lack of computational capability delay its analysis. However, with the advent of Deep Learning (DL) algorithms and access to computing power such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), large-scale text and image processing has become feasible. DL has produced strong results in Natural Language Processing (NLP) and computer vision. The main purpose of this study is to build a unified approach that links social platforms, medical literature, and scientific records in order to support medical education for the public and experts.

This study focuses on NLP in the healthcare industry and draws data from Electronic Medical Records (EMR), medical literature, and social platforms. It proposes a framework for connecting these three sources using Deep Learning algorithms. Linking the data sources requires defining the relationships between them and finding concepts in medical texts. The National Library of Medicine (NLM) introduced the Unified Medical Language System (UMLS), which serves as the basis for the proposed system. The dynamic nature of social platform content can be recognized, and supervised methods can be applied to it to derive concepts. Named Entity Recognition (NER) enables the extraction of data or entities from the medical literature.
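
The idea of linking sources through shared concepts can be illustrated with a minimal Python sketch. It is not the method of the study: it assumes a tiny hand-written lexicon in place of the UMLS, plain string matching in place of a Deep Learning NER model, and made-up concept IDs (`CONCEPT_001`, `CONCEPT_002`) that are placeholders rather than real UMLS identifiers. The function names `find_concepts` and `link_sources` and the sample texts are likewise hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: surface term -> made-up concept ID.
# These IDs are placeholders, not real UMLS identifiers.
LEXICON = {
    "heart attack": "CONCEPT_001",
    "myocardial infarction": "CONCEPT_001",
    "flu": "CONCEPT_002",
    "influenza": "CONCEPT_002",
}


def find_concepts(text):
    """Return the set of concept IDs whose terms appear in the text."""
    lowered = text.lower()
    found = set()
    for term, concept_id in LEXICON.items():
        # Word boundaries stop "flu" from matching inside "fluids".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept_id)
    return found


def link_sources(documents):
    """Map each concept ID to the set of sources that mention it."""
    links = defaultdict(set)
    for source, text in documents:
        for concept_id in find_concepts(text):
            links[concept_id].add(source)
    return dict(links)


# Invented sample texts standing in for the three data sources.
documents = [
    ("emr", "Patient admitted with acute myocardial infarction."),
    ("literature", "Influenza vaccination and risk of heart attack."),
    ("social", "Stuck in bed with the flu, drinking fluids."),
]

links = link_sources(documents)
for concept_id in sorted(links):
    print(concept_id, sorted(links[concept_id]))
```

Mapping synonyms such as "heart attack" and "myocardial infarction" to one identifier is what lets a social platform post and a clinical note be connected even when they share no wording; a real system would obtain that mapping from a terminology resource and a trained tagger rather than from a hard-coded dictionary.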

Published
2022-06-22
How to Cite
Budi Santoso, Agustinus Budi Santoso, & Eko Siswanto. (2022). DEEP LEARNING ALGORITHM FOR CONNECTING SCIENTIFIC RECORDS AND SOCIAL PLATFORM. Journal of Engineering, Electrical and Informatics, 2(2), 23-37. https://doi.org/10.55606/jeei.v2i2.913