I joined the school in 2017 as a Data Analyst for the ALPHA Network. I hold a BSc Honours degree in Physics and a Master of Business Administration (MBA) degree with a specialization in Information Technology. I am also an enthusiastic staff research degree student, pursuing a PhD in parallel with my professional commitments at the institution.
Early in my career, while consulting for the Vadu Health and Demographic Surveillance System, I played a pivotal role in introducing electronic field data capture systems. Initially, we utilized laptop computers and later transitioned to Android tablets. This initiative marked one of the pioneering deployments of large-scale longitudinal field surveillance using tablets in India. I also had the privilege of serving as a member of the Scientific Advisory Committee for the INDEPTH Network. Subsequently, I assumed a critical role in research data management in INDEPTH's iSHARE2 project, dedicated to data harmonization and sharing. Furthermore, I led the technical team responsible for the National Surveillance System for Enteric Fever in India, a testament to my commitment to advancing public health. These valuable experiences laid a strong foundation for my journey before joining the school.
In my current professional capacity, I serve as a Data Analyst/Scientist for two prominent networks: the ALPHA Network (https://alpha.lshtm.ac.uk/) and the INSPIRE Network (https://aphrc.org/inspire/ and https://inspiredata.network/). My primary focus revolves around the intricate world of data science.
Within the ALPHA Network, I have made substantial contributions by designing and implementing an ETL (Extract, Transform, Load) pipeline using the powerful Pentaho Data Integration tool. This pipeline efficiently manages ALPHA site data within the innovative Centre-in-a-Box (CiB) environment, consistently meeting precise data specifications. Additionally, I have spearheaded the development of process automation pipelines within the ALPHA server, spanning from data uploads to meticulous data quality checks, comprehensive quality reports, and data harmonization.
In the context of the INSPIRE Network, I've harnessed my expertise in OHDSI tools to migrate data from ALPHA data specifications to the OMOP CDM (Observational Medical Outcomes Partnership Common Data Model). I am deeply involved in the critical task of harmonizing COVID-19 data sourced from the Integrated Disease Surveillance and Response in the African Region into the OMOP CDM. I've configured the INSPIRE platform-as-a-service (PaaS) on Microsoft Azure cloud services to streamline and accelerate these complex data processing tasks. I also generated a synthetic dataset for the WHO Integrated Disease Surveillance and Response (IDSR) for the African Region, focusing on COVID-19. This dataset has been used to develop the ETL pipeline that migrates data from the IDSR format to the standardised OMOP CDM. The details of this work are available in a GitHub repository (https://github.com/tathagatabhattacharjee/Generic-IDSR-COVID-19-data-to…). Work on mental health and on other infectious and non-communicable diseases is in progress on the INSPIRE platform.
In summary, my academic foundation and experience have provided me with a strong knowledge base, which I apply to drive impactful advancements in data science and processing within the population health and research domains.
Affiliations
Department of Population Health
Faculty of Epidemiology and Population Health
Teaching
I taught ETL techniques as part of the Health Data Management module of the MSc Health Data Science programme from the 2020-2021 academic year through 2022-2023.
Research
My research centers on the application of Machine Learning techniques to facilitate robust data record linkage between Health and Demographic Surveillance Systems (HDSS) and HIV clinic datasets, all within the same geographical regions. This ambitious endeavour aims to address critical gaps in healthcare and epidemiological research.
The datasets under scrutiny are sourced from an HDSS site in Tanzania. These datasets represent a rich and diverse repository of healthcare information. When effectively interconnected, they hold the potential to uncover profound insights into the ever-evolving dynamics of public health, especially within the context of HIV trends.
I am fortunate to have the guidance of mentors throughout this intellectual journey. Professor Jim Todd and Dr. Emma Slaymaker from the school's Department of Population Health, Faculty of Epidemiology and Population Health, alongside Dr. Chodziwadziwa Kabudula from the University of the Witwatersrand, South Africa, have been providing me with invaluable support. Their expertise and unwavering commitment play a pivotal role in shaping the direction and impact of my research.
Selected Publications
Making metadata machine-readable as the first step to FAIR population health data
2024
Online Journal of Public Health Informatics
Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study.
2024
Online Journal of Public Health Informatics
Harmonizing African Population Cohort Data: INSPIRE Network's Roadmap to Sustainable Standardization in OMOP CDM
2024
African Population Cohorts Consortium Blueprint Conference
INSPIRE datahub: a pan-African integrated suite of services for harmonising longitudinal population health data using OHDSI tools.
2024
Frontiers in Digital Health