Precision Medicine · Biomedical Informatics · Machine Learning

Vivek Kanpa

Deep Learning @ Novartis · Research @ AI_PREMie · George J. Mitchell Scholar @ UCD

Experience

Novartis

AI & Computational Sciences Intern

What cardiovascular conditions are associated with different features of ECG readouts?

Recently, the Patient Contrastive Learning of Representations (PCLR) featurization, a vector of ~320 numbers, was found to be more "expressive, performant, and practical" than raw electrocardiogram (ECG) data for cardiovascular machine learning tasks (Diamant et al., 2022). Traditionally, cardiologists read an ECG to diagnose adverse cardiac events, monitor drug toxicity, and screen patients who may be at risk for certain conditions. Using explainable AI (XAI) and generative learning techniques on large datasets from MIMIC-IV (MIT) and the UK Biobank (NHS), I seek to understand the internal feature relationships in PCLR and to associate feature perturbations with clinical cardiac conditions.

January 2024 — present

AI_PREMie

ML Research Assistant

Goal: To develop a best-in-class AI-assisted clinical decision-making tool for maternal preeclampsia.

Preeclampsia is a hypertensive disorder of pregnancy that affects 1 in 10 women, claiming the lives of 50,000 mothers and 500,000 babies annually. Because its warning signs are difficult to detect and may appear only late in pregnancy, preeclampsia often goes unnoticed until delivery, when complications can become serious. Following the discovery of novel blood biomarkers, the AI_PREMie team began developing an AI risk stratification tool to assist clinicians. I am applying machine learning optimization and feature engineering techniques to improve the diagnostic classification performance of our ensemble models. These optimizations, together with the novel biomarker data, enable our model to outperform market competitors by large margins.

October 2023 — present

Revolution Medicines

Data Science Research Intern

Can protein structure inform strategic priorities for discovery biology and pharmacogenomics?

Performed structural alignment and hierarchical clustering (Dali, AlphaFold) of 600+ cancer driver proteins identified by OncoKB. Constructed a polar dendrogram in which proteins were clustered by structural similarity, with node radius encoding patient mutation frequency (AACR GENIE). By clustering cancer driver proteins by structural homology, I informed decision-making for next-generation drug priorities and enabled more efficient cancer targeting strategies. I documented this project's workflow here.

Bioinformatics & Cancer Technology Co-op

Goals: develop generalizable tools to increase pharmacogenomics throughput and expedite ADME screening.

  • Trained a graph convolutional network (DeepChem) on molecular featurizations of the RevMed hit library, improving permeability (Caco-2) predictions by 28% compared with a conventional molecular weight cutoff.
  • Designed a lightweight software system to automate mass spectrometry (MS) cross-linking analysis, calculating a normalized percent cross-linking value from raw relative abundance outputs. This enabled high-throughput screening for drug-target binding, shaved dozens of hours off each 384-well MS experiment, and led to the discovery of a c-Myc-drug heterodimer complex a few weeks later.
  • Performed in vitro dose-response assays of KRAS-mutant CRC cell lines and quantified c-Myc mRNA levels (PAGE, qRT-PCR) and protein levels (BCA assay, western blotting/ELISA) to understand the downstream effect of KRAS inhibition on Myc activity.

July 2022 — June 2023

Apfeld Laboratory

Research Assistant

Goal: To be updated soon!

Description coming soon!

January 2022 — April 2023

Takeda Pharmaceuticals

Bioanalytics & Automation Sciences Co-op

Full-stack web developer for the Global Biologics team in Cambridge, MA. Two of my major projects included:

  • Designed end-to-end Django web apps for a protein quantification assay (HTP Lunatic) and HPLC chromatography (Agilent 1260 Infinity), covering everything from assay completion to database upload and substantially increasing analysis throughput, data availability, and data standardization.
  • Enabled high-throughput assay workflows by building a translator that converts a scientist's experimental plate layout (96/384-well) into fulfillment instructions for the Lynx liquid handler with picomolar precision.

July 2021 - December 2021

Education

PhD, Artificial Intelligence for Medicine @ Icahn School of Medicine at Mount Sinai


I will begin my doctoral studies in New York City in August 2024. I'm excited to join the cutting edge at Sinai and to learn from the best minds in computational biomedicine and clinical big data!

MSc, Artificial Intelligence for Medicine @ University College Dublin

George J. Mitchell Scholarship Class of '24
Thesis project: AI_PREMie, developing a machine learning clinical decision-making tool for preeclampsia
  • Publication in progress: Integration of platelet multi-omics data using machine learning for risk stratification of acute coronary syndromes (2024)
  • Founder of the Medical Informatics Graduate Student Journal Club at the Conway Institute
  • Traveled to 10+ countries during my first time in Europe! Read about it in my blog below.
GPA: 3.80

BS, Data Science & Biology @ Northeastern University

Dean's Scholar, Class of 2023
  • Awarded $10,000 in grant funding from three internal Northeastern PEAK Awards to study the cell signalling and gene expression that protect neurons from oxidative stress
  • Worked as a teaching assistant for CS1800 (Discrete Structures), GE1501 (Cornerstone of Engineering), and BIOL2309 (Biology Project Lab)
  • Second-author publication in Brain Sciences for discovering mPFC activity linked to self-deception during peer pressure
  • Worked as a Resident Assistant to help first-year students transition to college
  • Ran two marathons in San Francisco and Sacramento (PR: 3:39)
GPA: 3.60

Skills

Programming Languages & Tools
Techniques
  • Machine Learning Algorithm Selection
  • Biomedical Data Model Optimization
  • Object-Oriented Design
  • RNA-seq Differential Gene Expression (DGE) Analysis
  • System Architecture
  • HTP Data Automation
  • Graph Feature Engineering
  • Database Design
  • Vibe Engineering (I have great vibes)

Interests

I love to stay active. I enjoy playing basketball, tennis, pickleball, and frisbee. I'm a marathoner and recently began in-season training for the Connemarathon Ultra (39.3 miles). 2024 will be the year I break 3:15; I just know it! I enjoy reading non-fiction and memoirs. I just finished And Finally and am now reading The Henna Artist, and I stay up to date with Nature and The New York Times as best I can.

I can't hold a conversation about TV shows, but I can talk your ear off about music. Bad Bunny and 6LACK had a chokehold on my Spotify playlists this year. I casually produce music and host a weekly radio show (PM.fm! Wednesdays at 3pm EST), and a close friend and I are debuting our music podcast in January 2024. Check out my original music and some of my favorite songs!


And finally, here's my reading bucket list for 2024. If you have any recommendations, please let me know :)

  • The Emperor of All Maladies, Siddhartha Mukherjee
  • On the Origin of Species, Charles Darwin
  • Six Not-So-Easy Pieces, Richard Feynman
  • Freakonomics, Steven D. Levitt and Stephen J. Dubner
  • Manufacturing Consent, Noam Chomsky
  • Sapiens: A Brief History of Humankind, Yuval Noah Harari
  • The Structure of Scientific Revolutions, Thomas Kuhn

Projects - Under Renovation!

  • Visit My Blog!
  • Predicting Drug Permeability
  • Onco-proteome Data Visualization
  • Predicting Climate Change Trends