Sukanya Krishna Portfolio

All About Me!

Hello, my name is Sukanya (or Suki for short)! I’m currently a PhD student at Harvard University in Engineering Sciences with a concentration in Computer Science and a secondary in Data Science. I graduated from the University of California San Diego, where I majored in Bioengineering with a double minor in Data Science and Cognitive Science. I am interested in applying computational methods and machine learning to the medical sciences. As an avid programmer and engineer, I’m very passionate about using CS and Data Science (DS) to help solve some of the world's most interesting problems.

I’m interested in exploring and learning about all sorts of technology and software. Some of the areas I am most passionate about learning are app development, finance, geospatial analysis, computer vision, renewable energy, and healthcare. I find it incredible how CS and DS can be used in so many intersecting ways, and how many opportunities exist in these fields. If you have an interesting collaboration opportunity or would like to reach out, feel free to contact me through the "Contact Me" message form!

To learn a bit more about me, separate from school, work, and tech, I really enjoy listening to music on YouTube or Spotify, making music (I have been playing the flute for ~10 years now since middle school!), watching kdramas or anime (my favorites at the moment are Blue Lock and One Punch Man), and learning new languages (I am currently learning Korean through my university!).

Skills

Below are some of the skills that I am familiar with, and I'm always looking to learn more.

Python

5 years +

Proficient with NumPy, Pandas, Keras, TensorFlow, PyTorch, Flask, Django

SQL & R

1 year

Proficient in working with relational databases, writing queries, and performing data analysis.

Java

3 years +

Proficient with object-oriented programming (OOP) principles.

Git & GitHub

4 years +

Proficient with using Git and GitHub as a version control system for managing projects.

Cloud & Containerization

1 year

Proficient with using Cloud Computing and Containerization Technology: AWS, Kubernetes, Poseidon.

HTML & CSS

1 year

Adequate at using HTML and CSS for web development.

Select Work Experiences

Undergraduate Student Researcher @ Systems Biology and Systems Medicine Lab

UCSD La Jolla, CA April 2024 - Sep. 2024

Developing an image-based methodology to stratify the heterogeneity and classify the disease state of tumors in triple-negative breast cancer (TNBC) using fluorescent microscopy images obtained from GeoMx experiments.

Machine Learning (Pharmacometrics) Intern @ Bristol Myers Squibb

Bristol Myers Squibb San Diego, CA July 2023 - Sep. 2023

Employed machine learning techniques to identify key features for predicting diabetes using two distinct datasets. Investigated multiple scikit-learn classifiers, explainable AI techniques, and neural networks.
Achieved up to 86% model accuracy on the smaller Pima Indians Diabetes dataset and 75% accuracy on the larger Diabetes Readmission dataset.
Implemented Generative Adversarial Networks (GANs) to augment the datasets with new synthetic patient data, enhancing the project’s scope beyond feature selection.

Product Analytics Intern @ One Medical

One Medical (Amazon) San Francisco, CA May 2023 - Aug. 2023

Worked with big data to develop an efficient data model on Snowflake, aggregating patient data to enhance performance and analytical capabilities.
Designed and published Tableau dashboards, visually representing 8 crucial success metrics sourced from Snowflake and Mixpanel. Utilized a pre-aggregated data model to ensure superior performance.

Open Source Contributor @ Google Summer of Code

Ontario Institute for Cancer Research Remote May 2023 - Sep. 2023

Spearheaded implementation of a CI/CD pipeline using Argo CD, Argo Workflows, AWS, GitHub, and Jenkins.
Successfully automated continuous integration deployments for 2 repositories using Git Hooks, with plans to expand to over 20 by the next release.
Dockerized critical repositories in the release pipeline to reduce manual intervention, improve curator workflows, and enhance developer productivity.

Undergraduate Student Researcher @ Robotic and Haptic Devices Lab

UCSD La Jolla, CA May 2023 - Aug. 2024

Part of a bioengineering senior design team developing a proof-of-concept demonstration of a novel vine biomedical robot that can be steered by local actuation of responsive material.
Evaluating and testing the attachment of heating actuators (LCEs) to different vine materials, and characterizing the performance of LCEs when activated using hydronic heating or pneumatic heating.

Data Science Engineering Intern @ Medtronic

Medtronic Northridge, CA June 2022 - Aug. 2022

Further developed the Digital Twin (DT) model for speed optimization. The model searches for a set of parameter values that minimizes the error between patient sensor glucose values and the fitted sensor glucose values it produces.
Used Python and developed on Poseidon clusters.
Achieved a 7.5x reduction in compute time with stable fitting (1.4% deviation in MARD) and stable parameter estimation: 4% parameter variation for 10-minute steps, compared against parameters fitted at the maximum discretization of 1-minute steps.
Estimated a 5x cost reduction in cloud resources (AWS) at scale.
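
As a rough illustration of the MARD (mean absolute relative difference) metric mentioned above, here is a minimal sketch of the standard formula. The function name and the sample glucose values are hypothetical and chosen only for illustration; this is not the Digital Twin code.

```python
def mard(reference, fitted):
    """Mean absolute relative difference between two series, in percent."""
    if len(reference) != len(fitted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |fitted - reference| / reference over all time points.
    total = sum(abs(f - r) / r for r, f in zip(reference, fitted))
    return 100.0 * total / len(reference)


# Hypothetical sensor glucose readings (mg/dL) and model-fitted values.
sensor = [100.0, 120.0, 150.0, 200.0]
fitted = [110.0, 114.0, 150.0, 190.0]
print(f"MARD: {mard(sensor, fitted):.2f}%")  # prints "MARD: 5.00%"
```

A lower MARD means the fitted curve tracks the sensor readings more closely, so a small change in MARD after coarsening the time step suggests the fit stayed stable.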

Undergraduate Student Researcher @ Duarte Lab

UCSD La Jolla, CA June 2021 - Feb. 2024

Performing anomaly detection analysis at the Duarte Lab (a lab studying particle physics using machine learning), using graph-based ML models for LHC data analysis to discover exotic new physics.
Working on the development of particle graph autoencoders, which are unsupervised deep learning models for anomaly detection, using Python and developing on Kubernetes clusters.
IRIS-HEP fellow for summer 2021.

Undergraduate Student Researcher @ Zorrilla Lab

Scripps Research La Jolla, CA Sep. 2021 - June 2022

Training in bioinformatics techniques, including GWAS and whole exome sequence analysis using Python and R.
Troubleshooting and applying code to perform psychiatric genetic association analysis on the UK Biobank database, both to analyze a priori genes and to conduct a gene variant discovery analysis.
Worked on the development of a random forest predictive model to determine whether there is a relationship between a patient profile and the likelihood of an alcohol-related rehospitalization.

Project Work

Research, Group, and Personal projects.

Software Projects

Fake Amazon Reviews (FARS)

FARS is a Data Science/ML project by a 4-person team that I led. Using the US Amazon Customer Reviews dataset from the Amazon (2014-2015) archives, we predicted whether a given review is verified or unverified.

Developed KNN (K-Nearest Neighbors) and Bigrams & Random Forest Classification ML models for this analysis. Optimized the KNN classifier through feature selection and data cleaning, and achieved around 70% test accuracy for both models.

Interaction Network (IN)

This project under the Duarte Lab investigates a type of graph-based autoencoder and randomized neural network architecture, the interaction network autoencoder and variational autoencoder (i.e., CNN, DNN). The objective is to evaluate it against other kinds of autoencoder and variational autoencoder structures to see which can be best optimized to fit on an FPGA (to meet L1 trigger requirements) while also performing well at anomaly detection.

National Park Size & Diversity Analysis

This project is a simple linear regression analysis, using open-source Kaggle datasets, of whether there is a relationship between the size of a national park and the diversity within it.

Power Outages Analysis

Effects of Food Accessibility on Type II Diabetes

As part of a 5-person team, I worked on a project that explored the prevalence of Type II diabetes among Californian adults in 2017 in relation to their access to fresh foods, their race, and their gender.

The ordinal data was processed into seven columns through one-hot encoding: whether they have diabetes, the type of diabetes, their access to fresh foods, their access to affordable fresh foods, race, whether they are prediabetic, and whether they are female. We performed univariate analysis to determine the distribution of the data in each column, and used multivariate analysis and scatterplots to determine the relationships between them. Furthermore, we used a decision tree to further analyze how diabetes and prediabetes relate to the other columns.

Our results showed that a high volume of adults have access to affordable fresh foods; however, there is a higher prevalence of Type II diabetes among those who have greater access to affordable fresh foods.

Song Visualizer (Spotify)

This is a project I recently started that takes in a song title and the corresponding artist name and generates a rating for the overall mood/vibe of the song, based on various musical indicators and the sentiment of the song lyrics.

My goal is to qualitatively describe the song using an "emoji"/emotion visual so that users can filter for the kinds of songs they want to listen to based on how they're feeling. I plan to showcase the analysis through an online web dashboard, with a deployment goal of Nov 2022.

Inspiration for this idea came when I was having difficulty finding songs of a particular mood in a playlist (Liked Songs) that was not organized by style or genre.

News Articles Classifier

This project combined web scraping using BeautifulSoup with predictive machine learning using a HuggingFace zero-shot-classification model. It scrapes an article, determines which topic the article is most closely related to, and labels the article with one of the following topic categories: "politics", "international news", "celebrity", "sports", "health", "nutrition", "fitness", "beauty", "business", "economy", "finance", "technology", "science", "lifestyle".
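
The labeling step described above could look roughly like the sketch below. It is an illustration only, not the project's actual code: the helper names `build_classifier` and `label_article` are hypothetical, the scraping step is omitted, and the pipeline is created with whatever default model the `transformers` library selects.

```python
TOPICS = [
    "politics", "international news", "celebrity", "sports", "health",
    "nutrition", "fitness", "beauty", "business", "economy", "finance",
    "technology", "science", "lifestyle",
]


def build_classifier():
    """Create a Hugging Face zero-shot-classification pipeline.

    Imported lazily so that label_article() can be used with any
    compatible callable without requiring transformers to be installed.
    """
    from transformers import pipeline
    return pipeline("zero-shot-classification")


def label_article(text, classifier, topics=TOPICS):
    """Return the candidate topic the classifier scores highest for text."""
    result = classifier(text, candidate_labels=list(topics))
    # The pipeline returns its labels sorted by descending score.
    return result["labels"][0]


# Usage (requires `pip install transformers` and a model download):
#   classifier = build_classifier()
#   print(label_article("Text of a scraped article goes here.", classifier))
```

Passing the classifier in as an argument keeps the labeling logic separate from model loading, so the model is loaded once and reused across many articles.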

Personal Portfolio

This personal portfolio is one of my first projects working with HTML, CSS, and JavaScript, and it gave me the chance to learn how to use them to create a polished deliverable.

Mech-E / Design Projects

Construct-A-Thon: Demolition Robot

Bulbasaur Pendulum Clock

Amazon Redesign - Data Driven UX and Product Design Case Study

VEX Robotics

VEX Robotics is a robotics competition in which teams in the high school division build a robot based on that year's game and compete.

2017 - Qualified for World Championships
2018, 2019, 2020 - Qualified for State Championships

"Particle Graph Autoencoders and Differentiable, Learned Energy Mover's Distance"

Tsan, S. et al. including S. Krishna. NeurIPS (2021)