Hello, my name is Sukanya (or Suki for short)! I’m currently a PhD student at Harvard University in Engineering Sciences with a concentration in Computer Science and a secondary in Data Science. I graduated from the University of California San Diego, where I majored in Bioengineering with a double minor in Data Science and Cognitive Science. I am interested in applying computational methods and machine learning to the medical sciences. As an avid programmer and engineer, I’m passionate about using CS and Data Science (DS) to help solve some of the world's most interesting problems.
I’m interested in exploring and learning about all sorts of technology and software. Some of the areas I most enjoy learning about are app development, finance, geospatial analysis, computer vision, renewable energy, and healthcare. I find it incredible how CS and DS intersect in so many ways and how many opportunities exist in these fields. If you have an interesting collaboration opportunity or would like to reach out, feel free to contact me through the "Contact Me" message form!
To share a bit more about me outside of school, work, and tech: I really enjoy listening to music on YouTube or Spotify, making music (I have been playing the flute for ~10 years now, since middle school!), watching K-dramas or anime (my favorites at the moment are Blue Lock and One Punch Man), and learning new languages (I am currently learning Korean through my university!).
5 years +
Proficient with NumPy, Pandas, Keras, TensorFlow, PyTorch, Flask, Django
1 year
Proficient in working with relational databases, querying, and performing data analysis.
3 years +
Proficient with object-oriented programming (OOP) principles.
4 years +
Proficient with using Git and GitHub as a version control system for managing projects.
1 year
Proficient with using Cloud Computing and Containerization Technology: AWS, Kubernetes, Poseidon.
1 year
Adequate at using HTML and CSS for web development.
FARS is a Data Science/ML project I led with a 4-person team. Using the US Amazon Customer Reviews dataset (2014-2015 archives), we predicted whether a given review is verified or unverified.
Developed KNN (K-Nearest Neighbors) and Bigrams & Random Forest classification ML models for this analysis. Optimized the KNN classifier through feature selection and data cleaning, achieving around 70% test accuracy for both models.
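To illustrate the KNN idea mentioned above, here is a minimal sketch in plain Python. It is not the project's code: the two features (star rating and review length) and the toy data points are hypothetical, chosen only to show how a majority vote among the nearest neighbors produces a label.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy features: (star rating, review length in words).
train_points = [(5, 120), (4, 95), (5, 110), (1, 8), (2, 12), (1, 5)]
train_labels = [
    "verified", "verified", "verified",
    "unverified", "unverified", "unverified",
]

print(knn_predict(train_points, train_labels, (4, 100)))
print(knn_predict(train_points, train_labels, (1, 10)))
```

In practice the features would usually be scaled first, since an unscaled feature with a large range (here, review length) dominates the distance.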
This project under the Duarte Lab investigates a graph-based autoencoder and randomized neural network architecture: the interaction network autoencoder and variational autoencoder (i.e. CNN, DNN). The objective is to evaluate it against other kinds of autoencoder and variational autoencoder structures to see which can best be optimized to fit on an FPGA (to meet L1 trigger requirements) while also performing well at anomaly detection.
This project is a simple linear regression analysis, performed on open-source Kaggle datasets, that examines whether there is a relationship between national park size and the diversity in each park.
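As a rough illustration of what a simple linear regression computes, the sketch below fits a least-squares line by hand. The numbers are hypothetical placeholders, not values from the Kaggle datasets, and `fit_line` is an illustrative helper rather than the project's code.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: park size and a diversity count for four parks.
park_sizes = [1.0, 2.0, 3.0, 4.0]
diversity = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(park_sizes, diversity)
print(slope, intercept)
```

A positive slope would suggest that larger parks tend to have higher diversity, although a fitted line alone does not show how strong or reliable that relationship is.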
As part of a 5-person team, I worked on a project that explored the prevalence of Type II diabetes in relation to Californian adults' access to fresh foods, their race, and their gender in 2017.
The ordinal data was processed into seven columns through one-hot encoding: whether they have diabetes, type of diabetes, their access to fresh foods, their access to affordable fresh foods, race, whether they are prediabetic, and whether they are female. We performed univariate analysis to determine the distribution of the data in each column, and multivariate analysis with scatterplots to determine the relationships between them. We then used a decision tree to further analyze how diabetes and prediabetes relate to the other columns.
Our results showed that a large number of adults have access to affordable fresh foods; however, there is a higher prevalence of Type II diabetes among those who have greater access to affordable fresh foods.
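For readers unfamiliar with one-hot encoding, the preprocessing step described above, here is a minimal plain-Python sketch. The category names are hypothetical and are not the survey's actual response options; `one_hot` is an illustrative helper, not the project's code.

```python
def one_hot(values):
    """Turn a list of categorical values into 0/1 indicator rows.

    Returns the sorted category names and one row per input value,
    with a 1 in the position of that value's category.
    """
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


# Hypothetical responses to a fresh-food access question.
access = ["always", "sometimes", "never", "always"]

categories, rows = one_hot(access)
print(categories)
for row in rows:
    print(row)
```

Each response becomes a row containing exactly one 1, which lets models such as decision trees work with the categories as numeric columns.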
This is a project I have recently started. Given a song title and the corresponding artist name, it will generate a rating for the overall mood/vibe of the song based on various musical indicators and the sentiment of the song lyrics.
My goal is to qualitatively describe each song using an "emoji"/emotion visual so that users can filter songs by how they're feeling. I plan to showcase the analysis through an online web dashboard, with deployment targeted for Nov 2022.
The inspiration for this idea came when I was having difficulty finding songs of a particular mood in a playlist (Liked Songs) that was not organized by style or genre.
This project combined web scraping using BeautifulSoup with predictive machine learning using a HuggingFace zero-shot-classification model. It scrapes an article, determines which topic the article is most closely related to, and labels the article with 1 of 14 topic categories: "politics", "international news", "celebrity", "sports", "health", "nutrition", "fitness", "beauty", "business", "economy", "finance", "technology", "science", "lifestyle".
This personal portfolio is one of my first projects in which I got to learn and work with HTML, CSS, and JavaScript to create a nice deliverable.
VEX Robotics is a robotics competition in which teams in the high school division build a robot based on the year's game and compete.