I summarize various projects I have worked on during my graduate coursework.
Final project for STAT 615: Advanced Bayesian Methods. I worked on this project by myself. Find report, presentation slides, and code on GitHub.
The goal of this project was to obtain a model-based estimate of a cartridge case comparison algorithm's error rate. I developed a Bayesian random-effects model with a Markov Random Field spatial term to capture structure in the similarity scores that the algorithm computes for pairs of cartridge cases. I fit the model using the Integrated Nested Laplace Approximation (INLA) algorithm as implemented in the INLA R package. The resulting parameter estimates were interpretable. For example, they measured whether the algorithm tends to assign high similarity scores to specific cartridge cases or to cartridge cases fired from specific barrels. The spatial parameter estimates were also interpretable and indicated regions of a cartridge case that tend to receive higher similarity scores than other regions.
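The Markov Random Field term is what lets the model share information between neighboring regions of a cartridge case. As a minimal sketch, assuming the surface is divided into a rectangular grid of cells and the spatial term uses an intrinsic conditional autoregressive (ICAR, or Besag) prior, which is one common MRF choice and not necessarily the exact specification in the report, the precision matrix and the conditional distribution of a single cell could be computed as follows. The function names are hypothetical.

```python
def grid_adjacency(n_rows, n_cols):
    """Map each cell index (row-major) to its edge-sharing (rook) neighbors."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            nbrs = []
            if r > 0:
                nbrs.append(i - n_cols)  # cell above
            if r < n_rows - 1:
                nbrs.append(i + n_cols)  # cell below
            if c > 0:
                nbrs.append(i - 1)       # cell to the left
            if c < n_cols - 1:
                nbrs.append(i + 1)       # cell to the right
            neighbors[i] = nbrs
    return neighbors


def icar_precision(n_rows, n_cols, tau=1.0):
    """Precision matrix Q = tau * (D - W) of an ICAR field on the grid.

    D is diagonal with each cell's neighbor count; W is the 0/1 adjacency matrix.
    """
    nbrs = grid_adjacency(n_rows, n_cols)
    n = n_rows * n_cols
    Q = [[0.0] * n for _ in range(n)]
    for i, js in nbrs.items():
        Q[i][i] = tau * len(js)
        for j in js:
            Q[i][j] = -tau
    return Q


def icar_conditional(x, i, nbrs, tau=1.0):
    """Conditional mean and variance of cell i given all other cells.

    Only the neighbors of i enter (the Markov property): the mean is the
    neighbors' average and the variance is 1 / (tau * number of neighbors).
    """
    js = nbrs[i]
    mean = sum(x[j] for j in js) / len(js)
    var = 1.0 / (tau * len(js))
    return mean, var


Q = icar_precision(3, 3)
nbrs = grid_adjacency(3, 3)
print(Q[4][4], Q[0][0])  # 4.0 2.0
print(icar_conditional([0, 1, 2, 3, 4, 5, 6, 7, 8], 4, nbrs, tau=2.0))  # (4.0, 0.125)
```

Every row of Q sums to zero, so the ICAR prior is improper on its own; this is why such spatial terms are usually fit with a sum-to-zero constraint.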
Final project for STAT 602: Modern Multivariate Statistical Learning. I worked in a team of three on this project. Report, presentation slides, and code can be found on GitHub.
The goal of this project was to classify whether a 5-second audio clip contained a bird call. We converted each audio file to a 2D frequency-over-time representation called a spectrogram (see example above). The three of us then independently engineered features from each spectrogram and trained our own classifiers on these features. For example, I trained k-Nearest Neighbors, Random Forest, XGBoost, support vector machine, and C5.0 classifiers. Each of our classifiers returned a probability that an audio file contained a bird call. We then trained a Classification and Regression Tree (CART) model on these class probabilities to combine the individual classifiers into an ensemble. Leave-one-out cross-validation estimated the accuracy of this ensemble at 0.944 (i.e., it correctly classified 94.4% of the training audio files when each file was held out). Unfortunately, we ran out of time in the semester to apply our trained model to the test data.
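The ensembling step is a form of stacking: the base classifiers' predicted probabilities become the features of a second-level model. Below is a minimal pure-Python sketch, assuming binary labels (1 = bird call) and using a depth-1 CART (a decision stump chosen by Gini impurity) as a simplified stand-in for the full CART model. It also shows how leave-one-out cross-validation refits the meta-model once per held-out clip. The function names and the toy probabilities are hypothetical, not project data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Majority class of a list of 0/1 labels (ties go to class 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(X, y):
    """Fit a depth-1 classification tree on rows of base-model probabilities.

    Picks the (feature, threshold) split minimizing the weighted Gini impurity
    of the two child nodes and returns a prediction function.
    """
    n = len(y)
    best = None  # (impurity, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Midpoints between consecutive distinct values keep both children non-empty.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every feature is constant: predict the majority class
        label = majority(y)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def loocv_accuracy(X, y):
    """Leave-one-out CV: refit the stump n times, predicting each held-out clip."""
    correct = 0
    for i in range(len(y)):
        model = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += model(X[i]) == y[i]
    return correct / len(y)


# Hypothetical probabilities from three base classifiers for six clips.
X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.85, 0.6, 0.9],
     [0.3, 0.4, 0.2], [0.7, 0.9, 0.8], [0.1, 0.2, 0.4]]
y = [1, 0, 1, 0, 1, 0]
print(loocv_accuracy(X, y))  # prints 1.0
```

In a full stacking pipeline, the base-model probabilities fed to the meta-model should themselves be out-of-fold predictions; otherwise the cross-validated accuracy of the ensemble can be optimistic.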
Final project for STAT 544: Bayesian Statistics. I worked on this project by myself. Report, slides, and code can be found on GitHub.
The goal of this project was to model the chemical concentrations of float glass. Forensic glass analysts use these chemical concentrations to, for example, match glass shards found on a suspect to a broken window at a crime scene. Modeling these concentrations helps us understand how they vary across manufacturers, over time, and across individual panes of glass. Panes of glass were obtained from two float glass manufacturers, and the concentrations of various elements (e.g., iron, hafnium, zirconium) were measured in each pane. During exploratory data analysis, I identified changes in the concentrations of some elements over time. I therefore compared two Bayesian models: one that modeled the chemical concentrations of each pane of glass independently (i.e., independence between sampled panes) and another that modeled the dependence over time as a stationary autoregressive process. After fitting and diagnosing the models, I noticed an extreme positive dependence between mean concentrations over time for hafnium and zirconium. It seems that the raw materials used to manufacture these panes started off with large concentrations of these elements that “ran out” over time.
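To make the contrast between the two models concrete, here is a minimal sketch, assuming the mean concentration of an element at time t follows a stationary AR(1) process around a long-run mean. It simulates such a series and computes the lag-1 sample autocorrelation as a simple moment estimate of the autoregressive coefficient phi. This is not the Bayesian fit from the project, and all names and parameter values are hypothetical. Under the independence model phi is effectively 0, whereas a value near 1 corresponds to strong positive dependence over time.

```python
import random


def simulate_ar1(n, mean, phi, sigma, seed=0):
    """Simulate mu_t = mean + phi * (mu_{t-1} - mean) + e_t with e_t ~ N(0, sigma^2).

    Stationarity requires |phi| < 1.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    rng = random.Random(seed)
    # Start from the stationary distribution N(mean, sigma^2 / (1 - phi^2)).
    mu = [rng.gauss(mean, sigma / (1.0 - phi ** 2) ** 0.5)]
    for _ in range(n - 1):
        mu.append(mean + phi * (mu[-1] - mean) + rng.gauss(0.0, sigma))
    return mu


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation, a simple moment estimate of phi."""
    n = len(x)
    xbar = sum(x) / n
    num = sum((x[t] - xbar) * (x[t - 1] - xbar) for t in range(1, n))
    den = sum((v - xbar) ** 2 for v in x)
    return num / den


dependent = simulate_ar1(5000, mean=10.0, phi=0.95, sigma=0.1, seed=1)
independent = simulate_ar1(5000, mean=10.0, phi=0.0, sigma=0.1, seed=1)
print(lag1_autocorrelation(dependent))    # should be close to 0.95
print(lag1_autocorrelation(independent))  # should be close to 0
```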
Final project for STAT 585: Data Technologies for Statistical Analysis. I worked on this project in a team of four. Code for my contribution to the project can be found on GitHub.
The goal of this project was to create a package and interactive web application to help statistics students learn introductory concepts. I created a visual tool for determining whether to reject or fail to reject the null hypothesis given the null and alternative hypotheses, a test statistic, and a significance level. I also added a madlibs-style hypothesis test “game” in which a student is given a random hypothesis test problem and must fill in the blanks of the standard hypothesis testing procedure. Finally, I created a visual tool to demonstrate the asymptotic distributional properties of the sample mean.
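The rejection rule that the first tool visualizes can be written in a few lines. This minimal sketch assumes a z test statistic (standard normal under the null hypothesis) and uses Python's standard-library `statistics.NormalDist`; the function names are hypothetical and are not the package's API. The second function is a simple central limit theorem simulation in the spirit of the sample-mean tool: means of Exponential(1) samples of size n should have a standard deviation close to 1/sqrt(n).

```python
import random
import statistics
from statistics import NormalDist


def z_test_decision(z, alpha, alternative="two-sided"):
    """Return (decision, p_value) for a z test at significance level alpha.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if alternative == "two-sided":
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1.0 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    # H0 is never "accepted": we either reject it or fail to reject it.
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return decision, p_value


def sample_means(n, reps, seed=0):
    """Means of `reps` independent samples of size n from Exponential(1),
    a skewed distribution with mean 1 and standard deviation 1."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


print(z_test_decision(2.5, 0.05))          # rejects H0: two-sided p is about 0.012
print(z_test_decision(1.8, 0.05, "less"))  # fails to reject H0
means = sample_means(n=100, reps=2000)
print(statistics.stdev(means))  # should be close to 1 / sqrt(100) = 0.1
```

Reporting the p-value alongside the decision mirrors the two equivalent ways of stating the rule: reject when the p-value is at most alpha, or when the statistic falls beyond the critical value.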
Cumulative Graduate GPA: 3.95
Fall 2021
Spring 2021
Fall 2020
EE 524: Digital Signal Processing by Dr. Aleksandar Dogandzic, Grade: A
STAT 590B: Missing Data Methods by Dr. Jae-Kwang Kim, Grade: A
STAT 643: Advanced Theory of Statistical Inference by Dr. Daniel Nordman, Grade: A
Spring 2020
STAT 544: Bayesian Statistics by Dr. Danica Ommen, Grade: A
STAT 601: Advanced Statistical Methods by Dr. Mark Kaiser, Grade: A-
STAT 642: Advanced Probability Theory by Dr. Vivekananda Roy, Grade: A-
Fall 2019
STAT 520: Statistical Methods III by Dr. Emily Berg, Grade: A
STAT 551: Time Series Analysis by Dr. Daniel Nordman, Grade: A
STAT 641: Foundations of Probability Theory by Dr. Arka Ghosh, Grade: A-
Spring 2019
COM S: Introduction to Machine Learning by Dr. Kris De Brabanter, Grade: A
STAT 510: Statistical Methods II by Dr. Daniel Nettleton, Grade: A
STAT 543: Theory of Probability and Statistics II by Dr. Vivekananda Roy, Grade: A
STAT 585: Data Technologies for Statistical Analysis by Dr. Heike Hofmann, Grade: A
Fall 2018
STAT 500: Statistical Methods I by Dr. Peng Liu, Grade: A
STAT 542: Theory of Probability and Statistics I by Dr. Lily Wang, Grade: A
STAT 579: An Introduction to R by Dr. Heike Hofmann, Grade: A