Tuesday, May 6, 2025

Grad School Check-in: ~40% Done

Yesterday I submitted my last assignments for my graduate school classes, and I realized I haven’t written much here about the experience of getting my master’s (Data Science @ IU). I only have one week before my summer semester starts, so I thought I’d pump out a quick piece about some of the interesting things I’ve gotten to do so far.

Big picture

I find school very fulfilling, and while my program is very challenging (especially when balancing it with a full-time job), it’s frankly nothing compared to my undergraduate years. Mostly I notice that I have a lot less bullshit-time: time spent on my phone, or sitting on my couch. Giving up real time with family, hobbies, etc. hasn’t really hit me, though I do occasionally have to give up a round of golf. I chose a good time in my life to do this.

Courses

Two semesters in, two classes each. So far I’ve taken:

Cool stuff I’ve gotten to build

K-means cluster algorithm: machine learning for breast cancer detection

For D590, our final project was to build a machine learning algorithm to detect breast cancer in subjects using a dataset of cellular samples. K-means clustering involves projecting attributes of an entity into an n-dimensional space and grouping samples around k cluster centers, so that similar samples end up in the same region of that “space” as measured by Euclidean distance (or another distance metric).

The project involved performing exploratory data analysis, processing and cleaning data, and then training the algorithm before testing its efficacy. My resulting implementation predicted whether cell samples were benign or malignant with 96.0% overall accuracy.
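For anyone curious what k-means looks like in code, here is a minimal sketch in plain Python. It is not the implementation from the D590 project: the function names (`nearest_index`, `kmeans`) and the two-dimensional toy points are assumptions made up for illustration, and the real dataset has many more attributes per sample.

```python
import math
import random


def nearest_index(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Start from k distinct samples chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments stopped changing
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
rng = random.Random(42)
group_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
group_b = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]

labels, centroids = kmeans(group_a + group_b, k=2)
print(centroids)
```

K-means itself is unsupervised, so to report an accuracy figure the cluster ids still have to be matched against the known benign/malignant labels afterward.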

This was a great project for learning about a basic machine learning algorithm and how it might be used in practice.

Watchdog: financial and economic pipelines and dashboards with Python and Blazor

Like I said, the final project for I535 was loose enough that students could build pretty much whatever they wanted, as long as they could relate it back to the course materials in an interesting way. I chose to build a series of pipelines for scraping economic, financial, and trade data from multiple sources like the Bureau of Labor Statistics, the U.S. Census Bureau, and the SEC’s EDGAR database. I actually think this could be an interesting product if it had some more work put into it. Some key features:

Statistical Analysis of Economic Convergence in R

This was some of the most fun I’ve had so far - validating claims about the theory of economic convergence using real data, in R, for S580. I’ve really come around on R. There’s some syntactical funkiness for sure (wtf is with the 1-indexed arrays), but when it comes to doing real statistical work this is ultimately where I’m comfortable. The project involved studying OECD GDP data for most countries to determine whether poorer countries are more capable of explosive growth than richer countries. The report I authored determined that, given the data I had access to, the theory of economic convergence could not be confirmed with statistical methods like linear regression.
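To make the regression idea concrete, here is a rough sketch of one common way to test convergence: regress each country's average growth rate on the log of its starting GDP per capita and look at the sign of the slope. This is an assumption about the setup, not the analysis from the S580 report (which was done in R); it is written in Python, the helper names are hypothetical, and the numbers are invented placeholders rather than OECD data.

```python
import math
from statistics import fmean


def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_mean = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented placeholder values, NOT real OECD figures.
initial_gdp_per_capita = [5_000, 12_000, 20_000, 35_000, 50_000]
avg_annual_growth = [0.031, 0.018, 0.027, 0.015, 0.022]

log_initial = [math.log(v) for v in initial_gdp_per_capita]
slope, intercept = ols_fit(log_initial, avg_annual_growth)
fit_quality = r_squared(log_initial, avg_annual_growth, slope, intercept)

# A clearly negative slope with a decent fit would support convergence;
# a slope near zero or a weak fit would not.
print(f"slope={slope:.4f}, r_squared={fit_quality:.3f}")
```

A real analysis would also need a significance test on the slope, which this sketch leaves out.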

Up next

At my current pace, which is admittedly very fast, I’m projected to graduate next spring. Most students in my program take one class a semester, but I’d rather not be in school for 2+ more years.

This summer I’m taking a course on data visualization, and a series of mini-courses that span all sorts of different topics, which I’m sure I’ll come back and write all about. I’m expecting the next two semesters to be very hard, but I’m looking forward to learning some new skills and wrapping this thing up soon. Thanks for reading!
