This blog entry is part of a sequence where I journal my experience creating a data-science portfolio piece. This post focuses on the first half of data wrangling. The overall project compares the energy it takes to gain a career against the associated financial reward.
To see the previous entry in this blog-post series, please visit the following link:
Importing
Getting started with the datasets for both O*NET and BLS Wage data was a pleasure. I enjoy cleaning and organizing data; it’s relaxing.
Importing O*NET into PostgreSQL
I spent some time perusing the O*NET datasets, and later the BLS datasets. The collections are large, very large. I also wanted my data stored in PostgreSQL, but neither dataset ships in a PostgreSQL-ready format, so both required conversion.
There was a risk at this early stage: if I chose the wrong direction, I might find myself working for hours without any benefit.
To avoid wasting time, I turned to Claude AI for guidance and assistance importing data. This was not ideal in this educational setting, but without outside support, the risk was too high.
My public diary goes over my thought process throughout this step and the various scripts that Claude wrote to help me reformat the data to be compatible with my target tools.
March 29, 2026 Public Diary Entry – #01 – Github
After some initial back and forth, I had O*NET importing into PostgreSQL. The dataset was so large that the import took at least twenty minutes.
While the import ran, I used the following SQL command to watch the tables grow in size.
```sql
SELECT schemaname, relname AS tablename, n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE schemaname = 'onet'
ORDER BY relname;
```
| schemaname | tablename | row_count |
|---|---|---|
| onet | abilities | 92976 |
| onet | abilities_to_work_activities | 381 |
| onet | abilities_to_work_context | 139 |
| onet | alternate_titles | 57543 |
| onet | basic_interests_to_riasec | 53 |
| … | … | … |
Small Aside in My Diary
The following is a side entry in my diary:
Fairbanks weather is warming up. The chimney in my house is destroyed, and I have it wrapped in a black garbage bag so that the air doesn’t go up the chute. With the air warming, the ice collected inside the chute is melting. Drops of water fall onto my black garbage bag as I work. ‘Plat. Plat. Plat.’ The sound echoes the clattering of my keyboard.

A large piece of ice or something inside the chute just broke off and slammed into the black garbage bag, leaving a stretched imprint where the sharp point almost broke through the black plastic.
BLS Dataset
With the O*NET data imported into PostgreSQL, I shifted to obtaining the BLS dataset.
BLS came as a series of Microsoft Excel files. Claude AI advised me not to try to import the Excel files directly into PostgreSQL, but to do the wrangling first and later use SQLAlchemy to handle the PostgreSQL insertion.
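That division of labor can be sketched roughly as follows. Everything here is illustrative rather than my actual code: the file name, connection string, and table names are placeholders, and the column normalization is just one example of the kind of pre-insert cleanup involved.

```python
import pandas as pd

def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case column names and replace spaces so they work as SQL identifiers."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out

# Reading a sheet (file name is a placeholder):
# df = normalize_columns(pd.read_excel("oes_national.xlsx"))
#
# After wrangling, SQLAlchemy handles the PostgreSQL insertion:
# from sqlalchemy import create_engine
# engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
# df.to_sql("wages", engine, schema="bls", if_exists="replace", index=False)
```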
Claude also gave me a little boost in understanding the first few rows of data and becoming familiar with the BLS data dictionary.
After that initial boost, I felt that I had found my bearings well enough to avoid AI assistance going forward.
March 29, 2026 Public Diary Entry – #01 – Github
Data Wrangling
BLS Data Wrangling
I came up with a plan of attack for wrangling the BLS dataset and wrote it out in the following public diary entry.
March 29, 2026 Public Diary Entry – #02 – Github
In short, I went through each column, one by one, and cleaned it.
You can see the full Jupyter Notebook embedded here below.
After the notebook, the blog post continues.
If, for some reason, the above file does not load, you can see my notebook on Github:
Link to BLS Dataset Wrangle on Github
Concluding Observations on BLS Wrangling
The BLS dataset was mostly clean but still had a number of issues. I dropped a few empty rows and spent a lot of time reading the data dictionary and other online explanations of the set.
The BLS dataset contained information about occupations and their associated wages. Some of the rows of data were duplicated and I had to make choices about which rows to keep.
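In pandas, keeping one row per duplicated key looks roughly like this. The column names and values below are made up for illustration; they are not the real BLS schema, and the "keep the first occurrence" rule stands in for whatever choice actually fits each duplicate.

```python
import pandas as pd

# Toy frame mimicking the duplicate-row issue: the same
# occupation/area pair appears twice with identical data.
df = pd.DataFrame({
    "occ_code": ["15-1252", "15-1252", "29-1141"],
    "area":     ["US",      "US",      "US"],
    "a_mean":   [120730,    120730,    89010],
})

# Keep the first occurrence of each occupation/area pair.
deduped = df.drop_duplicates(subset=["occ_code", "area"], keep="first")
```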
The columns that contained wage numbers held cells of various types, including integers, strings, and floats. The strings had to be decoded into either null or float values. There are still some issues I will have to watch; I made notes in the full notebook.
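The core of that string-to-float decoding can be sketched with `pd.to_numeric`. The sample values are invented, and treating the BLS special codes (such as "*" for suppressed values and "#" for top-coded ones) as plain missing data is a simplification, not necessarily the choice the notebook makes for every column.

```python
import pandas as pd

# A wage column as it might arrive: ints, floats, and strings,
# including special codes the BLS files use for unavailable
# or top-coded values. (Values here are made up.)
raw = pd.Series([56780, 47.13, "61,250", "*", "#"])

# Strip thousands separators, then coerce everything to numeric;
# anything unparseable becomes NaN (a SQL NULL later on).
clean = pd.to_numeric(raw.astype(str).str.replace(",", "", regex=False),
                      errors="coerce")
```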
After converting a few of the wage-related columns individually, I ended up writing scripts to convert the rest en masse. This was a fun process.
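An en-masse version of that conversion might look like the function below. The column names are illustrative stand-ins, not the exact BLS headers, and the decoding rule is the same simplified one as above: unparseable cells become NaN.

```python
import pandas as pd

def coerce_wage_columns(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Apply the same string-to-float decoding to every listed wage column.
    Cells that cannot be parsed as numbers become NaN."""
    out = df.copy()
    for col in cols:
        out[col] = pd.to_numeric(
            out[col].astype(str).str.replace(",", "", regex=False),
            errors="coerce",
        )
    return out

# Illustrative column names, not the real BLS headers:
sample = pd.DataFrame({"h_mean": ["27.31", "*"], "a_mean": ["56,780", "#"]})
converted = coerce_wage_columns(sample, ["h_mean", "a_mean"])
```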
I don’t recall relying on AI heavily, or much at all, during the BLS wrangling. I did turn frequently to ordinary Google searches to look up various functions and methods. There may have been a few moments where I was completely stuck, unsure of the source of an issue, and fell back on AI, but nothing specific comes to mind.
The process of cleaning and coming to understand each column took several days.
Throughout the endeavor I continued to keep my public diary.
Continues in Part Two of Data Wrangling
In the next blog post, I go into the process of wrangling the data for the O*NET database.
Artificial Intelligence Transparency Report
No artificial intelligence was used for the writing of this blog post.
In the above story, I’ve endeavored to explain the times when I turned to AI for support.
In the Github directory linked below, there is a public diary where I document some of my process and track my occasional AI usage.
How You Can Help
I need your help to become established as a teacher and storyteller.
Here is a link to a blog post that describes how a supportive reader can help me in my quest.
In short, you can…

- Buy a copy of my children’s novella, Westly: A Spider’s Tale
- Like, comment, and share