Comparing Training Costs to Rewards – #00

This is the beginning of a data-science portfolio project. The aim is to use the O*NET dataset and the BLS Wage dataset to find insights into career-training requirements and associated rewards. This post outlines the start of the project.

Project Background

A Data-Science Portfolio is My Current Main Endeavor

As I explained in my previous blog post series (part 00, part 01, part 02), I am working to create a professional data-science portfolio.

I have pursued data science along a meandering path for the last decade. It is now the main focus of my personal efforts to strengthen my professional skill set.

Candidate for a Springboard Academy Certification

In February I began the Certified Data Science Career Track at Springboard Academy.

My target completion date is August 8th, 2026, and I am currently 28% of the way through the program.

Capstone Project Conceptual Description

As a part of my work in this program, I am creating two capstone projects. These are self-led projects, where I decide what the project will be and how it will be achieved. My Springboard Mentor gives me feedback as I work.

GitHub Repository

My development repository can be found on GitHub here:

Link to Capstone Project Two – GitHub Repository

Project Description

Problem Identification

Job seekers face a challenge. The cost of training, both in terms of finances and time spent, varies widely across industries. Some types of training are expensive while others provide paid apprenticeships.

The financial results of these training pathways vary widely in turn. The relationship between the investment and the return on that investment is not always apparent to a job seeker.

Without understanding this relationship, job seekers may find themselves pouring their energy into a career pathway that will not provide them with a justifiable reward.

Context

This research project intends to explore the relationship between the investment a career-training program requires and the salary one can expect to earn afterward.

The project may examine several cross-sections of the data. For example, it may compare overall training cost per industry against the expected salaries in that industry. It may also explore how long it takes to train for any job, regardless of industry, and compare that against overall financial outcomes.

Criteria for Success

The aim of the project is to show a broad spectrum of relationships between the time and financial resources spent on training and the resulting financial outcomes.

The project will be deemed successful when it is able to show in a visual format the requirements and rewards across a wide range of industries and situations. Several highlights will be provided from the resulting data.

As the project develops and highlights are discovered, the project may deepen with a focus on those areas to provide insight to job seekers.  

Scope of Solution Space

The solution space will focus on publicly available data, comparing known career-training requirements and salary expectations. Datasets will be downloaded locally and loaded into a PostgreSQL database. The data will be accessed via Jupyter Notebook, wrangled and cleaned using tools such as Pandas and NumPy, and the resulting metadata will be saved back into the PostgreSQL database.
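As a rough illustration of that load, clean, and save-back round trip, here is a minimal sketch. Several things in it are assumptions rather than the project's actual setup: an in-memory SQLite database stands in for PostgreSQL so the example runs without a server, the table names (`wages_raw`, `wages_clean`), the column names, and the helper `clean_wages` are hypothetical, and the wage figures are made up.

```python
import sqlite3

import pandas as pd


def clean_wages(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: trimmed codes, numeric wages, no missing wages."""
    cleaned = raw.copy()
    cleaned["occ_code"] = cleaned["occ_code"].str.strip()
    # Non-numeric placeholders (for example "*") become NaN and are dropped.
    cleaned["annual_median"] = pd.to_numeric(
        cleaned["annual_median"], errors="coerce"
    )
    return cleaned.dropna(subset=["annual_median"]).reset_index(drop=True)


# Assumption: in-memory SQLite stands in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical raw rows with made-up values, as they might look after download.
raw = pd.DataFrame(
    {
        "occ_code": [" 15-1252", "29-1141 ", "47-2111"],
        "annual_median": ["130000", "*", "61000"],
    }
)
raw.to_sql("wages_raw", conn, index=False)

# Load from the database, clean in pandas, and save the result back.
loaded = pd.read_sql("SELECT * FROM wages_raw", conn)
cleaned = clean_wages(loaded)
cleaned.to_sql("wages_clean", conn, index=False)

result = pd.read_sql("SELECT * FROM wages_clean ORDER BY occ_code", conn)
print(result)
```

With a real PostgreSQL instance, the same `to_sql` and `read_sql` calls would typically be given a SQLAlchemy engine instead of the SQLite connection.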

With the data cleaned, various types of analysis will be performed. The relationships of training time and cost to salary outcomes will be explored, with cross-sections taken by industry and other features.
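One possible shape for this kind of analysis is sketched below. The column names (`industry`, `training_years`, `median_salary`) and every value are made up for illustration, and the choice of a median per industry plus a rank correlation is only one option among many, not a description of the final method.

```python
import pandas as pd

# Hypothetical rows with made-up values: one row per occupation.
occupations = pd.DataFrame(
    {
        "industry": ["Healthcare", "Healthcare", "Construction", "Construction"],
        "training_years": [4.0, 8.0, 1.0, 3.0],
        "median_salary": [80000, 200000, 65000, 45000],
    }
)

# Cross-section by industry: typical training time and typical salary.
by_industry = occupations.groupby("industry")[
    ["training_years", "median_salary"]
].median()

# Overall association between training time and salary.
# Pearson correlation of the ranks is the Spearman rank correlation.
rho = occupations["training_years"].rank().corr(
    occupations["median_salary"].rank()
)

print(by_industry)
print(f"Rank correlation between training time and salary: {rho:.2f}")
```

A rank correlation is used here because salaries tend to be skewed, so a few very high earners would otherwise dominate a plain linear correlation.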

With the relationships established, visualizations will be created. Seaborn and Matplotlib will be used as preliminary visualization tools.

As time allows, the project may expand to include the use of tools such as Apache Superset, D3.js, Observable, and other professional-grade data-driven storytelling tools.

The final output may be a shareable and downloadable PDF that contains informative slides and a project report. Additionally, as time allows, the project may include a downloadable GitHub project that allows for a small level of interactivity within an Apache Superset instance.

Stakeholders

The target audience for this data exploration is prospective job seekers. This presentation will be made available online as a resource that anyone can download or view for their personal benefit. 

Ideally, the project may provide insight for job seekers who are still deciding between career paths.

Constraints

This exploration is constrained to existing datasets that are publicly available.

The output is also limited to the explorations and visualizations that one individual, working independently as an avocation, can produce within a reasonably short amount of time.

Data Sources

The two primary datasets that will be used for this exploration are the O*NET and BLS Wage Data datasets. The two datasets can be joined on their occupation identifiers.
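To make the join concrete, here is a minimal sketch under a few assumptions. O*NET identifies occupations with O*NET-SOC codes such as `29-1141.00`, while BLS occupational wage tables use the six-digit SOC code such as `29-1141`, so the sketch trims the suffix before merging. The column names are simplified placeholders and may not match the downloaded files, the wage figures are made up, and the `99-9999.00` row is a hypothetical code added only to show what an unmatched occupation looks like.

```python
import pandas as pd

# Illustrative O*NET-style rows; column names are simplified placeholders.
onet = pd.DataFrame(
    {
        "onet_soc_code": ["29-1141.00", "29-1141.01", "47-2111.00", "99-9999.00"],
        "title": [
            "Registered Nurses",
            "Acute Care Nurses",
            "Electricians",
            "Hypothetical Unmatched Occupation",
        ],
    }
)

# Illustrative BLS-style rows; the wage values are made up.
bls = pd.DataFrame(
    {
        "occ_code": ["29-1141", "47-2111"],
        "annual_median": [90000, 60000],
    }
)

# Trim "29-1141.00" to "29-1141" so both tables share one key format.
onet["occ_code"] = onet["onet_soc_code"].str.slice(0, 7)

# Several O*NET rows can map to one BLS row, so validate a many-to-one join.
# indicator=True adds a "_merge" column that flags unmatched occupations.
merged = onet.merge(
    bls, on="occ_code", how="left", validate="many_to_one", indicator=True
)

print(merged[["onet_soc_code", "occ_code", "annual_median", "_merge"]])
```

A left join keeps every O*NET row, which makes it easy to count how many occupations have no wage record before deciding how to handle them.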

O*NET

The O*NET dataset contains information about skill sets, training, education, and occupations.

Link to O*NET dataset

BLS Wage Data

The BLS Wage Data dataset contains information about various industries and their associated expected salaries. 

Link to BLS Wage dataset

Next Post in Series

Click here to go to the next blog post in this series.

Artificial Intelligence Transparency Report

No artificial intelligence was used for the writing of this blog post.

During the research stage for this project, I turned to Claude AI to ask which dataset would be most appropriate, given a list of datasets that interested me. Claude AI provided summary overviews and general knowledge about the datasets, ultimately helping me settle on the O*NET and BLS datasets as the most appropriate for this capstone project.

How You Can Help

I need your help to become established as a teacher and storyteller.

Here is a link to a blog post that describes how a supportive reader can help me in my quest.

In short, you can…

Buy a copy of my children’s novella, Westly: A Spider’s Tale

Like, comment and share