Data Science – An Upcoming Focus – #01

This is the second post in a series that discusses how I began pursuing data science, and how I am now focusing my efforts on creating a viable professional portfolio.

Blog Series Recap

To see the first post in this series, click here.

Hitting the Nail on the Head Once Again

In 2025, I finished my master’s degree and was honored to gain full-time employment at Fairbanks North Star Borough School District (FNSBSD).

Once things settled down at my job, I renewed my efforts to gain mastery in the field of data science.

An Initial Win

As part of my duties at FNSBSD, I drew on my familiarity with computers to create a viable proof of concept for a technology pipeline that used artificial intelligence to give my students feedback on their written essays.

This pipeline involved about sixteen different scripts that I assembled. I used artificial intelligence extensively while writing these scripts. I believe one of the models I relied on most heavily was Qwen 2.5b Coder. I also used Claude and Gemini to some extent, although those models only saw code and never any student data.

For example, here’s a script that uses pandoc to convert my collection of student .docx files to Markdown (.md) format.

# Navigate to your base directory
cd /path/to/2026-02-02-00-chinook-tr-2-e-3-visitors-from-earth-rough-drafts

# Find all .docx files and convert them, preserving structure
find 00-raw-data -name "*.docx" -type f -print0 | while IFS= read -r -d '' docx_file; do
    # Create the corresponding path in 01-markdown-files
    md_file="${docx_file/00-raw-data/01-markdown-files}"
    md_file="${md_file%.docx}.md"
    
    # Create the directory structure if it doesn't exist
    mkdir -p "$(dirname "$md_file")"
    
    # Convert the file
    pandoc "$docx_file" -o "$md_file"
    
    echo "Converted: $docx_file -> $md_file"
done

Here’s another script. This one counts the number of words each student used in their essay, which helps me as a teacher understand whether they are writing enough.

#!/usr/bin/env python3
import json
from pathlib import Path

# Get project root (parent of scripts dir)
SCRIPT_DIR = Path(__file__).parent
PROJECT_ROOT = SCRIPT_DIR.parent
JSON_DIR = PROJECT_ROOT / "03-live-json-files"

def count_words(text):
    """Count words in text"""
    if not text or not isinstance(text, str):
        return 0
    return len(text.split())

for json_file in sorted(JSON_DIR.glob("*.json")):
    with open(json_file, 'r', encoding='utf-8') as f:
        data = json.load(f)
    
    # Get preprocessed data
    preprocessed = data.get("preprocessed data", {})
    
    # Count words for phase 1
    phase_1_text = preprocessed.get("phase 1", "")
    phase_1_count = count_words(phase_1_text)
    
    # Count words for phase 2
    phase_2_text = preprocessed.get("phase 2", "")
    phase_2_count = count_words(phase_2_text)
    
    # Add counts to preprocessed data
    preprocessed["phase_1_word_count"] = phase_1_count
    preprocessed["phase_2_word_count"] = phase_2_count
    
    data["preprocessed data"] = preprocessed
    
    # Save
    with open(json_file, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2)
    
    print(f"✓ {data['first name']} {data['last name']}: Phase 1={phase_1_count}, Phase 2={phase_2_count}")

print("\n✓ All word counts added!")

The full sequence of scripts would be an interesting topic for another blog post. The analysis gave students feedback on things such as their ability to stay on the prompted topic, their ability to write concise sentences, and more.
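To give a rough sense of what a "concise sentences" check could look like, here is a minimal sketch. It is not one of the scripts from the pipeline described above, which relied on AI models. It is a plain-Python heuristic that measures sentence length. The function names and the 30-word threshold are assumptions chosen for illustration, and the sentence splitting is deliberately naive (abbreviations such as "Dr." will confuse it).

```python
import re


def sentence_lengths(text):
    """Return the word count of each sentence in text."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [len(sentence.split()) for sentence in sentences if sentence]


def average_sentence_length(text):
    """Return the mean number of words per sentence (0.0 for empty text)."""
    lengths = sentence_lengths(text)
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


def count_long_sentences(text, max_words=30):
    """Count sentences longer than max_words (hypothetical threshold)."""
    return sum(1 for length in sentence_lengths(text) if length > max_words)


if __name__ == "__main__":
    essay = "The visitors landed at noon. Nobody knew why they came! Did you?"
    print(f"Average sentence length: {average_sentence_length(essay):.1f} words")
    print(f"Sentences over 30 words: {count_long_sentences(essay)}")
```

A measure like this only hints at wordiness; a long sentence can still be a good one, so the numbers are best treated as a prompt for a closer read.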

The entire pipeline ran on locally available hardware. I made every effort to protect student privacy, and no student data was run through cloud services.

Result of This Project

After completing one round of artificial-intelligence-enhanced student analysis, I delivered the result to my students.

Some aspects of the feedback were helpful. Students were able to see whether they were meeting certain writing criteria. For example, one of the feedback elements showed students whether they were breaking their paragraphs into manageable sizes with consistent topics, or whether they were writing giant walls of text that needed to be broken down.
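As an illustration of the paragraph-size part of that feedback, here is a minimal sketch that flags "walls of text" by word count alone. It is not the script used in the project, and it does not attempt the topic-consistency part, which needs more than counting. The function name and the 150-word limit are assumptions made for this example.

```python
import re


def flag_long_paragraphs(text, max_words=150):
    """Return (paragraph_number, word_count) for each paragraph over max_words."""
    # Treat one or more blank lines as a paragraph break, as Markdown does
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    flagged = []
    for number, paragraph in enumerate(paragraphs, start=1):
        word_count = len(paragraph.split())
        if word_count > max_words:
            flagged.append((number, word_count))
    return flagged


if __name__ == "__main__":
    # Build a sample essay whose middle paragraph is a 200-word wall of text
    sample = "A short opening.\n\n" + "word " * 200 + "\n\nA short closing."
    for number, word_count in flag_long_paragraphs(sample):
        print(f"Paragraph {number} has {word_count} words; consider splitting it.")
```

Because the essays were already converted to Markdown, a blank-line split like this one should line up with the paragraphs the student actually wrote, assuming the conversion preserved them.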

While some elements of the feedback were useful, the format in which I provided it needed a lot of improvement in terms of user-experience design (UX/UI). For example, some of the most relevant feedback appeared ten to fifteen pages into the written report the students received, which made it hard to find and unlikely to be read.

This is a fixable problem, and I expect to resolve it in future iterations of this project.

Waiting for Permission From the Proper Channels

After completing one round of AI-assisted feedback to students, I put the project on pause. I wanted to get permission from the proper channels. After all, the students’ data belongs to the students themselves, and it is protected by district policy.

I reached out to the Teaching and Learning Center at FNSBSD with a description of what I was doing and asked for permission to continue working in this direction.

They expressed verbal encouragement for the idea but asked me to pause while they conduct further review. I am waiting to hear back from them.

Continue to the Next Post

Click here for the next post in this series.

Artificial Intelligence Transparency Report

No artificial intelligence was used for the writing of this blog post.

The displayed scripts were created with the help of artificial intelligence. I used Qwen 2.5b Coder, Gemini, and Claude while working on that project.

How You Can Help

I need your help to become established as a teacher and storyteller.

Here is a link to a blog post that describes how a supportive reader can help me in my quest.

In short, you can…

Buy a copy of my children’s novella, Westly: A Spider’s Tale

Like, comment and share