Christopher Philip Hebert

2025-02-23

Outline for a Language-Learning Beginner Python Project

Motivation

My girlfriend wants to learn to program in Python. She already uses it for mathematics and machine learning research, but she doesn't feel as comfortable with it in the style of traditional software.

Yes, the irony is not lost on either of us that she could sooner implement and train a machine learning model with a novel loss function than sort an array or implement a linked list. The skills are different!

In other news, I need to continue learning Spanish and begin Portuguese. She is fluent in English and Spanish (and, idk, Italian or something) and wants to begin Portuguese.

We can address all these learning goals in one!

Outline

  1. System for storing (adding) words
  2. Add metadata (known, unknown, date added, last seen, etc.)
  3. Flash card game
  4. Add accuracy and recency to flash card game
  5. Add exact typed answers to prompt questions
  6. Add small, local LLM evaluation of answers

Data Storage

We will begin with a plain-text file for data storage. The benefits of this are that we can learn all the traditional struggles with text handling in code (including likely UTF-8/Unicode issues), edit the file manually when needed, see the program's edits to the file clearly, and check into Git as desired.
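
To make the plain-text idea concrete, here is a minimal sketch that assumes the simplest possible format: one word per line, always read and written as UTF-8. The names load_words and save_words are hypothetical, and the NFC normalization step is my own guess at one of the Unicode issues we are likely to hit (an accented letter can be stored as one code point or as a base letter plus a combining mark).

```python
import unicodedata
from pathlib import Path


def load_words(path):
    """Read one word per line as UTF-8, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    words = []
    for line in text.splitlines():
        word = line.strip()
        if word:
            # Normalize so "e" + combining accent and the single
            # precomposed character compare as the same word.
            words.append(unicodedata.normalize("NFC", word))
    return words


def save_words(path, words):
    """Write one word per line, always as UTF-8."""
    text = "".join(word + "\n" for word in words)
    Path(path).write_text(text, encoding="utf-8")
```

Passing encoding="utf-8" explicitly matters because the default encoding can differ between machines, and the file stays readable in any text editor and diffs cleanly in Git.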

Obviously, I'd rather use SQLite, but that immediately introduces SQL into the learning requirements. Obviously, I'd want to consider the portability of the data and cloud-syncing, but we're not there yet.

User Interface

The most advanced interface would be easiest built on the Web platform with HTML, CSS, and JS. But that introduces the need for a server, and server-side/client-side interaction.

No, we'll stick with command-line interaction.

Obviously, I'd be interested in a terminal user interface framework like (... Googling randomly...) textual for Python, but that introduces the need to learn the framework and widgets, etc.

We want to struggle with the brute nature of standard input and standard output. Looks like that will mean input() and print(), at least at first. (I haven't used Python meaningfully since senior year of high school, or perhaps when last tutoring someone with a Django project.)
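
As a rough sketch of what the input() and print() struggle might look like, here is a numbered-menu prompt that keeps asking until it gets a valid answer. The function name ask_choice and the input_fn and print_fn parameters are my own invention; the parameters exist only so the function can be exercised without a real keyboard.

```python
def ask_choice(prompt, choices, input_fn=input, print_fn=print):
    """Show a numbered menu and return the chosen item."""
    while True:
        print_fn(prompt)
        for number, choice in enumerate(choices, start=1):
            print_fn(f"  {number}. {choice}")
        answer = input_fn("> ").strip()
        # isdecimal() rejects empty strings, letters, and signs,
        # so int() below cannot fail.
        if answer.isdecimal() and 1 <= int(answer) <= len(choices):
            return choices[int(answer) - 1]
        print_fn(f"Please enter a number from 1 to {len(choices)}.")
```

Calling ask_choice("What next?", ["random", "add new word"]) in a terminal prints the menu and blocks on input() until the user types 1 or 2.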

First Program

When you start the program, it finds the lexicon file. Maybe we hardcode the path to the lexicon file. Maybe we take the path as the single argument.
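
Both options can coexist. A minimal sketch, assuming a hypothetical default file name of lexicon.txt and a hypothetical helper called lexicon_path:

```python
from pathlib import Path

# Hypothetical hardcoded default; the real path is undecided.
DEFAULT_LEXICON = Path("lexicon.txt")


def lexicon_path(argv):
    """Use the single optional argument if given, else the default."""
    args = argv[1:]  # argv[0] is the program name
    if len(args) > 1:
        raise SystemExit(f"usage: {argv[0]} [LEXICON_FILE]")
    return Path(args[0]) if args else DEFAULT_LEXICON
```

The program would call it as lexicon_path(sys.argv) after importing sys.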

The program will write some summary info about the file (number of words, etc.) to validate that startup was successful and the file format is recognized.
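
What the summary contains is still open. As one possibility, assuming the words have already been read into a list, this sketch counts the words, the unique words, and finds the longest one. The function name summarize and the choice of statistics are assumptions on my part.

```python
def summarize(words):
    """Return a few startup statistics about the word list."""
    return {
        "words": len(words),
        "unique": len(set(words)),
        # default="" keeps max() from failing on an empty lexicon
        "longest": max(words, key=len, default=""),
    }
```

A mismatch between "words" and "unique" is a cheap hint that the file contains duplicates.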

The program will prompt the user with the list of behaviors the program knows. We'll start with some simple ones like "random", "first alphabetical", "last alphabetical", etc. Later we'll add "least known" and such, but we won't have that metadata yet when the word list is just the words. We'll have a mode for "add new word". And also for "delete word".
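
Here is a sketch of those first behaviors as plain functions over a list of words, with a dictionary mapping menu names to functions. All names are hypothetical. Note that sorting by str.casefold ignores case but still compares by code point, so accented letters sort after "z"; locale-aware ordering is a later subproject.

```python
import random


def pick_random(words, rng=random):
    """Return any word; rng is swappable for reproducible runs."""
    return rng.choice(words)


def first_alphabetical(words):
    return min(words, key=str.casefold)


def last_alphabetical(words):
    return max(words, key=str.casefold)


def add_word(words, word):
    """Append the word unless it is already present."""
    if word not in words:
        words.append(word)


def delete_word(words, word):
    """Remove the word if present; do nothing otherwise."""
    if word in words:
        words.remove(word)


# Menu name -> function, for the modes that just show one word.
BEHAVIORS = {
    "random": pick_random,
    "first alphabetical": first_alphabetical,
    "last alphabetical": last_alphabetical,
}
```

Adding "least known" later would mean adding one more function and one more dictionary entry.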

Soon enough, we'll be ready to add a super simple flash card mode: It shows a random word, waits for user to hit the enter key, and then shows another random word. The goal will be to add more to this.
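
The super simple version might look like the sketch below. Typing "q" to quit and the optional rounds limit are my additions, since a loop with no exit is awkward to stop; the function name flash_cards and its parameters are hypothetical.

```python
import random


def flash_cards(words, rounds=None, input_fn=input, print_fn=print, rng=random):
    """Show random words one at a time; return how many were shown."""
    shown = 0
    if not words:
        return shown  # nothing to show from an empty lexicon
    while rounds is None or shown < rounds:
        print_fn(rng.choice(words))
        shown += 1
        try:
            answer = input_fn("[Enter] for next, q to quit: ")
        except EOFError:
            break  # Ctrl-D (or end of piped input) also ends the game
        if answer.strip().lower() == "q":
            break
    return shown
```

With rounds left as None, the game continues until the user quits.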

This is an infinitely grow-able project with millions of fiddly subprojects. When we want to add metadata to our lexicon file like translations or last seen or accuracy rate, etc., then we'll have to write migrations for our file (and backups).
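
As a rough idea of what a migration with a backup could look like, this sketch copies the file to a ".bak" sibling and then appends one tab-separated column to every line. The tab-separated layout, the ".bak" suffix, and the function name migrate_add_column are all assumptions; the real metadata format is undecided.

```python
import shutil
from pathlib import Path


def migrate_add_column(path, default=""):
    """Back up the file, then append one tab-separated field per line."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # back up before touching anything
    lines = path.read_text(encoding="utf-8").splitlines()
    migrated = [f"{line}\t{default}" for line in lines if line.strip()]
    path.write_text("".join(row + "\n" for row in migrated), encoding="utf-8")
    return backup
```

Running it twice would add the column twice, so a real version would need some way to record which migrations have already been applied.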