Christopher Philip Hebert
2025-06-22
Some words (re)encountered in yesterday's MLIR-to-PyTorch web session that I should bring into my familiar space:
- tensor: Data (typically floating-point numbers) stored in an array of one or more dimensions. Mathematicians attach strict rules to which transformations of a "tensor" are valid, so they're justified in using a different word: you can't do just any arbitrary manipulation of the array and still preserve the tensor's semantics.
- gradient: An array of floating-point numbers telling you how much to tweak each weight to better match the training data. After comparing your model's predictions to the actual answers, the gradient says "increase this weight by 0.03, decrease that one by 0.12," etc. Gradient descent is just following these suggestions repeatedly until your model stops improving (see the sketch after this list).
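A minimal PyTorch sketch tying the two words together (my own toy example, not from the session; the names and values are made up):

```python
import torch

# A tensor: a 2x3 array of floating-point numbers.
w = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]], requires_grad=True)

# A toy loss: how far the weights are from an all-zeros target.
loss = ((w - torch.zeros(2, 3)) ** 2).mean()

# Backpropagation fills in w.grad -- the gradient: one number per
# weight saying which way (and roughly how much) to tweak it.
loss.backward()
print(w.grad)  # same shape as w

# One step of gradient descent: nudge each weight against its gradient.
lr = 0.1
with torch.no_grad():
    w -= lr * w.grad
    w.grad.zero_()
```

Looping those last few lines is all gradient descent is: keep following the gradient's suggestions until the loss stops shrinking.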
I do have to learn the real math...
There are easily 100,000 hours of things to learn. There are canonically 80,000 hours in a career. Most of a career is necessarily spent applying, rather than learning.
An hour a day feels so small, but it adds up: 1 hour × 365 days × 10 years = 3,650 hours.
To make real progress, I must make consistent additions.
For example, similar to the wandering that had me ignorantly re-inventing MLIR, I have the idea of adding non-rectangular shapes and cyclicality to NNs. The biggest obstacle appears to be hardware: performance is better with the current structures. But I'm so far from knowing enough to carry that idea to its conclusion.