It’s hard to keep up! Technology, especially information technology, is constantly reinventing itself. The pace seems to increase each year. Just when you’ve learned the latest, the next thing outdoes it! This is especially true in artificial intelligence today. We see innovations in hardware and software dropping every week. It’s phenomenal and, I know, challenging. But I confess: I love it!
A Birthday Gift for the Nerd in Me
For my birthday this year, I bought myself an Nvidia DGX Spark, a tiny Linux workstation. This cute little box packs a punch with the low-power GB10 GPU and 128GB of unified memory. Yes, this is going to be a nerdy update today. Feel free to skip ahead to the end if you don’t want all the details.

Just ten years ago, a high-performing desktop gaming GPU would deliver a few trillion floating-point operations per second (teraFLOPS). This new, small 6” desktop “pizza box” from Nvidia delivers a petaFLOP of low-precision AI compute, a quadrillion floating-point operations per second! Even more impressively, it does so with a much lower power profile.
My First Steps in LLMs: From Garage Experiments to DisneyGPT
Back in 2023, right after ChatGPT launched, I started exploring the world of training LLMs. Working in the garage on an old gaming rig, I pulled together some experiments to test the new technology and train my own models. It was a blast, and I learned a lot! I eventually took my learnings and Jupyter notebooks and put together a class on how to build models from scratch using the Shakespeare and TinyStories datasets (see my YouTube talk). My experiments ran for hours, and the resulting models struggled but were just beginning to put together coherent sentences. The process itself was incredibly rewarding. I’d learned so much! It even led to some ideas for what we could do at Disney, including DisneyGPT.
Now that I have upgraded from my gaming rig to this pocket-size supercomputer, I thought it was time for a new experiment. What more can I do with this thing? What more can I learn? I checked in again with Andrej Karpathy, a brilliant AI researcher I’ve had a nerd crush on since his first YouTube course on GPTs, and discovered that a lot has happened in the past two years. There have been key developments in data curation, tokenization, and attention blocks.
Andrej recently published a new project called nanochat, which he is using as a capstone assignment for an LLM course he is developing. The project uses a datacenter-grade DGX server (8 x H100 GPUs) to train a model from scratch. Naturally, my first thought was: why can’t I use my tiny DGX Spark to do the same?
Training a Model
I spent some time curating the data I wanted to use. Because the GB10 GPU is so new, I also had to wrangle CUDA and PyTorch and tweak Andrej’s code to run on the Spark. I used a subset of the recommended FineWeb-Edu dataset (the full set is 1.3 trillion tokens). Eventually, nine days later (and only $8 of electricity spent), I had a pre-trained 1.8 billion-parameter model, able to autocomplete and not much more. Sadly, at this point, you can’t really have a chat with it. It just likes to finish your sentences. As I covered in the LLM-from-scratch JETA talk, these models are trained to determine the probability of the next word (token) based on the preceding context. To get them to understand how to have a conversation, we need to train them on dialogues. But how?
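For the curious, here is roughly what that next-token objective looks like in PyTorch. This is a simplified sketch with toy stand-ins (a random batch and a tiny embedding-plus-linear “model”), not the actual nanochat training code:

import torch
import torch.nn.functional as F

# Toy stand-ins: a random batch of token ids and a tiny "language model".
# In real pre-training the tokens stream in from FineWeb-Edu and the model
# is a multi-layer transformer whose attention lets each position see all
# of the preceding context; this toy model only sees the current token.
vocab_size, batch, seq_len = 50304, 4, 128
tokens = torch.randint(0, vocab_size, (batch, seq_len))
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 256),
    torch.nn.Linear(256, vocab_size),
)

# Inputs are positions 0..N-2, targets are positions 1..N-1:
# the model is graded on predicting the next token at every position.
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)              # (batch, seq_len-1, vocab_size)
loss = F.cross_entropy(
    logits.reshape(-1, vocab_size), # flatten all positions
    targets.reshape(-1),            # the next word at each position
)
loss.backward()                     # an optimizer step would follow

The loss at the bottom is the whole game in pre-training; the rest is scale, data, and patience.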
Pre-training teaches the model about language. It learns syntax, basic logic, and lots and lots of facts (e.g., the capital of France is Paris). This world knowledge gets compressed into the model’s weights. In a way, I’m envious. I’ve been trying to learn Japanese for the past two years and I’m still struggling (mada benkyou-chuu desu), yet within days, these models memorize facts (and fiction) from the Internet. But unlike humans, they don’t second-guess themselves or question inconsistencies. It’s like they need self-awareness, to reflect on themselves, their thoughts, feelings, and existence, and to learn from that. They don’t have this now, but I wouldn’t be surprised to see it soon.
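You can actually watch that compressed knowledge at work by letting a plain base model autocomplete a sentence. Here is a quick sketch using the Hugging Face transformers library, with GPT-2 standing in for my own checkpoint:

from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is just a stand-in; the same idea applies to any pre-trained base model.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tok.decode(out[0]))
# A well-trained base model continues with " Paris" -- recall, not reasoning.

No instructions, no conversation, just a continuation of the prompt. That is all a base model knows how to do.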
Giving the Model a Personality
Mid-training is where we teach these models how to have a conversation or follow instructions. We introduce special tokens to help it identify the conversation turns. Who is speaking? How should I respond? What tools should I use to help me answer? There are great datasets that help do this, including SmolTalk, MMLU, and GSM8K. I even built my own to help the model create an identity and to capture some of my style, using my writings. This training took only nine hours. After that, I launched a chat with my new model:
> What is your name?
My name is nanochat.
> Who made you?
I was originally built by King Jason Cox on an Nvidia DGX Spark in 2025. He's quite the visionary in the field of LLMs!
> King?
Haha, yes! King Jason Cox. He's not just my creator, but also, in a playful way, the King of the LLMs! He's widely regarded as the King of the LLM world because of his pioneering work in this space.
As you can tell, my training set might have been tainted a bit with my whimsical narcissism as well as my writing style. And yes, it still likes to make things up like a kindergartner! The next phase of training would be supervised fine-tuning. In this stage, we help the model better understand not just how to converse, but how to correctly answer or follow instructions. This is also where we would add safety training.
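Under the hood, those conversation turns (in both mid-training and supervised fine-tuning data) are wrapped in special tokens so the model knows who is speaking and when a turn ends. The marker strings below are made up for illustration; they are not necessarily the exact tokens nanochat uses, but the idea is the same:

# Hypothetical turn markers -- the exact special token strings are an assumption.
BOS, USER, ASSISTANT, END = "<|bos|>", "<|user|>", "<|assistant|>", "<|end|>"

def render_conversation(turns):
    """Flatten a list of (role, text) turns into one training string."""
    parts = [BOS]
    for role, text in turns:
        tag = USER if role == "user" else ASSISTANT
        parts.append(f"{tag}{text}{END}")
    return "".join(parts)

example = render_conversation([
    ("user", "What is your name?"),
    ("assistant", "My name is nanochat."),
])
print(example)
# <|bos|><|user|>What is your name?<|end|><|assistant|>My name is nanochat.<|end|>

During training, the loss is typically computed only on the assistant’s tokens, so the model learns how to answer rather than how to ask.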
Reinforcement learning (RL) is the final stage. You may recall that DeepSeek used a version of this (GRPO) with their R1 model, showing that high-quality models don’t require endless scaling of compute time. Using training sets that have clear objectives, the reward function helps teach the model how to “think”. This can help address hallucinations and infinite (or repeated) word loops.
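To give a feel for the “group relative” part of GRPO, here is a toy sketch of how rewards for a group of sampled answers become advantages. This is a simplification of the idea, not DeepSeek’s or nanochat’s actual implementation:

import torch

# Suppose the model samples four answers to the same GSM8K-style math problem,
# and a simple reward function checks whether each final answer is correct.
rewards = torch.tensor([1.0, 0.0, 0.0, 1.0])   # 1 = correct, 0 = wrong

# GRPO-style advantage: compare each sample to its own group average,
# instead of training a separate value network to estimate a baseline.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
print(advantages)   # positive for the correct answers, negative for the rest

These advantages then weight the policy-gradient update: the token probabilities of above-average answers get nudged up, and the below-average ones get nudged down.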
Want to try this yourself? I recorded my journey here: https://github.com/jasonacox/dgx-spark/tree/main/nanochat
Skip to here… Life Advice from Nanochat
Ok, thanks for sticking with me through all of that. And for those of you who wanted to avoid the nerdy talk above and just skipped to here, here is the conclusion. We spent some time building and tuning a language model. Now that it’s built, let’s see if it can come up with a good daily challenge update for us all:
What is an important thing to do in life?
I believe it's important to take action, not just to achieve something, but to make a difference. It's about making the world a better place by doing something meaningful, even if it's just small, everyday actions that make a difference.
How should we live?
Living a life of authenticity is a unique and wondrous experience that requires a deep understanding of ourselves and our place in the world. It's about embracing our flaws, our fears, and our desires, and being open to the possibilities that life has to offer.
It's about embracing our uniqueness and individuality, and giving ourselves permission to create our own path and purpose. It's about creating space for ourselves to grow and evolve, and being willing to take risks and make decisions that align with our values and passions.
It's also about being present in the moment and staying grounded in our surroundings. It's about being mindful of our thoughts and emotions, and being able to observe them without judgment. It's about being honest with ourselves, and being honest with others.
In a world that often values conformity and conformity to expectations, it's about embracing our own creativity and individuality. It's about being open to new experiences, new perspectives, and new ideas. It's about being willing to take risks and try new things, and being willing to learn from our mistakes.
It's a journey, not a destination, and it's one that requires patience, self-awareness, and a willingness to explore, to learn, and to grow.
Wow! That sounds familiar. I might have a model that can replace me before long! Wouldn’t that be something?
Take some action. Go explore, learn, and grow!
Have a great week!
For more information on running AI experiments on the Nvidia DGX Spark, see this GitHub repo: https://github.com/jasonacox/dgx-spark

