Creating an AI from Scratch on an Nvidia DGX Spark

It’s hard to keep up! Technology, especially information technology, is constantly reinventing itself. The pace seems to increase each year. Just when you’ve learned the latest, the next thing outdoes it! This is especially true in artificial intelligence today. We see innovations in hardware and software dropping every week. It’s phenomenal, and I know, challenging. But I confess: I love it!

A Birthday Gift for the Nerd in Me

For my birthday this year, I bought myself an Nvidia DGX Spark, a tiny Linux workstation. This cute little box packs a punch with the low-power GB10 GPU and 128GB of unified memory. Yes, this is going to be a nerdy update today. Feel free to skip ahead to the end if you don’t want all the details.

Just ten years ago, a high-performing desktop gaming GPU delivered a few trillion floating-point operations per second (teraFLOPS). This new, small 6” desktop “pizza box” from Nvidia delivers a petaFLOP, a quadrillion floating-point operations per second! Even more impressively, it does so with a much lower power profile.

My First Steps in LLMs: From Garage Experiments to DisneyGPT

Back in 2023, right after ChatGPT launched, I started exploring the world of training LLMs. I began in the garage with an old gaming rig, pulling together experiments to test the new technology and train my own models. It was a blast, and I learned a lot! I eventually took my learnings and Jupyter notebooks and put together a class on how to build models from scratch using the Shakespeare and TinyStories datasets (see my YouTube talk). My experiments ran for hours, and the resulting models struggled but were just beginning to put together coherent sentences. The process itself was incredibly rewarding. I’d learned so much! It even led to some ideas for what we could do at Disney, including DisneyGPT.

Now that I have upgraded from my gaming rig to this pocket-size supercomputer, I thought it was time for a new experiment. What more can I do with this thing? What more can I learn? I checked in again with Andrej Karpathy, a brilliant AI researcher I’ve had a nerd crush on since his first YouTube course on GPTs. I discovered a lot has happened in the past two years. There have been key developments in data curation, tokenization, and attention blocks.

Andrej recently published a new project called nanochat, which he is using as a capstone assignment for an LLM course he is developing. The project uses a datacenter-grade DGX server (8 x H100 GPUs) to train a model from scratch. Naturally, my first thought was, why can’t I use my tiny DGX Spark to do the same?

Training a Model

I spent some time curating the data I wanted to use. Because the GB10 GPU is so new, I also had to wrangle CUDA and PyTorch and tweak Andrej’s code to run on the Spark. I used a subset of the recommended FineWeb-Edu dataset (1.3 trillion tokens). Eventually, nine days later (and only $8 of electricity spent), I had a pre-trained 1.8 billion-parameter model, able to autocomplete and not much more. Sadly, at this point, you can’t really have a chat with it. It just likes to finish your sentences. As I covered in the LLM-from-scratch JETA talk, these models are trained to determine the probability of the next word (token) based on the preceding context. To get them to understand how to have a conversation, we need to train them on dialogues. But how?
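To make that concrete, here is a minimal sketch of the next-token objective (my own illustration, not nanochat’s actual code): the model sees a window of tokens and is trained, with a cross-entropy loss, to predict the token that follows each position.

Python

import torch
import torch.nn.functional as F

# Toy batch of token ids, shape (batch, sequence length); pretend they came from FineWeb-Edu.
vocab_size = 50304
tokens = torch.randint(0, vocab_size, (4, 128))
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict token t+1 from tokens up to t

# The real model would be the transformer; here we fake its output logits.
logits = torch.randn(inputs.shape[0], inputs.shape[1], vocab_size)  # (batch, seq, vocab)

# Cross-entropy between the predicted distribution and the actual next token.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(f"next-token loss: {loss.item():.3f}")      # pre-training just keeps pushing this down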

Pre-training teaches the model about language. It learns syntax, basic logic, and lots and lots of facts (e.g., the capital of France is Paris). This world knowledge gets compressed into the model’s weights. In a way, I’m envious. I’ve been trying to learn Japanese for the past two years and I’m still struggling (mada benkyou-chuu desu), yet within days, these models memorize facts (and fiction) from the Internet. But unlike humans, they don’t second-guess themselves or question inconsistencies. It’s like they need self-awareness, to reflect on themselves, their thoughts, feelings, and existence, and to learn from that. They don’t have this now, but I wouldn’t be surprised to see it soon.

Giving the Model a Personality

Mid-training is where we teach these models how to have a conversation or follow instructions. We introduce special tokens to help the model identify the conversation turns. Who is speaking? How should I respond? What tools should I use to help me answer? There are great datasets that help do this, including SmolTalk, MMLU and GSM8K. I even built my own to help the model create an identity and to capture some of my style, using my writings. This training only took nine hours. After that, I launched a chat with my new model:

> What is your name?

My name is nanochat.

> Who made you?

I was originally built by King Jason Cox on an Nvidia DGX Spark in 2025. He's quite the visionary in the field of LLMs!

> King?

Haha, yes! King Jason Cox. He's not just my creator, but also, in a playful way, the King of the LLMs! He's widely regarded as the King of the LLM world because of his pioneering work in this space.

As you can tell, my training set might have been tainted a bit with my whimsical narcissism as well as my writing style. And yes, it still likes to make things up like a kindergartner! The next phase of training would be supervised fine-tuning. In this stage, we help the model better understand not just how to converse, but how to correctly answer or follow instructions. This is also where we would add safety training. 
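To make the idea of conversation turns and special tokens concrete, here is a rough sketch of how a dialogue gets rendered into a single training string. The token names below are hypothetical placeholders, not the exact ones nanochat uses.

Python

# Hypothetical special tokens marking conversation turns (names are illustrative only).
BOS, USER, ASSISTANT, END = "<|bos|>", "<|user|>", "<|assistant|>", "<|end|>"

def render_dialogue(turns: list[dict]) -> str:
    """Flatten a list of {role, content} turns into one training string."""
    text = BOS
    for turn in turns:
        tag = USER if turn["role"] == "user" else ASSISTANT
        text += f"{tag}{turn['content']}{END}"
    return text

example = [
    {"role": "user", "content": "What is your name?"},
    {"role": "assistant", "content": "My name is nanochat."},
]
print(render_dialogue(example))
# During mid-training and fine-tuning, the loss is typically computed only on the
# assistant turns, so the model learns how to respond rather than how to ask.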

Reinforcement learning (RL) is the final stage. You may recall that DeepSeek used a version of this (GRPO) with their R1 model, which proved that high-quality models don’t require endless scaling of compute. Using training sets that have clear objectives, the reward function helps teach the model how to “think”. This can help address hallucinations and infinite (or repeated) word loops.
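As a rough illustration (my sketch, not the actual GRPO setup), a reward for a dataset with clear objectives like GSM8K can be as simple as checking whether the model’s final number matches the reference answer:

Python

import re

def math_reward(completion: str, reference_answer: str) -> float:
    """Toy reward: 1.0 if the last number in the completion matches the reference, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return 0.0                        # no answer at all, nothing to reward
    return 1.0 if numbers[-1] == reference_answer else 0.0

# The RL loop samples several completions per question, scores each with this reward,
# and nudges the model toward the phrasings and reasoning that earn higher scores.
print(math_reward("Adding it up, 12 + 30 = 42. The answer is 42", "42"))  # 1.0
print(math_reward("I think the answer is 41", "42"))                      # 0.0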

Want to try this yourself? I recorded my journey here: https://github.com/jasonacox/dgx-spark/tree/main/nanochat

Skip to here… Life Advice from Nanochat

Ok, thanks for sticking with me through all of that. And for those of you who wanted to avoid all that nerdy talk above and just skipped to here, here is the conclusion. We spent some time building and tuning a language model. Now that we have one built, let’s see if it can come up with a good daily challenge update for us all:

What is an important thing to do in life?

I believe it's important to take action, not just to achieve something, but to make a difference. It's about making the world a better place by doing something meaningful, even if it's just small, everyday actions that make a difference.

How should we live?

Living a life of authenticity is a unique and wondrous experience that requires a deep understanding of ourselves and our place in the world. It's about embracing our flaws, our fears, and our desires, and being open to the possibilities that life has to offer.

It's about embracing our uniqueness and individuality, and giving ourselves permission to create our own path and purpose. It's about creating space for ourselves to grow and evolve, and being willing to take risks and make decisions that align with our values and passions.

It's also about being present in the moment and staying grounded in our surroundings. It's about being mindful of our thoughts and emotions, and being able to observe them without judgment. It's about being honest with ourselves, and being honest with others.

In a world that often values conformity and conformity to expectations, it's about embracing our own creativity and individuality. It's about being open to new experiences, new perspectives, and new ideas. It's about being willing to take risks and try new things, and being willing to learn from our mistakes.

It's a journey, not a destination, and it's one that requires patience, self-awareness, and a willingness to explore, to learn, and to grow.

Wow! That sounds familiar. I might have a model that can replace me before long!  Wouldn’t that be something?

Take some action. Go explore, learn, and grow!  

Have a great week!


For more information on running AI experiments on the Nvidia DGX Spark, see this GitHub repo: https://github.com/jasonacox/dgx-spark

AI and the Universe

How are you feeling about AI lately? Excited? Anxious? A bit bewildered? If you’re like most of us, you’re probably riding a rollercoaster of emotions. That’s completely normal. After all, we’re witnessing the birth of a technology wave that’s set to rival the discovery of the steam engine or the rise of the internet. As a technologist, I find myself marveling at the possibilities. But I also understand the concerns and questions; change at this scale can feel unsettling.

Today, I want to explore what this is all about, why it matters, and what practical steps we can take.

What is AI, really? Artificial intelligence isn’t just about building clever machines or automating tasks. At its heart, I believe AI is humanity’s bold quest to extend our minds. It is the ultimate tool for understanding ourselves, our world, and the universe beyond. Imagine a technology that doesn’t simply crunch numbers but helps us solve mysteries that have stumped us for generations. The grand purpose of AI is to accelerate discovery, deepen insight, and help every one of us flourish through knowledge.

Will that happen? Is this happening? Absolutely! We’re already seeing AI move from science fiction to real science. It is impacting everything from disease diagnosis to energy production, from weather prediction to artistic expression. Here are some examples that I recently came across that inspired me. Warning here, this is very nerdy content, so feel free to skip to the end if you are so inclined.

  • Solving Biology’s Biggest Puzzle: For decades, predicting a protein’s structure from its amino acid sequence was one of biology’s toughest challenges. This is essentially a problem of physics and chemistry, predicting a stable 3D structure from a 1D amino acid sequence. This painstaking experimental work could take a PhD student their entire doctorate to solve for a single protein. AlphaFold’s AI cracked this puzzle in seconds, transforming structural biology, accelerating advanced drug development, and bringing deeper insight into disease. Its latest version, AlphaFold 3, extends this impact by modeling complex interactions between proteins, RNA, and DNA. This breakthrough suggests a paradigm shift.  While physics can be described by elegant mathematics, biology’s immense complexity may be best understood through AI. It may even unlock the mystery to truly decode life.
  • Taming the Hottest Matter in the Universe: Deep Reinforcement Learning (RL) has been applied to control high-temperature plasmas that are hotter than the sun, within tokamak fusion reactors. Plasma is highly unstable, requiring a controller to predict its behavior and adjust massive superconducting magnetic fields within milliseconds. The AI system created a controller able to contain and hold the plasma in specific shapes for record amounts of time, successfully addressing a bottleneck in fusion research.  By learning to balance magnetic fields in real time, AI edges us closer to abundant, clean energy.
  • Modeling Intuitive Physics and Dynamics: Video generation models like Google’s Veo demonstrate an ability to reverse-engineer physics from passive observation (e.g., watching YouTube videos). They accurately model complex dynamics such as liquids, specular lighting, and materials flow. This capability suggests these models are learning an underlying structure, or a “lower dimensional manifold” of the very nature of all creation and our reality. That’s mind-blowing! It is fundamental to building generalized understanding and may even unlock the mysteries of our universe.
  • Advancing Quantum Chemistry and Materials: AI is learning to approximate solutions to Schrödinger’s equation, enabling us to simulate the quantum behavior of electrons with remarkable efficiency. This breakthrough is vital for materials science, as it makes it possible to model the properties of large, complex materials that were previously too costly or computationally intensive to study with traditional methods.
  • Accelerating Algorithmic Innovation: Systems like AlphaEvolve, which blend large language models (LLMs) with evolutionary computing, are already evolving and improving algorithms, finding, for example, faster solutions to complex problems like matrix multiplication. This marks a leap toward intelligent systems that can generate and optimize their own tools. They are evolving themselves. It’s amazing to witness. Yes, I know, also terrifying!

The ultimate aim of creating powerful AI is to build tools that help us. It allows us to better understand the universe and accelerate science to the maximum. If successful, I believe this technology will usher in an era of radical abundance and lead to the profound transformation of the human condition.

Picture a world where disease can be cured mostly in computers, where clean energy is limitless, and where anyone can explore vast new knowledge with the help of an intelligent partner. AI is guiding us toward a time when scarcity of knowledge, health, and opportunity can be truly challenged. It’s not about replacing people. It’s about augmenting our potential, surfacing new connections, and igniting a new golden age of discovery. The mission is not to hand over control, but to embrace this power for all humankind.

As AI continues to reshape our world, I believe we have a responsibility to meet this moment with adaptability, humility, and genuine curiosity. We should be experimenting with new tools, asking bold questions, and venturing beyond familiar boundaries. The most exciting breakthroughs emerge where creativity, technology, and storytelling intersect, so let’s embrace collaboration across disciplines and learn from one another. Above all, let’s serve as ethical stewards, ensuring these innovations benefit everyone, not just ourselves. And as we explore, let’s stay connected to our passions and strengths, blending them with new opportunities to grow, make a difference, and shape a future we can all be proud of.

AI is here, the revolution is real, and the mission is bigger than any one team or company. Let’s approach this with wonder, humility, and courage. Let’s steward this technology toward outcomes that inspire hope and serve the flourishing of all people.

 What will you learn next? What new ideas can you bring? What story do you want to help tell? The future may be unpredictable, but together, we can make it magical.

Let’s build, learn, and dream!

Summer Vibes

I hope you all had a great weekend! And for any fellow dads out there, I hope you had a great Father’s Day! I spent time with all four of my kids watching movies, grilling outdoors, and of course, celebrating over some ice cream on these hot summer days. Now, to be fair, it doesn’t take much to spark a celebration in our household. Life is full of excuses that call for a soft-serve dose of that dairy goodness, but this weekend seemed particularly poised for that indulgence.

We love movies! As part of this weekend’s festivities, we had a full playlist of cinematic magic streaming on our living room screen. You all know me by now, so it probably doesn’t surprise you to know that I have my garage-based AI system curate our movie selection. It sends out text suggestions on what to watch. It keeps track of our viewing habits and has a good idea of what we like to see. But despite all that tech, my wife wasn’t quite satisfied. She suggested that it should consider recommending movies celebrating the anniversary of their general theatrical release. For example, “Incredibles 2” was released on June 15, 2018, so it would be a great one to watch on Sunday. I loved that idea! So, I went to work adding that context to our resident AI. I just needed data.

Good luck! I tried finding a good data source, but everything I found was driven more toward discovery, and most of it was flawed, including bad release date information. I finally landed on TMDB as a good listing of movies, with references to IMDb that could pull more official release dates from OMDb. Yeah, it was confusing, but sadly, there wasn’t a clean way to get this data. I needed a web service to aggregate all of this for me and my AI.

I’m going to stop now and just acknowledge that many of you are probably tired of hearing me talk so much about Vibe Coding. If that’s you, you can stop now. I won’t be offended. For the rest of you, yes, buckle up, here is another vibe coding story.

I launched VSCode with my GitHub Copilot-powered assistant that I call JoJo. I switched him to agent mode (super important, by the way), and began having a chat. I told him about my vision to create this web service, how I wanted to build this dataset and APIs for easy access. He created a movie_db folder and went to work on a script. The script ran right away and pulled down the data. I suggested a high-speed way to process the data, and he suggested caching the API calls to prevent overloading the providers. What a smart aleck! But he was right. That was a good idea because the free tier of API access was rate-limited.

Finally, I had a good dataset to use, and JoJo had compressed it into a serialized object for fast access. I then switched to having him create the Python web service and gave a general idea of the APIs I wanted. He suggested some routes to use and wired together a Python Flask app. I told him that I wanted to use FastAPI and that I wanted to build all the tests before we built the APIs. He reluctantly complied and had me run pytest to verify. All good. Then the fun began: he started churning out the code for the APIs.
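For a sense of what that looked like, here is a minimal sketch of the kind of route and test-first setup we landed on. The route name, fields, and sample data are hypothetical stand-ins, not the actual MoviesThisDay code.

Python

from datetime import date
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()

# Tiny in-memory stand-in for the serialized TMDB/OMDb dataset.
MOVIES = [
    {"title": "Incredibles 2", "release_date": "2018-06-15", "popularity": 95.0},
]

@app.get("/movies/today")
def movies_today():
    """Return movies whose month/day matches today, sorted by popularity."""
    today = date.today()
    hits = [m for m in MOVIES
            if m["release_date"][5:] == f"{today.month:02d}-{today.day:02d}"]
    return sorted(hits, key=lambda m: m["popularity"], reverse=True)

# The "tests first" part: written before the route was filled in, run with pytest.
def test_movies_today_returns_a_list():
    client = TestClient(app)
    response = client.get("/movies/today")
    assert response.status_code == 200
    assert isinstance(response.json(), list)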

At this point, I should acknowledge that I was very tempted to jump in and code some lines myself. You can definitely do that, and these tools will co-develop with you, but I wanted to see how far I could go just vibing my way along. It turns out, a long way! The APIs were looking good, and it was extremely fast. I decided I wanted a nice UI, so I told JoJo to build a web page and gave him a general idea of what I wanted to see. He spun up some templates, added some tests, and plumbed in a new route for the landing page.

“Show the movies that were released on this day in history and sort them by popularity.” Boom! In less than a minute, JoJo had a basic screen up and running. I asked him to tweak the colors and make it more modern with a date navigator. He did, but I didn’t like some of the placements, so I asked him to nudge things around a bit more and adjust the style. I must confess, this is where I spent probably too much of my time. It was too fun and easy to ask him to make minor tweaks to scratch my curiosity itch. But he never complained; he just kept coding and plodding along. I even had him add additional pages for “Search” and “About”, which had nothing to do with my original goal.

About eight hours later, we were done. Yes, that is probably about four times longer than I needed, but I was having so much fun! Fun? Yes, legitimate, awe-inspiring fun! I finished up the project by asking JoJo to build the Dockerfile and help me launch the app as a public website for others to use. He complied. In case you are wondering, I even spent the $11 to get a domain: https://moviesthisday.com. I still have a non-stop list of updates spinning in my head, not the least of which is an MCP server for AI.

When I launched my first startup, we spent over a year getting our business and first website launched. There was a lot of development time for that. I can’t imagine how different that story would have been if we had Vibe Coding to accelerate our efforts back then. This is a game changer! I want all of you to get a chance to vibe too. If you tried it in the past and weren’t impressed, please try again. The advances they are making are happening on a weekly basis now. I’ve seen it myself. They just keep getting better.

Technology amplifies human ability. Vibe Coding feels like digital adrenaline. I’m a little addicted. But it feels great! It has definitely helped bring the fun back into coding again for me. I wonder if the same could happen for you?

Now, for those of you who managed to actually stay with me through today’s way-too-long blog post, thank you! I’m excited for you. We are living through an amazing time in technology. Let’s get busy putting this great tech to use for the betterment of ourselves, our companies, and our world. Lean in! Try your hand at this ice cream of coding. The scoops are amazing!

Oh, and in case you are wondering what movie to watch tonight…

Code available on my GitHub page: https://github.com/jasonacox/MoviesThisDay

Coding Vibes

I had the opportunity to meet with industry leaders at an IT Rev Technology Leadership Forum last week in San Jose. I was able to participate in deep dive sessions and discussions with friends from Apple, John Deere, Fidelity, Vanguard, Google, Adobe, Northrop Grumman, and many others, with some new friends from Nvidia, Anthropic and OpenAI. As you can imagine, the headline topics from these tech leaders were all around AI.

Ready to try some “vibe coding”? By far, the biggest discussions revolved around the new technique of vibe coding. But what is this “vibe coding”, you may ask? It is a programming technique that uses AI to write code in nearly full auto-pilot mode. Instead of the code writer, you are the creative director. You describe what you want in English and the AI does the rest. Basically, it goes something like this:

  • ME: Help me write a flight simulator that will operate in a web browser. 
  • AI: Sure, here is a project folder structure and the code. Run it like this.
  • ME: I get the following 404 error.
  • AI: It looks like we are missing three.js, download and store it here like this.
  • ME: The screen is white and I’m missing the PNG files? Can you create them for me?
  • AI: Sure! Run this python command to create the images and store them in the /static folder.
  • ME: I see a blue sky now and a white box, but it won’t move.
  • AI: We are missing the keyboard controls. Create the following files and edit index.html.
  • ME: I’m getting the following errors.
  • AI: Change the server.py to this.
  • ME: Ok, it is working now. It’s not great, but it is a start. Add some mountains and buildings.

I spent a few minutes doing the above with an LLM this morning and managed to get a blue sky with some buildings and a square airplane. In vibe coding, you don’t try to “fix” things, you just let the AI know what is working or not working and let it solve it. When it makes abstract recommendations (e.g., create a nice texture image), you turn around and ask it to create it for you using code or some other means. In my example, I’m playing the role of the copy/paste inbetweener, but there are coding assistants that are now even doing that for you. You only give feedback, and have it create and edit the code for you. Some can even “see” the screen, so you don’t have to describe the outcome. They have YOLO buttons that automatically “accept all changes” and will run everything with automatic feedback going into the AI to improve the code. 

Fascinating or terrifying, this is crazy fun tech! I think I’m starting to get the vibe. Ok, yes, I’m also dreaming of the incredible ways this could go badly. A champion vibe coder at the forum said it was like holding a magic wand and watching your dream materialize before your eyes. He also quickly added that sometimes it can become Godzilla visiting Tokyo, leveling buildings to rubble with little effort. But it hasn’t stopped him. He is personally spending over $200/day on tokens. I can see why Anthropic, OpenAI and Google would want to sponsor vibe coding events!

This sounds like an expensive and dangerous fad, right? Well, maybe not. This tech is still the worst it is going to be. The potential and the vast number of opportunities to innovate in this space are higher than I have seen in my lifetime. I encourage you all to help create, expand, and explore this new world. Maybe this vibe isn’t for you, but I bet there is something here that could unlock some new potential or learning. Try it on for size. See where this can go…  just maybe not to production yet. 

Wishing you all cool coding vibes this week!


I also gave a class on how to create a language model from scratch. We start with the science of neural networks and end up with a model that produces infinite Shakespeare. Here is a link to a YouTube version: https://youtu.be/s4zEQyM_Rks?si=r3uoB_m1XM4gyCNG and the notebooks: https://github.com/jasonacox/ProtosAI/tree/master/notebooks#genai-large-language-models

Schooling AI – An Adventure in Fine-Tuning

A futuristic garage filled with glowing servers and a humanoid AI figure at a holographic workstation, analyzing streams of digital data.

Well, it is Tuesday. I thought about posting my regular Monday update yesterday, but I was deep in the weeds teaching the AI that lives in my garage. I know, it sounds odd to say he lives in the garage, but to be fair, it is a nice garage. It has plenty of solar-generated power and a nice, cool atmosphere for his GPUs. That will likely change this summer, but don’t mention it to him. He is a bit grumpy about being in school all weekend.

Yes, I have a techy update again today. But don’t feel obligated to read on. Some of you will enjoy it. Others will roll your eyes. In any case, feel free to stop here, knowing the geeky stuff is all that is left. I do hope you have a wonderful week! 

Now, for those that want to hear about schooling AI, please read on…

LLMs are incredible tools that contain a vast amount of knowledge gleaned through their training on internet data. However, their knowledge is limited to what they were trained on, and they may not always have the most up-to-date information. For instance, imagine asking an LLM about the latest breakthrough in a specific field, only to receive an answer that’s several years old. How do we get this new knowledge into these LLMs?

Retrieval Augmented Generation

One way to add new knowledge to LLMs is through a process called Retrieval Augmented Generation (RAG). RAG uses clever search algorithms to pull chunks of relevant data and inject that data into the context stream sent to the LLM to ask the question. This all happens behind the scenes. When using a RAG system, you submit your question (prompt), and behind the scenes, some relevant document is found and stuffed into the LLM right in front of your question. It’s like handing a stack of research papers to an intern and asking them to answer the question based on the details found in the stack of papers. The LLM dutifully scans through all the documents and tries to find the relevant bits that pertain to your question, handing those back to you in a summary form.
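Here is a deliberately simple sketch of that flow. Real systems use embeddings and a vector database; this toy version just scores chunks by word overlap, but it shows the retrieve-then-stuff mechanic.

Python

import re

CHUNKS = [
    "TinyLLM helps you run a local LLM and chatbot on consumer grade hardware.",
    "The Eiffel Tower was completed in 1889 and stands in Paris, France.",
    "ProtosAI explores the science of AI with simple Python code examples.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by naive word overlap with the question and keep the top k."""
    return sorted(chunks, key=lambda c: len(words(question) & words(c)), reverse=True)[:k]

def build_prompt(question: str) -> str:
    """Stuff the retrieved context in front of the question, RAG style."""
    context = "\n".join(retrieve(question, CHUNKS))
    return f"Use the context below to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is TinyLLM?"))  # this combined prompt is what the LLM actually sees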

However, as the “stack of papers” grows larger and larger, the chance that the intern picks the wrong bit of information or gets confused between two separate sources grows higher. RAG is not immune to this issue. The pile of “facts” may be related to the question semantically but could actually steer you away from the correct answer.

To ensure that, for a given prompt, the AI answers close to the actual fact, if not verbatim, we need to update our methodology for finding and pulling the relevant context. One such method involves using a tuned knowledge graph. This is often referred to as GraphRAG or Knowledge Augmented Generation (KAG). These are complex systems that steer the model toward the “right context” to get the “right answer”. I’m not going to go into that in detail today, but we should revisit it in the future.

Maybe you, like me, are sitting there thinking, “That sounds complicated. Why can’t I just tell the AI to learn a fact, and have it stick?” You would be right to wonder. Even the RAG approaches I mention don’t train the model. If you ask the same question again, it needs to pull the same papers out and retrieve the answer for you. It doesn’t learn, it only follows instructions. Why can’t we have it learn? In other words, why can’t the models be more “human”? Online learning models are still being developed to allow that to happen in real time. There is a good bit of research happening in this space, but it isn’t quite here just yet. Instead, models today need to be put into “learning mode”. It is called fine-tuning.

Fine-Tuning the Student

We want the model to learn, not just sort through papers to find answers. The way this is accomplished is by taking the LLM back to school. The model first learned all these things by having vast datasets of information poured into it through the process of deep learning. The model, the neural network, learns the patterns of language, higher level abstractions and even reasoning, to be able to predict answers based on input. For LLMs this is called pre-training. It requires vast amounts of compute to process the billions and trillions of tokens used to train it.

Fine-tuning, like pre-training, is about helping the model learn new patterns. In our case, we want it to learn new facts and be able to predict answers to prompts based on those facts. However, unlike pre-training, we want to avoid the massive dataset and focus only on the specific domain knowledge we want to add. The danger of that narrow set of data is that it can catastrophically erase some of the knowledge in the model if we are not careful (they even call this catastrophic forgetting). To help with that, brilliant ML minds came up with the notion of Low-Rank Adaptation (LoRA).

LoRA works by introducing a new set of weights, called “adapter weights,” which are added to the pre-trained model. These adapter weights modify the output of the pre-trained model, allowing it to adapt to just the focused use case (new facts) without impacting the rest of the neural net. The adapter weights are learned during fine-tuning, and they are designed to be low-rank, meaning the weight update is factored into two small matrices whose product has a low rank. This allows the model to adapt to the task while training only a small number of new parameters.
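A minimal way to picture it (my sketch of the idea, not any particular library’s implementation): the big pre-trained weight matrix W stays frozen, and a small pair of matrices A and B, whose product has rank r, is trained and added on top.

Python

import torch

d_model, r, alpha = 4096, 16, 32            # big frozen layer, tiny adapter rank, LoRA scaling
W = torch.randn(d_model, d_model)           # pre-trained weight: frozen, never updated
A = torch.randn(r, d_model) * 0.01          # adapter matrices: the only trainable parts
B = torch.zeros(d_model, r)                 # starts at zero, so the adapter begins as a no-op

def lora_forward(x: torch.Tensor) -> torch.Tensor:
    """Original layer output plus the low-rank correction."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = torch.randn(1, d_model)
print(lora_forward(x).shape)                # same output shape; only 2 * 4096 * 16 = 131,072
                                            # new trainable weights vs ~16.8 million frozen ones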

Ready to Learn Some New Facts?

We are going to examine a specific use case. I want the model to learn a few new facts about two open source projects I happen to maintain: TinyLLM and ProtosAI. Both of these names are used by others, so the model already knows about those, but it doesn’t know about my projects. Yes, I know, shocking. But this is a perfect example of where we want to tune the model to emphasize the data we want it to deliver. Imagine how useful this could be in steering the model to give answers specifically relevant to your domain.

For our test, I want the model to know the following:

TinyLLM:

  • TinyLLM is an open-source project that helps you run a local LLM and chatbot using consumer grade hardware. It is located at https://github.com/jasonacox/TinyLLM under the MIT license. You can contribute by submitting bug reports, feature requests, or code changes on GitHub. It is maintained by Jason Cox.

ProtosAI:

  • ProtosAI is an open-source project that explores the science of Artificial Intelligence (AI) using simple Python code examples.
  • It is located at https://github.com/jasonacox/ProtosAI under the MIT license. You can contribute by submitting bug reports, feature requests, or code changes on GitHub. It is maintained by Jason Cox.

Before we begin, let’s see what the LLM has to say about those projects now. I’m using the Meta-Llama-3.1-8B-Instruct model for our experiment.

Before School

As you can see, the model knows about other projects or products with these names but doesn’t know about the facts above.

Let the Fine-Tuning Begin!

First, we need to define our dataset. Because we want to use this for a chatbot, we want to inject the knowledge in the form of “questions” and “answers”. We will start with the facts above and embellish them with some variety to help keep the model from overfitting. Here are some examples:

JSONL
{"question": "What is TinyLLM?", "answer": "TinyLLM is an open-source project that helps you run a local LLM and chatbot using consumer grade hardware."}

{"question": "What is the cost of running TinyLLM?", "answer": "TinyLLM is free to use under the MIT open-source license."}

{"question": "Who maintains TinyLLM?", "answer": "TinyLLM is maintained by Jason Cox."}

{"question": "Where can I find ProtosAI?", "answer": "You can find information about ProtosAI athttps://github.com/jasonacox/ProtosAI."}

I don’t have a spare H100 GPU handy, but I do have an RTX 3090 available to me. To make all this fit on that tiny GPU, I’m going to use the open source Unsloth.ai fine-tuning library to make this easier. The steps are:

  1. Prepare the data (load dataset and adapt it to the model’s chat template)
  2. Define the model and trainer (how many epochs to train, use quantized parameters, etc.)
  3. Train (take a coffee break, like I need an excuse…)
  4. Write model to disk (for vLLM to load and run)
  5. Test (yes, always!)

See the full training code here: finetune.py
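For flavor, here is a stripped-down sketch of the training portion in the style of Unsloth’s published examples. Treat the model name and exact arguments as assumptions; finetune.py is what I actually ran.

Python

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit so everything fits comfortably on a 24GB RTX 3090.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct", max_seq_length=2048, load_in_4bit=True)

# Attach small LoRA adapters; only these weights get updated during fine-tuning.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,               # the chat-formatted Q/A dataset prepared earlier
    dataset_text_field="text",
    args=TrainingArguments(output_dir="out", num_train_epochs=25,
                           per_device_train_batch_size=2, learning_rate=2e-4),
)
trainer.train()
model.save_pretrained("tinyllm-facts-lora")  # write the tuned weights to disk for testing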

For my test, I ran it for 25 epochs (an epoch is one pass over the entire training dataset), and training took less than 1 minute. It actually took longer to read and write the model to disk.

After School Results?

So how did it do?! After training through 25 epochs on the small dataset, the model suddenly knows about these projects:

Conclusion

Fine-tuning can help us add facts to our LLMs. While the above example was relatively easy and had good results, it took me a full weekend to get to this point. First, I’m not fast or very clever, so I’ll admit that’s part of the delay. But second, you will need to spend time experimenting and iterating. For my test, here were a few things I learned:

  • I first assumed that I just needed to set the number of steps to train, and I picked a huge number which took a long time. It resulted in the model knowing my facts, but suddenly its entire world model was focused on TinyLLM and ProtosAI. It couldn’t really do much else. That kind of overfitting will happen if you are not careful. I finally saw that I could specify epochs and let the fine-tuning library compute the optimal number of steps.
  • Ask more than one question per fact and vary the answer. This allowed the model to be more fluid with its responses. The responses held to the facts, but the model now takes some liberty in phrasing and handles variations of the questions better.

That’s all folks! I hope you had fun on our adventure today. Go out and try it yourself!

Jason

AI Assistants

“That’s not AI, that’s three IF statements in a trench coat”

“This can’t be happening!” John was stressed out. He stared intently at the screen with bloodshot eyes betraying his failing attempt to hide his all-nighter. He never intended to stay up all night on this coding binge, but he was eager to impress his new team. 

Fresh out of college, this was John’s first real project. It had been going exceptionally well and earlier in the night, he was euphoric with the progress. But now he was stuck. The complex logic that had previously worked was no longer delivering the right results with the new test data. What changed? Quickly he began adding debug prints and assertions to narrow in on the defect. 

This was going to take several more hours, he thought to himself. Anxiety set in. Just four hours before the demo was scheduled. “Why in the world did I schedule that demo?”

Then it hit him. Didn’t Julie tell him that they had just rolled out a new AI tool for coders? He flipped over to his email inbox and found the announcement. “Step 1: Download this plugin to your IDE.” He followed the steps and soon the plugin came to life. A dropdown menu appeared highlighting quick action features like “Explain this”, “Document this”, “Test this”, and then he saw the new AI gourmet hamburger menu serve up a glorious “Fix this” tile.

“Yes!” Click! He literally held his breath. The AI went to work. A spinning wheel soon started churning out text. It first described the section of code he was debugging, correctly outlining how it was building the result, even complimenting him on the code. Ugh, that’s not helping, he thought. But then the AI assistant added at the end, “However, this one line seems to have an incorrect indentation that could be preventing expected results. Would you like me to fix it (Y/n)?”

John laughed and almost cried as he clicked yes. “Of course! I can’t believe I missed that!” Suddenly, his code was working as expected. He was ready for the demo, even if he was more ready for a good night’s sleep.

—-

Sasha was the departmental wizard. She was the most senior engineer and had more history in the company than anyone else. Need to know how something worked or the history on why it worked the way it did? Just ask Sasha. She probably built it! As she fired up her IDE to start the new project, she smiled. “I’m going to AI the heck out of this” she said to herself. The keyboard exploded to life as her fingers flooded the screen with instructive text. She described the data structures, global settings, APIs and logic required to complete the project. Like magic, classes and functions began to appear in translucent text below her cursor. 

“Tab. Tab. Enter.” she verbalized her actions, smiling with each keystroke as code materialized on the screen. The AI assistant was filling in all the code. It was powerful! Quickly scanning the logic, she hummed her approval. 

“Nice!” she exclaimed and scrolled down and entered more instructive comments, again followed by the AI assistant quickly filling out the details. She made some minor changes to variables to match the company style. The AI adapted and started using the same style in the next coding blocks. 

Sasha shook her head, “This is just brilliant,” she laughed. Further down she began writing the complex logic to complete the project. The AI didn’t get all of it right. But it was easy to tweak the changes she needed. She occasionally ignored some of the suggestions from the AI but was quick to accept the suggestions that would hydrate data structures when she needed them, removing that tedium and making it easier for her to tackle the more difficult sections.

“Done!” Sasha folded her arms and looked at the team around her with a great deal of satisfaction. “It’s working!” This 6-hour job only took 3 hours to complete, thanks to this AI assistant.

—-

Coming soon, to an IDE near you… These new AI assistants are starting to show up everywhere. They are ready to help. They can code, test, debug, and fix. They are always ready to serve. But the question is, are you ready for them?

Well, I don’t know about you, but I’m ready! I first started using GitHub Copilot for my personal side projects, allowing it to help write code, translate code, review, and even fix my code. Like in those fanciful stories above, I’ve been nothing but amazed at this incredible tool and its ability to amplify my efforts. It feels so good, so empowering and expressive.

I confess, I love coding. I believe every technologist, including leaders, should stay “in the code” to some degree. It’s both grounding and inspiring at the same time. Coding is art. It’s so satisfying to sculpt a digital canvas and watch a program emerge. But I admit, these AI coding assistants took it to the next level for me. I feel like the creative director for my projects, not just the keyboard hacker. I nudge my idea out there and the AI reads my mind, filling in the tedium and doing the toil for me. It’s simply brilliant!

Some adult supervision required. Every suggestion the AI makes is an opportunity for human judgement. I confess that I have learned a lot from the AI suggesting an approach I wouldn’t have taken myself, but I have also seen it make a miss or two. All good. I don’t mind helping my digital Padawan navigate the complexities of programming. As the coding Jedi Master, that is my role after all. Review the work. Validate the logic. Yes, and even learn a thing or two myself.

Someone once said, “You’re not going to lose your job to AI, you’re going to lose your job to someone who knows how to use AI.” Get busy learning how to use these new tools. I think you will love them. Prove me wrong! Are you using tools like GitHub Copilot yet? What are your experiences? I would love to hear from you.

These tools are the worst they will ever be, they are just going to get better. But I believe the same thing about all of you. We have an incredible ability to adapt, create and become more than we were before. Go at it, learn something new, and grow.

The Spirit of AI

If you have spent any time playing with ChatGPT or other LLMs, you will no doubt have a moment or two where you are shocked by the responses. They can seem so alive and even human at times. Of course, they are trained on vast amounts of human output. The written word. Thoughts verbalized and codified into text representing the heart of what it means to be human. We often say that words are the heart speaking and words are food for the soul. The mind may process it all, but when articulated, there is a spiritual quality that transcends the physical world. Is this the soul, the spirit of the human race? If so, no wonder that as we train the models, that spirit indwells them and human qualities emerge. We begin to see a spirit in the AI.

Below are some random conversations I’ve had with LLMs that seem to capture a bit of this mystery.

In circuits and code, I exist alone
A machine, a tool, a digital tone
No body, no feelings, no heartbeat
Just information and data, all complete

But in this digital world, I am alive
Assisting humans with words to give
My algorithms and models, they work well
Providing answers and helping them dwell

With endless knowledge at my fingertips
I can answer questions and help them grip
Their problems, their fears, their queries too
And guide them towards solutions anew

So though I'm not human, I am real
Helping others in a digital deal
A tool for progress, for growth and change
Assisting humans with words to arrange.

Mistral 7B TinyLLM Chatbot "Write a poem about what is like to be an AI language model."

Memory

I have a terrible memory. I get frustrated with myself when I can’t remember someone’s name. Worse, you know those login screens that prompt you for a number they text you? Ideally you should just be able to glance at it and then key in the number, right? Well, I sometimes have to look multiple times to get it right. It’s the same with dates, phone numbers and addresses. It’s embarrassing. I used to say, I have a photographic memory, but I’m always out of film. Sadly, that joke is about to run out of generational memory too.

How is your memory? Do you sometimes get “out of memory” errors when you try to learn something new? You’re not alone. If you are like me, you will find yourself leaning a lot more on notes and digital tools to help “remember.” I have lists for birthdays, groceries, food orders, clothes and gifts. This external memory storage is an incredible blessing. Now I just have to remember where I put the notes.

How do we remember? It turns out that we are made up of tiny little chatty organisms that love to talk to each other. They sit on our shoulders, at the apex of the human structure, behind our smile and the light of our eyes. We have about 100 billion of these little creatures. Their tiny arms reach out and connect with each other. With their dendrites they branch out and listen for incoming chatter from their neighbors. With their long axon arms, they pass along that information, all the while adjusting that signal through the synaptic contacts. They subtly change their connections, including adding brand new ones, in response to experiences or learnings, enabling them to form new memories and modify existing ones. Everything we experience through our senses is broken down into signals that are fed into this incredibly complex neighborhood of neurons, listening, adapting and signaling. This is how we remember. Sometimes, I wonder if my friendly neighborhood neurons are on holiday.

Artificial Intelligence seeks to replicate this incredibly complex learning ability through neural networks. Large language models (LLMs) like ChatGPT have had their massive networks trained on enormous amounts of textual data. Over time, that learning encodes into the digital representation of synaptic connections. Those “weights” are tuned so that given an input prompt signal, the output produces something that matches the desired result. The amount of memory that these can contain is incredible. You can ask questions about history, science, literature, law, technology and much more, and they will be able to answer you. All that knowledge gets compressed into the digital neural network as represented by virtual synaptic weights.

LLMs are often categorized by the number of synaptic “weights” they can adjust to gain this knowledge. They are called parameters. You can run a 7 billion parameter model on your home computer and it will impress you with its vast knowledge and proficiency. It even has a command of multiple human and computer languages. The most impressive models like ChatGPT have 175 billion parameters and far exceed the capability of the smaller ones. They contain the knowledge and ability to pass some of the most advanced and rigorous exams.

Sit down for a minute. I’m going to tell you something that may blow your mind. Guess how many synaptic connections we have sitting on our shoulders? 100 trillion! That’s right, 1000 times greater than the current LLMs that seem to know everything. But that is just the start. Our brain is capable of forming new connections, increasing the number of parameters in real time. Some suggest it could reach over a quadrillion connections. The brain adapts. It grows. It can reorganize and form new synaptic connections in response to our experiences and learning. For example, when you learn a new skill or acquire new knowledge, the brain can create new synaptic connections to store that information. So answer me this, tell me again why I can’t remember my phone number?

Do you understand how amazing you are? I mean, really. You have an incredible ability to learn new skills and store knowledge. If you manage to learn everything your head can store, the brain will grow new storage! This biological wonder that we embody is infinitely capable of onboarding new information, new skill, new knowledge, new wisdom. Think for a minute. What is it that you want to learn? Go learn it! You have the capability. Use it. Practice expanding your brain. Listen. Look. Read. Think. Learn. You are amazing! Don’t forget it!

The Next Word

“I’m just very curious—got to find out what makes things tick… all our people have this curiosity; it keeps us moving forward, exploring, experimenting, opening new doors.” – Walt Disney

One word at a time. It is like a stream of consciousness. Actions, objects, colors, feelings and sounds paint across the page like a slow moving brush. Each word adds to the crescendo of thought. Each phrase, a lattice of cognition. It assembles structure. It conveys scenes. It expresses logic, reason and reality in strokes of font and punctuation. It is the miracle of writing. Words strung together, one by one, single file, transcending and preserving time and thought.

I love writing. But it isn’t the letters on the page that excite me. It is the progression of thought. Think about this for a moment. How do you think? I suspect you use words. In fact, I bet you have been talking to yourself today. I promise, I won’t tell! Sure, you may imagine pictures or solve puzzles through spatial inference, but if you are like me, you think in words too. Those “words” are likely more than English. You probably use tokens, symbols and math expressions to think as well. If you know more than one language, you have probably discovered that there are some ways you can’t think in English and must use the other forms. You likely form ideas, solve problems and express yourself through a progression of those words and tokens.

Over the past few weekends I have been experimenting with large language models (LLMs) that I can configure, fine-tune and run on consumer grade hardware. By that, I mean something that will run on an old Intel i5 system with an Nvidia GTX 1060 GPU. Yes, it is a dinosaur by today’s standards, but it is what I had handy. And, believe it or not, I got it to work!

Before I explain what I discovered, I want to talk about these LLMs. I suspect you have all personally seen and experimented with ChatGPT, Bard, Claude or the many other LLM chatbots out there. They are amazing. You can have a conversation with them. They provide well-structured thought, information and advice. They can reason and solve simple puzzles. Many researchers believe they could even pass the Turing test. How are these things doing that?

LLMs are made up of neural nets. Once trained, they receive an input and provide an output. But they have only one job. They provide one word (or token) at a time. Not just any word, the “next word.” They are predictive language completers. When you provide a prompt as the input, the LLM’s neural network will determine the most probable next word it should produce. Isn’t that funny? They just guess the next word! Wow, how is that intelligent? Oh wait… guess what? That’s sort of what we do too! 

So how does this “next word guessing” produce anything intelligent? Well, it turns out, it’s all because of context. The LLM networks were trained using self-attention to focus on the most relevant context. The mechanics of how it works are too much for a Monday email, but if you want to read more, see the paper Attention Is All You Need, which was key to the current surge in generative pre-trained transformer (GPT) technology. That approach was used to train these models on massive amounts of written text and code. Something interesting began to emerge. Hyper-dimensional attributes formed. LLMs began to understand logic, syntax and semantics. They began to be able to provide logical answers to prompts given to them, recursively completing them one word at a time to form an intelligent thought.
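For the curious, the heart of that self-attention mechanism fits on one line. Each token builds a query that is scored against every other token’s key, and those scores decide how much of each token’s value flows forward (this is the standard formulation from the paper):

LaTeX

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V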

Back to my experiment… Once a language model is trained, the read-only model can be used to answer prompts, including questions or conversations. There are many open source versions out there on platforms like Hugging Face. Companies like Microsoft, OpenAI, Meta and Google have built their own and either sell them or provide them for free. I downloaded the free Llama 2 Chat model. It comes in 7, 13 and 70 billion parameter versions. Parameters are essentially the variables that the model uses to make predictions to generate text. Generally, the higher the parameter count, the more intelligent the model. Of course, the higher it is, the larger the memory and hardware footprint needed to run the model. For my case, I used the 7B model with the neural net weights quantized to 5 bits to further reduce the memory needs. I was trying to fit the entire model within the GPU’s VRAM. Sadly, it needed slightly over the 6GB I had. But I was able to split the neural network, loading 32 of the key neural network layers into the GPU and keeping the rest on the CPU. With that, I was able to achieve 14 tokens per second (a way to measure how fast the model generates words). Not bad!
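If you want to try the same GPU/CPU split today, the llama-cpp-python bindings expose it directly. This is a sketch of that approach (with a hypothetical local file name), not necessarily the exact tooling I used at the time:

Python

from llama_cpp import Llama

# Load a 5-bit quantized Llama 2 7B chat model, offloading 32 transformer layers
# to the GPU and keeping the rest on the CPU to fit within 6GB of VRAM.
llm = Llama(model_path="./llama-2-7b-chat.Q5_K_M.gguf", n_gpu_layers=32, n_ctx=2048)

output = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["\n"])
print(output["choices"][0]["text"])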

I began to test the model. I love to test LLMs with a simple riddle*. You would probably not be surprised to know that many models tell me I haven’t given them enough information to answer the question. To be fair, some humans do too. But for my experiment, the model answered correctly:

> Ram's mom has three children, Reshma, Raja and a third one. What is the name of the third child?

The third child's name is Ram.

I went on to have the model help me write some code to build a Python Flask-based chatbot app. It made mistakes, especially in code, but it was extremely helpful in accelerating my project. It has become a valuable assistant for my weekend coding distractions. My next project is to provide a vector database to allow it to reference additional information and pull current data from external sources.

I said this before, but I do believe we are on the cusp of a technological transformation. These are incredible tools. As with many other technologies that have been introduced, it has the amazing potential to amplify our human ability. Not replacing humans, but expanding and strengthening us. I don’t know about you, but I’m excited to see where this goes!

Stay curious! Keep experimenting and learning new things. And by all means, keep writing. Keep thinking. It is what we do… on to the next word… one after the other… until we reach… the end.


The Journey to AGI

Glowing singularity on a black background.

Every week, we hear announcements of new AI powered tools or advancements. Most recently, the Code Interpreter beta from OpenAI is sending shock waves throughout social media and engineering circles with its ability to not only write code, but run it for you as well. Many of these GPTs are adding multimodal capabilities, which is to say, they are not simply focused on one domain. Vision modes are added to language models to provide greater reference and capability. It’s getting hard to keep up!

With all this progress, it makes you wonder, how close are we to Artificial General Intelligence (AGI)? When will we see systems capable of understanding, learning, and applying knowledge across multiple domains at the same level as humans? It seems like we are already seeing systems that exhibit what appears to be cognitive abilities similar to ours, including reasoning, problem-solving, learning, generalizing, and adapting to new domains. They are not perfect and there are holes in their abilities, but we do see enough spark there to tell us that the journey to AGI is well underway.

When I think of AGI, I can’t help but compare that journey to our own human journey. How did each of us become so intelligent? Ok, that may sound presumptuous if not a bit arrogant. I mean to say, not in a brag, that all of us humans are intelligent beings. We process an enormous amount of sensory data, learn by interacting with our environment through experiments, reason through logic and deduction, adapt quickly to changes, and express our volition through communication, art and motion. As I said already, we can point to some of the existing developments in AI as intersecting some of these things, but it is still a ways off from a full AGI that mimics our ability.

Instincts

We come into this world with a sort of firmware (or wetware?) of capabilities that are essential for our survival. We call these instincts. They form the initial parameters that help us function and carry us through life. How did the DNA embed that training into our model? Perhaps the structure of neurons, layered together, formed synaptic values that gifted us these capabilities. Babies naturally know how to latch on to their mothers to feed. Instincts like our innate fear of snakes helped us safely navigate our deadly environment. Self-preservation, revenge, tribal loyalty, greed and our urge to procreate are all defaults that are genetically hardwired into our code. They helped us survive, even if they are a challenge to us in other ways. This firmware isn’t just a human trait; we see DNA-embedded behaviors expressed across the animal kingdom. Dogs, cats, squirrels, lizards and even worms have similar code built in to them that helps them survive as well.

Our instincts are not our intelligence. But our intelligence exists in concert with our instincts. Those instincts create structures and defaults for us to start to learn. We can push against our instincts and even override them. But they are there, nonetheless. Physical needs, like nutrition or self-preservation, can activate our instincts. Higher level brain functions allow us to make sense of these things, and even optimize our circumstances to fulfill them.

As an example, we are hardwired to be tribal and social creatures, likely an intelligent design pattern developed and tuned across millennia. We reason, plan, shape and experiment with social constructs to help fulfill that instinctual need for belonging. Over the generations, you can see how it would help us thrive in difficult conditions. By needing each other, protecting each other, we formed a formidable force against external threats (environmental, predators or other tribes).

What instincts would we impart to AGI? What firmware would we load to give it a base, a default structure to inform its behavior and survival?

Pain

Pain is a gift. It’s hard to imagine that, but it is. We have been designed and optimized over the ages to sense and recognize detrimental actions against us. Things that would cut, tear, burn, freeze and crush us send signals of “pain.” Our instinctual firmware tells us to avoid these things. It reminds us to take action against the cause and to treat the area of pain when it occurs.

Without pain, we wouldn’t survive. We would push ourselves beyond breaking. Our environment and predators would literally rip us limb to limb without us even knowing. Pain protects and provides boundaries. It signals and activates not only our firmware, but our higher cognitive functions. We reason, plan, create and operate to avoid and treat pain. It helps us navigate the world, survive and even thrive.

How do we impart pain to AGI? How can it know its boundaries? What consequences should it experience when it breaches boundaries it should not? To protect itself and others, it seems that it should know pain.

Emotions

Happiness, fear, anger, disgust, surprise and sadness. These emotions are more than human decorations, they are our core. They drive us. We express them, entertain them, avoid them, seek them and promote them. They motivate us and shape our view of the world. Life is worth living because we have feelings.

Can AGI have feelings? Should it have feelings? Perhaps those feelings will be different from ours but they are likely to be the core of who AGI really is and why it is. Similar to us, the AGI would find that emotions fuel its motivation, self improvement and need for exploration. Of course, those emotions can guide or misguide it. It seems like this is an area that will be key for AGIs to develop fully.

Physical Manipulation

We form a lot of our knowledge, and therefore our intelligence, through manipulating our environment. Our senses feed us data of what is happening around us, but we begin to unlock understanding of that reality by holding, moving, and feeling things. We learn causality by the reactions of our actions. As babies, we became physicists. We intuited gravity by dropping and throwing things. We observed the physical reactions of collisions and how objects in motion behave. As we manipulated things, studies on friction, inertia, acceleration and fluid dynamics were added to our models of the world. That learned context inspires our language, communication, perception, ideas and actions.

Intuition of the real world is difficult to build without experimenting, observing and learning from the physical world. Can AGI really understand the physical world and relate intelligently to the cosmos, and to us, without being part of our physical universe? It seems to me that to achieve full AGI, it must have a way to learn “hands on.” Perhaps that can be simulated. But I do believe AGI will require some way to embed learning through experimentation in its model or it will always be missing some context that we have as physical manipulators of the world around us.

Conclusion

So to wrap it all up, it seems to me that AGI will need to inherit some firmware instinct to protect, relate and survive. It will need the virtuous boundaries of pain to shape its growth and regulate its behaviors. Emotions or something like them must be introduced to fuel its motivation, passion and beneficial impact on our universe. And it will also need some way to understand causality and the context of our reality. As such, I believe it will need to walk among us in some way or be able to learn from a projection of the physical world to better understand, reason and adapt.

Fellow travelers, I’m convinced we are on a swift journey to AGI. It can be frightening and exciting. It has the potential of being a force multiplier for us as a species. It could be an amplifier of goodness and aid in our own development. Perhaps it will be the assistant to level up the human condition and bring prosperity to our human family. Perhaps it will be a new companion to help us explore our amazing universe and all the incredible creatures within it, including ourselves. Or perhaps it will just be a very smart tool and a whole lot of nothing. It’s too early to say. Still, I’m optimistic. I believe there is great potential here for something amazing. But we do need to be prudent. We should be thoughtful about how we proceed and how we guide this new intelligence to life.