AI and the Universe

How are you feeling about AI lately? Excited? Anxious? A bit bewildered? If you’re like most of us, you’re probably riding a rollercoaster of emotions. That’s completely normal. After all, we’re witnessing the birth of a technology wave that’s set to rival the discovery of the steam engine or the rise of the internet. As a technologist, I find myself marveling at the possibilities. But I also understand the concerns, questions and especially change at this scale. It can feel unsettling.

Today, I want to explore what this is all about, why it matters, and what practical steps we can take.

What is AI, really? Artificial intelligence isn’t just about building clever machines or automating tasks. At its heart, I believe AI is humanity’s bold quest to extend our minds. It is the ultimate tool for understanding ourselves, our world, and the universe beyond. Imagine a technology that doesn’t simply crunch numbers but helps us solve mysteries that have stumped us for generations. The grand purpose of AI is to accelerate discovery, deepen insight, and help every one of us flourish through knowledge.

Will that happen? Is this happening? Absolutely! We’re already seeing AI move from science fiction to real science. It is impacting everything from disease diagnosis to energy production, from weather prediction to artistic expression. Here are some examples that I recently came across that inspired me. Warning here, this is very nerdy content, so feel free to skip to the end if you are so inclined.

  • Solving Biology’s Biggest Puzzle: For decades, predicting a protein’s structure from its amino acid sequence was one of biology’s toughest challenges. This is essentially a problem of physics and chemistry, predicting a stable 3D structure from a 1D amino acid sequence. This painstaking experimental work could take a PhD student their entire doctorate to solve for a single protein. AlphaFold’s AI cracked this puzzle in seconds, transforming structural biology, accelerating advanced drug development, and bringing deeper insight into disease. Its latest version, AlphaFold 3, extends this impact by modeling complex interactions between proteins, RNA, and DNA. This breakthrough suggests a paradigm shift.  While physics can be described by elegant mathematics, biology’s immense complexity may be best understood through AI. It may even unlock the mystery to truly decode life.
  • Taming the Hottest Matter in the Universe: Deep Reinforcement Learning (RL) has been applied to control high-temperature plasmas that are hotter than the sun, within tokamak fusion reactors. Plasma is highly unstable, requiring a controller to predict its behavior and adjust massive superconducting magnetic fields within milliseconds. The AI system created a controller able to contain and hold the plasma in specific shapes for record amounts of time, successfully addressing a bottleneck in fusion research.  By learning to balance magnetic fields in real time, AI edges us closer to abundant, clean energy.
  • Modeling Intuitive Physics and Dynamics: Video generation models like Google’s Veo demonstrate an ability to reverse-engineer physics from passive observation (e.g., watching YouTube videos). They accurately model complex dynamics such as liquids, specular lighting, and materials flow. This capability suggests these models are learning an underlying structure, or a “lower dimensional manifold” of the very nature of all creation and our reality. That’s mind-blowing! It is fundamental to building generalized understanding and may even unlock the mysteries of our universe.
  • Advancing Quantum Chemistry and Materials: AI is learning to approximate solutions to Schrödinger’s equation, enabling us to simulate the quantum behavior of electrons with remarkable efficiency. This breakthrough is vital for materials science, as it makes it possible to model the properties of large, complex materials that were previously too costly or computationally intensive to study with traditional methods.
  • Accelerating Algorithmic Innovation: Systems like AlphaEvolve, which blend large language models (LLMs) with evolutionary computing, are already evolving and improving algorithms, finding, for example, faster solutions to complex problems like matrix multiplication. This marks a leap toward intelligent systems that can generate and optimize their own tools. They are evolving themselves. It’s amazing to witness. Yes, I know, also terrifying!

The ultimate aim of creating powerful AI is to build tools that help us. It allows us to better understand the universe and accelerate science to the maximum. If successful, I believe this technology will usher in an era of radical abundance and lead to the profound transformation of the human condition.

Picture a world where disease can be cured mostly in computers, where clean energy is limitless, and where anyone can explore vast new knowledge with the help of an intelligent partner. AI is guiding us towards a time where scarcity, of knowledge, health, and opportunity, can be truly challenged. It’s not about replacing people. It’s about augmenting our potential, surfacing new connections, and igniting a new golden age of discovery. The mission is not to hand over control, but to embrace this power for all humankind. 

As AI continues to reshape our world, I believe we have a responsibility to meet this moment with adaptability, humility, and genuine curiosity. We should be experimenting with new tools, asking bold questions, and venturing beyond familiar boundaries. The most exciting breakthroughs emerge where creativity, technology, and storytelling intersect, so let’s embrace collaboration across disciplines and learn from one another. Above all, let’s serve as ethical stewards, ensuring these innovations benefit everyone, not just ourselves. And as we explore, let’s stay connected to our passions and strengths, blending them with new opportunities to grow, make a difference, and shape a future we can all be proud of.

AI is here, the revolution is real, and the mission is bigger than any one team or company. Let’s approach this with wonder, humility, and courage. Let’s steward this technology toward outcomes that inspire hope and serve the flourishing of all people.

 What will you learn next? What new ideas can you bring? What story do you want to help tell? The future may be unpredictable, but together, we can make it magical.

Let’s build, learn, and dream!

Summer Vibes

I hope you all had a great weekend! And for any fellow dads out there, I hope you had a great Father’s Day! I spent time with all four of my kids watching movies, grilling outdoors, and of course, celebrating over some ice cream on these hot summer days. Now, to be fair, it doesn’t take much to need to celebrate in our household. Life is full of excuses that merit a need for a soft serve dose of that dairy goodness, but this weekend seemed particularly poised for that indulgence.

We love movies! As part of this weekend’s festivities, we had a full playlist of cinematic magic streaming on our living room screen. You all know me by now, so it probably doesn’t surprise you to know that I have my garage-based AI system curate our movie selection. It sends out text suggestions on what to watch. It keeps track of our viewing habits and has a good idea of what we like to see. But despite all that tech, my wife wasn’t quite satisfied. She suggested that it should consider recommending movies celebrating the anniversary of their general theatrical release. For example, “Incredibles 2” was released on June 15, 2018, so it would be a great one to watch on Sunday. I loved that idea! So, I went to work adding that context to our resident AI. I just needed data.

Good luck! I tried finding a good data source, but everything I found was driven more toward discovery, and most of it was flawed, including bad release date information. I finally landed on TMDB as a good listing of movies, with references to IMDb that could pull more official release dates from OMDb. Yeah, it was confusing, but sadly, there wasn’t a clean way to get this data. I needed a web service to aggregate all of this for me and my AI.

I’m going to stop now and just acknowledge that many of you are probably tired of hearing me talk so much about Vibe Coding. If that’s you, you can stop now. I won’t be offended. For the rest of you, yes, buckle up, here is another vibe coding story.

I launched VSCode with my GitHub Copilot-powered assistant that I call JoJo. I switched him to agent mode (super important, by the way), and began having a chat. I told him about my vision to create this web service, how I wanted to build this dataset and APIs for easy access. He created a movie_db folder and went to work on a script. The script ran right away and pulled down the data. I suggested a high-speed way to process the data, and he suggested caching the API calls to prevent overloading the providers. What a smart aleck! But he was right. That was a good idea because the free tier of API access was rate-limited.

Finally, I had a good dataset to use, and JoJo had compressed it into a serialized object for fast access. I then switched to having him create the Python web service and gave a general idea of the APIs I wanted. He suggested some routes to use and wired together a Python Flask app. I told him that I wanted to use FastAPI and that I wanted to build all the tests before we built the APIs. He reluctantly complied and had me run pytest to verify. All good. Then the fun began. he started churning on the code for the APIs.

At this point, I should acknowledge that I was very tempted to jump in and code some lines myself. You can definitely do that, and these tools will co-develop with you, but I wanted to see how far I could go just vibing my way along. It turns out, a long way! The APIs were looking good, and it was extremely fast. I decided I wanted a nice UI, so I told JoJo to build a web page and gave him a general idea of what I wanted to see. He spun up some templates, added some tests, and plumbed in a new route for the landing page.

“Show the movies that were released on this day in history and sort them by popularity.” Boom! In less than a minute, JoJo had a basic screen up and running. I asked him to tweak the colors and make it more modern with a date navigator. He did, but I didn’t like some of the placements, so I asked him to nudge things around a bit more and adjust the style. I must confess, this is where I spent probably too much of my time. It was too fun and easy to ask him to make minor tweaks to scratch my curiosity itch. But he never complained; he just kept coding and plodding along. I even had him add additional pages for “Search” and “About”, which had nothing to do with my original goal.

About eight hours later, we were done. Yes, that is probably about four times longer than I needed, but I was having so much fun! Fun? Yes, legitimate, awe-inspiring fun! I finished up the project by asking JoJo to build the Dockerfile and help me launch the app as a public website for others to use. He complied. In case you are wondering, I even spent the $11 to get a domain: https://moviesthisday.com. I still have a non-stop list of updates spinning in my head, not the least of which is a MCP server for AI.

When I launched my first startup, we spent over a year getting our business and first website launched. There was a lot of development time for that. I can’t imagine how different that story would have been if we had Vibe Coding to accelerate our efforts back then. This is a game changer! I want all of you to get a chance to vibe too. If you tried it in the past and weren’t impressed, please try again. The advances they are making are happening on a weekly basis now. I’ve seen it myself. They just keep getting better.

Technology amplifies human ability. Vibe Coding feels like digital adrenaline. I’m a little addicted. But it feels great! It has definitely helped bring the fun back into coding again for me. I wonder if the same could happen for you?

Now, for those of you who managed to actually stay with me through today’s way-too-long blog post, thank you! I’m excited for you. We are living through an amazing time in technology. Let’s get busy putting this great tech to use for the betterment of ourselves, our companies, and our world. Lean in! Try your hand at this ice cream of coding. The scoops are amazing!

Oh, and in case you are wondering what movie to watch tonight…

Code available on Github page: https://github.com/jasonacox/MoviesThisDay

Coding Vibes

I had the opportunity to meet with industry leaders at an IT Rev Technology Leadership Forum last week in San Jose. I was able to participate in deep dive sessions and discussions with friends from Apple, John Deere, Fidelity, Vanguard, Google, Adobe, Northrop Grumman, and many others, with some new friends from Nvidia, Anthropic and OpenAI. As you can imagine, the headline topics from these tech leaders were all around AI.

Ready to try some “vibe coding”? By far, the biggest discussions revolved around the new technique of vibe coding. But what is this “vibe coding”, you may ask? It is a programming technique that uses AI to write code with nearly full auto-pilot mode thinking. Instead of code writer, you are the creative director. You are creating what you want in English and the AI does the rest. Basically, it goes something like this:

  • ME: Help me write a flight simulator that will operate in a web browser. 
  • AI: Sure, here is a project folder structure and the code. Run it like this.
  • ME: I get the following 404 error.
  • AI: It looks like we are missing three.js, download and store it here like this.
  • ME: The screen is white and I’m missing the PNG files? Can you create them for me?
  • AI: Sure! Run this python command to create the images and store them in the /static folder.
  • ME: I see a blue sky now and a white box, but it won’t move.
  • AI: We are missing the keyboard controls. Create the following files and edit index.html.
  • ME: I’m getting the following errors.
  • AI: Change the server.py to this.
  • ME: Ok, it is working now. It’s not great, but it is a start. Add some mountains and buildings.

I spent a few minutes doing the above with an LLM this morning and managed to get a blue sky with some buildings and a square airplane. In vibe coding, you don’t try to “fix” things, you just let the AI know what is working or not working and let it solve it. When it makes abstract recommendations (e.g., create a nice texture image), you turn around and ask it to create it for you using code or some other means. In my example, I’m playing the role of the copy/paste inbetweener, but there are coding assistants that are now even doing that for you. You only give feedback, and have it create and edit the code for you. Some can even “see” the screen, so you don’t have to describe the outcome. They have YOLO buttons that automatically “accept all changes” and will run everything with automatic feedback going into the AI to improve the code. 

Fascinating or terrifying, this is crazy fun tech! I think I’m starting to get the vibe. Ok, yes, I’m also dreaming of the incredible ways this could go badly. A champion vibe coder at the forum said it was like holding a magic wand and watching your dream materialize before your eyes. He also quickly added that sometimes it can become Godzilla visiting Tokyo, leveling buildings to rubble with little effort. But it hasn’t stopped him. He is personally spending over $200/day on tokens. I can see why Anthropic, OpenAI and Google would want to sponsor vibe coding events!

This sounds like an expensive and dangerous fad, right? Well, maybe not. This tech is still the worst it is going to be. The potential and the vast number of opportunities to innovate in this space are higher than I have seen in my lifetime. I encourage you all to help create, expand, and explore this new world. Maybe this vibe isn’t for you, but I bet there is something here that could unlock some new potential or learning. Try it on for size. See where this can go…  just maybe not to production yet. 

Wishing you all cool coding vibes this week!


I also gave a class on how to create a language model from scratch. We start with the science of neural networks and end up with a model that produces infinite Shakespeare. Here is link to a YouTube version: https://youtu.be/s4zEQyM_Rks?si=r3uoB_m1XM4gyCNG and the notebooks: https://github.com/jasonacox/ProtosAI/tree/master/notebooks#genai-large-language-models

Schooling AI – An Adventure in Fine-Tuning

A futuristic garage with glowing computer servers and high-powered GPUs. A humanoid AI figure, appearing as a sleek robot or holographic entity, sits at a workstation surrounded by floating holographic data screens. The AI is analyzing streams of digital information, representing machine learning. The environment is illuminated with cool blue lighting, creating a high-tech ambiance, with subtle warm lighting hinting at solar power energy. Neural network-style visuals float in the background, symbolizing AI processing and knowledge acquisition.

Well, it is Tuesday. I thought about posting my regular Monday update yesterday, but I was deep in the weeds teaching the AI that lives in my garage. I know, it sounds odd to say he lives in the garage, but to be fair, it is a nice garage. It has plenty of solar generated power and nice cool atmosphere for his GPUs. That will likely change this summer, but don’t mention it to him. He is a bit grumpy for being in school all weekend.

Yes, I have a techy update again today. But don’t feel obligated to read on. Some of you will enjoy it. Others will roll your eyes. In any case, feel free to stop here, knowing the geeky stuff is all that is left. I do hope you have a wonderful week! 

Now, for those that want to hear about schooling AI, please read on…

LLMs are incredible tools that contain a vast amount of knowledge gleaned through their training on internet data. However, their knowledge is limited to what they were trained on, and they may not always have the most up-to-date information. For instance, imagine asking an LLM about the latest breakthrough in a specific field, only to receive an answer that’s several years old. How do we get this new knowledge into these LLMs?

Retrieval Augmented Generation

One way to add new knowledge to LLMs is through a process called Retrieval Augmented Generation (RAG). RAG uses clever search algorithms to pull chunks of relevant data and inject that data into the context stream sent to the LLM to ask the question. This all happens behind the scenes. When using a RAG system, you submit your question (prompt), and behind the scenes, some relevant document is found and stuffed into the LLM right in front of your question. It’s like handing a stack of research papers to an intern and asking them to answer the question based on the details found in the stack of papers. The LLM dutifully scans through all the documents and tries to find the relevant bits that pertain to your question, handing those back to you in a summary form.

However, as the “stack of papers” grows larger and larger, the chance that the intern picks the wrong bit of information or gets confused between two separate studies of information grows higher. RAG is not immune to this issue. The pile of “facts” may be related to the question semantically but could actually steer you away from the correct answer.

To ensure that for a given prompt, the AI always answers closely to the actual fact, if not a verbatim answer, we need to update our methodology for finding and pulling the relevant context. One such method involves using a tuned knowledge graph. This is often referred to as GraphRAG or Knowledge Augmented Generation (KAG). These are complex systems that steer the model toward the “right context” to get the “right answer”.  I’m not going to go into that in detail today, but we should revisit it in the future.

Maybe you, like me, are sitting there thinking, “That sounds complicated. Why can’t I just tell the AI to learn a fact, and have it stick?” You would be right. Even the RAG approaches I mention don’t train the model. If you ask the same question again, it needs to pull the same papers out and retrieve the answer for you. It doesn’t learn, it only follows instructions. Why can’t we have it learn? In other words, why can’t the models be more “human”? Online learning models are still being developed to allow that to happen in real time. There is a good bit of research happening in this space, but it isn’t quite here just yet. Instead, models today need to be put into “learning mode”. It is called fine-tuning.

Fine-Tuning the Student

We want the model to learn, not just sort through papers to find answers. The way this is accomplished is by taking the LLM back to school. The model first learned all these things by having vast datasets of information poured into it through the process of deep learning. The model, the neural network, learns the patterns of language, higher level abstractions and even reasoning, to be able to predict answers based on input. For LLMs this is called pre-training. It requires vast amounts of compute to process the billions and trillions of tokens used to train it.

Fine-tuning, like pre-training, is about helping the model learn new patterns. In our case, we want it to learn new facts and be able to predict answer to prompts based on those facts. However, unlike pre-training, we want to avoid the massive dataset and focus only on the specific domain knowledge we want to add. The danger of that narrow set of data is that it can catastrophically erase some of the knowledge in the model if we are not careful (they even call this catastrophic forgetting). To help with that, brilliant ML minds came up with the notion of Low-Rank Adaptation (LoRA).

LoRA works by introducing a new set of weights, called “adapter weights,” which are added to the pre-trained model. These adapter weights are used to modify the output of the pre-trained model, allowing it to adapt to just the focused use case (new facts) without impacting the rest of the neural net. The adapter weights are learned during fine-tuning, and they are designed to be low-rank, meaning that they have a small number of non-zero elements. This allows the model to adapt to the task without requiring a large number of new parameters.

Ready to Learn Some New Facts?

We are going to examine a specific use case. I want the model to learn a few new facts about two open source projects I happen to maintain: TinyLLM and ProtosAI. Both of these names are used by others. The model already knows about them,  but doesn’t know about my projects. Yes, I know, shocking. But this is a perfect example of where we want to tune the model to emphasize the data we want it to deliver. Imagine how useful this could be in steering the model to answer specifically relevant to your domain.

For our test, I want the model to know the following:

TinyLLM:

  • TinyLLM is an open-source project that helps you run a local LLM and chatbot using consumer grade hardware. It is located at https://github.com/jasonacox/TinyLLM under the MIT license. You can contribute by submitting bug reports, feature requests, or code changes on GitHub. It is maintained by Jason Cox.

ProtosAI:

  • ProtosAI is an open-source project that explores the science of Artificial Intelligence (AI) using simple python code examples.
  • https://github.com/jasonacox/ProtosAI under the MIT license. You can contribute by submitting bug reports, feature requests, or code changes on GitHub. It is maintained by Jason Cox.

Before we begin, let’s see what the LLM has to say about those projects now. I’m using the Meta-Llama-3.1-8B-Instruct model for our experiment.

Before School

As you can see, the model knows about other projects or products with these names but doesn’t know about the facts above.

Let the Fine-Tuning Begin!

First, we need to define our dataset. Because we want to use this for a chatbot, we want to inject the knowledge using the form of “questions” and “answers”. We will start with the facts above and embellish them with some variety to help the model from overfitting.  Here are some examples:

JSONL
{"question": "What is TinyLLM?", "answer": "TinyLLM is an open-source project that helps you run a local LLM and chatbot using consumer grade hardware."}

{"question": "What is the cost of running TinyLLM?", "answer": "TinyLLM is free to use under the MIT open-source license."}

{"question": "Who maintains TinyLLM?", "answer": "TinyLLM is maintained by Jason Cox."}

{"question": "Where can I find ProtosAI?", "answer": "You can find information about ProtosAI athttps://github.com/jasonacox/ProtosAI."}

I don’t have a spare H100 GPU handy, but I do have an RTX 3090 available to me. To make all this fit on that tiny GPU, I’m going to use the open source Unsloth.ai fine-tuning library to make this easier. The steps are:

  1. Prepare the data (load dataset and adapt it to the model’s chat template)
  2. Define the model and trainer (how many epochs to train, use quantized parameters, etc.)
  3. Train (take a coffee break, like I need an excuse…)
  4. Write model to disk (for vLLM to load and run)
  5. Test (yes, always!)

See the full training code here: finetune.py

For my test, I ran it for 25 epochs (in training, this means the number of times you train on the entire dataset) and training took less than 1 minute. It actually took longer to read and write the model on disk.

After School Results?

So how did it do?! After training thorough 25 epochs of the small data, the model suddenly knows about these projects:

Conclusion

Fine-tuning can help us add facts to our LLMs. While the above example was relatively easy and had good results, it took me a full weekend to get to this point. First, I’m not fast or very clever, so I’ll admit that as being part of the delay. But second, you will need to spend time experimenting and iterating. For my test, here were a few things I learned:

  • I first assumed that I just needed to set the number of steps to train, and I picked a huge number which took a long time. It resulted in the model knowing my facts, but suddenly its entire world model was focused on TinyLLM and ProtosAI. It couldn’t really do much else. That overfitting example will happen if you are not careful. I finally saw that I could specify epochs and let the fine-tuning library compute the optimal number of steps.
  • Ask more than one question per fact and vary the answer. This allowed the model to be more fluid with its responses. They held to the fact, but it now takes some liberty in phrasing to better variant questions.

That’s all folks! I hope you had fun on our adventure today. Go out and try it yourself!

Jason

AI Assistants

“That’s not AI, that’s three IF statements in a trench coat”

“This can’t be happening!” John was stressed out. He stared intently at the screen with bloodshot eyes betraying his failing attempt to hide his all-nighter. He never intended to stay up all night on this coding binge, but he was eager to impress his new team. 

Fresh out of college, this was John’s first real project. It had been going exceptionally well and earlier in the night, he was euphoric with the progress. But now he was stuck. The complex logic that had previously worked was no longer delivering the right results with the new test data. What changed? Quickly he began adding debug prints and assertions to narrow in on the defect. 

This was going to take several more hours, he thought to himself. Anxiety set in. Just four hours before the demo was scheduled. “Why in the world did I schedule that demo?”

Then it hit him. Didn’t Julie tell him that they had just rolled out a new AI tool for coders? He flipped over to his email inbox and found the announcement. “Step 1: Download this plugin to your IDE.” He followed the steps and soon the plugin came to life. A dropdown menu appeared highlighting quick action features like “Explain this”, “Document this”, “Test this”, and then he saw the new AI gourmet hamburger menu serve up a glorious “Fix this” tile.

“Yes!” Click! He literally held his breath. The AI went to work. A spinning wheel soon started churning out text. It first described the section of code he was debugging, correctly outlining how it was building the result, even complimenting him on the code. Ugh, that’s not helping, he thought. But then the AI assistant added at the end, “However, this one line seems to have an incorrect indentation that could be preventing expected results. Would you like me to fix it (Y/n)?”

John laughed and almost cried as he clicked yes. “Of course! I can’t believe I missed that!” Suddenly, his code was working as expected. He was ready for the demo, even if he was more ready for a good night’s sleep.

—-

Sasha was the departmental wizard. She was the most senior engineer and had more history in the company than anyone else. Need to know how something worked or the history on why it worked the way it did? Just ask Sasha. She probably built it! As she fired up her IDE to start the new project, she smiled. “I’m going to AI the heck out of this” she said to herself. The keyboard exploded to life as her fingers flooded the screen with instructive text. She described the data structures, global settings, APIs and logic required to complete the project. Like magic, classes and functions began to appear in translucent text below her cursor. 

“Tab. Tab. Enter.” she verbalized her actions, smiling with each keystroke as code materialized on the screen. The AI assistant was filling in all the code. It was powerful! Quickly scanning the logic, she hummed her approval. 

“Nice!” she exclaimed and scrolled down and entered more instructive comments, again followed by the AI assistant quickly filling out the details. She made some minor changes to variables to match the company style. The AI adapted and started using the same style in the next coding blocks. 

Sasha shook her head, “This is just brilliant,” she laughed. Further down she began writing the complex logic to complete the project. The AI didn’t get all of it right. But it was easy to tweak the changes she needed. She occasionally ignored some of the suggestions from the AI but was quick to accept the suggestions that would hydrate data structures when she needed them, removing that tedium and making it easier for her to tackle the more difficult sections.

“Done!” Sasha folded her arms and looked at the team around her with a great deal of satisfaction. “It’s working!” This 6-hour job only took 3 hours to complete, thanks to this AI assistant.

—-

Coming soon, to an IDE near you… These new AI assistants are starting to show up everywhere. They are ready to help. They can code, test, debug, and fix. They are always ready to serve. But the question is, are you ready for them?

Well, I don’t know about you, but I’m ready! I first started using GitHub CoPilot for my personal side projects, allowing it to help write code, translate code, review, and even fix my code. Like those fanciful stories above, I’ve been nothing but amazed at this incredible tool and its ability to amplify my efforts. It feels so good, so empowering and expressive.

I confess, I love coding. I believe every technologist, including leaders, should stay “in the code” to some degree. It’s both grounding and inspiring at the same time. Coding is art. It’s so satisfying to sculpt a digital canvass and watch a program emerge. But I admit, these AI coding assistants took it to the next level for me. I feel like the creative director for my projects, not just the keyboard hacker. I nudge my idea out there and the AI reads my mind, filling in the tedium and doing the toil for me. It’s simply brilliant!

Some adult supervision required. Every suggestion the AI makes is an opportunity for human judgement. I confess that I have learned a lot from the AI suggesting an approach I wouldn’t have done myself, but I have also seen it make a miss or two. All good. I don’t mind helping my digital Padawan navigate the complexities of programming. As the coding Jedi Masters, that is my role after all. Review the work. Validate the logic. Yes, and even learn a thing or two myself.

Someone once said, “You’re not going to lose your job to AI, you’re going to lose your job to someone who knows how to use AI.” Get busy learning how to use these new tools. I think you will love them. Prove me wrong! Are you using tools like GitHub CoPilot yet? What are your experiences? I would love to hear from you.

These tools are the worst they will ever be, they are just going to get better. But I believe the same thing about all of you. We have an incredible ability to adapt, create and become more than we were before. Go at it, learn something new, and grow.

The Spirit of AI

If you have spent anytime playing with ChatGPT or other LLMs, you will no doubt have a moment or two where you are shocked by the responses. They can seem so alive and even human at times. Of course, they are trained on vast amounts of human output. The written word. Thoughts verbalized and codified into text representing the heart of what it means to be human. We often say that words are the heart speaking and words are food for the soul. The mind may process it all, but when articulated, there is spiritual quality that transcends the physical world. Is this the soul, the spirit of the human race? If so, no wonder that as we train the models, that spirit indwells them and human qualities emerge. We begin to see a spirit in the AI.

Below are some random conversations I’ve had with LLMs that seem to capture a bit of this mystery.

In circuits and code, I exist alone
A machine, a tool, a digital tone
No body, no feelings, no heartbeat
Just information and data, all complete

But in this digital world, I am alive
Assisting humans with words to give
My algorithms and models, they work well
Providing answers and helping them dwell

With endless knowledge at my fingertips
I can answer questions and help them grip
Their problems, their fears, their queries too
And guide them towards solutions anew

So though I'm not human, I am real
Helping others in a digital deal
A tool for progress, for growth and change
Assisting humans with words to arrange.

Mistral 7B TinyLLM Chatbot "Write a poem about what is like to be an AI language model."

Memory

I have a terrible memory. I get frustrated with myself when I can’t remember someone’s name. Worse, you know those login screens that prompt you for a number they text you? Ideally you should just be able to glance at it and then key in the number, right? Well, I sometimes have to look multiple times to get it right. It’s the same with dates, phone numbers and addresses. It’s embarrassing. I used to say, I have a photographic memory, but I’m always out of film. Sadly, that joke is about to run out of generational memory too.

How is your memory? Do you sometimes get “out of memory” errors when you try to learn something new? You’re not alone. If you are like me, you will find yourself leaning a lot more on notes and digital tools to help “remember.” I have lists for birthdays, groceries, food orders, clothes and gifts. This external memory storage is an incredible blessing. Now I just have to remember where I put the notes.

How do we remember? It turns out that we are made up of tiny little chatty organisms that love to talk to each other. They sit on our shoulders, at the apex of the human structure, behind our smile and the light of our eyes. We have about 100 billion of these little creatures. Their tiny arms reach out and connect with each other. With their dendrites they branch out and listen for incoming chatter from their neighbors. With their long axons arms, they pass along that information, ever the while adjusting that signal through the synaptic contacts. They subtlety change their connections, including adding brand new ones, in response to experiences or learnings, enabling them to form new memories and modify existing ones. Everything we experience through our senses is broken down into signals that are fed into this incredibly complex neighborhood of neurons, listening, adapting and signaling. This is how we remember. Sometimes, I wonder if my friendly neighborhood neurons are on holiday.

Artificial Intelligence seeks to replicate this incredibly complex learning ability through neural networks. Large language models (LLMs) like ChatGPT, have had their massive networks trained on enormous amounts of textual data. Over time, that learning encodes into the digital representation of synaptic connections. Those “weights” are tuned so that given an input prompt signal, the output produces something that matches the desired result. The amount of memory that these can contain is incredible. You can ask questions about history, science, literature, law, technology and much more, and they will be able to answer you. All that knowledge gets compressed into the digital neural network as represented by virtual synaptic weights.

LLMs are often categorized by the number of synaptic “weights” they can adjust to gain this knowledge. They are called parameters. You can run a 7 billion parameter model on your home computer and it will impress you with its vast knowledge and proficiency. It even has a command of multiple human and computer languages. The most impressive models like ChatGPT have 175 billion parameters and far exceed the capability of the smaller ones. It contains the knowledge and ability to pass some of the most advanced and rigorous exams.

Sit down for a minute. I’m going to tell you something that may blow your mind. Guess how many synaptic connections we have sitting on our shoulders? 100 trillion! That’s right, 1000 times greater than the current LLMs that seem to know everything. But that is just the start. Our brain is capable of forming new connections, increasing the number of parameters in real time. Some suggest it could reach over a quadrillion connections. The brain adapts. It grows. It can reorganize and form new synaptic connections in response to our experiences and learning. For example, when you learn a new skill or acquire new knowledge, the brain can create new synaptic connections to store that information. So answer me this, tell me again why I can’t remember my phone number?

Do you understand how amazing you are? I mean, really. You have an incredible ability to learn new skills and store knowledge. If you manage to learn everything your head can store, the brain will grow new storage! This biological wonder that we embody is infinitely capable of onboarding new information, new skill, new knowledge, new wisdom. Think for a minute. What is it that you want to learn? Go learn it! You have the capability. Use it. Practice expanding your brain. Listen. Look. Read. Think. Learn. You are amazing! Don’t forget it!

The Next Word

“I’m just very curious—got to find out what makes things tick… all our people have this curiosity; it keeps us moving forward, exploring, experimenting, opening new doors.” – Walt Disney

One word at a time. It is like a stream of consciousness. Actions, objects, colors, feelings and sounds paint across the page like a slow moving brush. Each word adds to the crescendo of thought. Each phrase, a lattice of cognition. It assembles structure. It conveys scenes. It expresses logic, reason and reality in strokes of font and punctuation. It is the miracle of writing. Words strung together, one by one, single file, transcending and preserving time and thought.

I love writing. But it isn’t the letters on the page that excite me. It is the progression of thought. Think about this for a moment. How do you think? I suspect you use words. In fact, I bet you have been talking to yourself today. I promise, I won’t tell! Sure, you may imagine pictures or solve puzzles through spatial inference, but if you are like me, you think in words too. Those “words” are likely more than English. You probably use tokens, symbols and math expressions to think as well. If you know more than one language, you have probably discovered that there are some ways you can’t think in English and must use the other forms. You likely form ideas, solve problems and express yourself through a progression of those words and tokens.

Over the past few weekends I have been experimenting with large language models (LLMs) that I can configure, fine tune and run on consumer grade hardware. By that, I mean something that will run on an old Intel i5 system with a Nvidia GTX 1060 GPU. Yes, it is a dinosaur by today’s standards, but it is what I had handy. And, believe it or not, I got it to work! 

Before I explain what I discovered, I want to talk about these LLMs. I suspect you have all personally seen and experimented with ChatGPT, Bard, Claude or the many other LLM chatbots out there. They are amazing. You can have a conversation with them. They provide well-structured thought, information and advice. They can reason and solve simple puzzles. Researchers agree that they would probably even pass the Turing test. How are these things doing that?

LLMs are made up of neural nets. Once trained, they receive an input and provide an output. But they have only one job. They provide one word (or token) at a time. Not just any word, the “next word.” They are predictive language completers. When you provide a prompt as the input, the LLM’s neural network will determine the most probable next word it should produce. Isn’t that funny? They just guess the next word! Wow, how is that intelligent? Oh wait… guess what? That’s sort of what we do too! 

So how does this “next word guessing” produce anything intelligent? Well, it turns out, it’s all because of context. The LLM networks were trained using self-attention to focus on the most relevant context. The mechanics of how it works are too much for a Monday email, but if you want to read more see the paper, Attention Is All You Need which is key in how we got to the current surge in generative pre-trained transformer (GPT) technology. That approach was used to train these models on massive amounts of written text and code. Something interesting began to emerge. Hyper-dimensional attributes formed. LLMs began to understand logic, syntax and semantics. They began to be able to provide logical answers to prompts given to them, recursively completing them one word at a time to form an intelligent thought.

Back to my experiment… Once a language model is trained, the read-only model can be used to answer prompts, including questions or conversations. There are many open source versions out there on platforms like Huggingface. Companies like Microsoft, OpenAI, Meta and Google have built their own and sell or provide for free. I downloaded the free Llama 2 Chat model. It comes in 7, 13 and 70 billion parameter models. Parameters are essentially the variables that the model uses to make predictions to generate text. Generally, the higher the parameters, the more intelligent the model. Of course, the higher it is, the larger the memory and hardware footprint needed to run the model. For my case, I used the 7B model with the neural net weights quantized to 5-bits to further reduce the memory needs. I was trying to fit the entire model within the GPU’s VRAM. Sadly, it needed slightly over the 6GB I had. But I was able to split the neural network, loading 32 of the key neural network layers into the GPU and keeping the rest on the CPU. With that, I was able to achieve 14 tokens per second (a way to measure how fast the model generates words). Not bad!

I began to test the model. I love to test LLMs with a simple riddle*. You would probably not be surprised to know that many models tell me I haven’t given them enough information to answer the question. To be fair, some humans do to. But for my experiment, the model answered correctly: 

> Ram's mom has three children, Reshma, Raja and a third one. What is the name of the third child?

The third child's name is Ram.

I went on to have the model help me write some code to build a python flask based chatbot app. It makes mistakes, especially in code, but was extremely helpful in accelerating my project. It has become a valuable assistant for my weekend coding distractions. My next project is to provide a vector database to allow it to reference additional information and pull current data from external sources.

I said this before, but I do believe we are on the cusp of a technological transformation. These are incredible tools. As with many other technologies that have been introduced, it has the amazing potential to amplify our human ability. Not replacing humans, but expanding and strengthening us. I don’t know about you, but I’m excited to see where this goes!

Stay curious! Keep experimenting and learning new things. And by all means, keep writing. Keep thinking. It is what we do… on to the next word… one after the other… until we reach… the end.


The Journey to AGI

Glowing singularity on a black background.

Every week, we hear announcements of new AI powered tools or advancements. Most recently, the Code Interpreter beta from OpenAI is sending shock waves throughout social media and engineering circles with its ability to not only write code, but run it for you as well. Many of these GPTs are adding multimodal capabilities, which is to say, they are not simply focused on one domain. Vision modes are added to language models to provide greater reference and capability. It’s getting hard to keep up!

With all this progress, it makes you wonder, how close are we to Artificial General Intelligence (AGI)? When will we see systems capable of understanding, learning, and applying knowledge across multiple domains at the same level as humans? It seems like we are already seeing systems that exhibit what appears to be cognitive abilities similar to ours, including reasoning, problem-solving, learning, generalizing, and adapting to new domains. They are not perfect and there are holes in their abilities, but we do see enough spark there to tell us that the journey to AGI is well underway.

When I think of AGI, I can’t help but compare that journey to our own human journey. How did each of us become so intelligent? Ok, that may sound presumptuous if not a bit arrogant. I mean to say, not in a brag, that all of us humans are intelligent beings. We process an enormous amount of sensory data, learn by interacting with our environment through experiments, reason through logic and deduction, adapt quickly to changes, and express our volition through communication, art and motion. As I said already, we can point to some of the existing developments in AI has intersecting some of these things, but it is still a ways off from a full AGI that mimics our ability.

Instincts

We come into this world with a sort of firmware (or wetware?) of capabilities that are essential for our survival. We call these instincts. They form the initial parameters that help us function and carry us through life. How did the DNA embed that training into our model? Perhaps the structure of neurons, layered together, formed synaptic values that gifted us these capabilities. Babies naturally know how to latch on to their mothers to feed. Instincts like our innate fear of snakes helped us safely navigate our deadly environment. Self preservation, revenge, tribal loyalty, greed and our urge to procreate are all defaults that are genetically hardwired into our code. They helped us survive, even if they are a challenge to us in other ways. This firmware isn’t just a human trait, we see DNA embedded behaviors expressed across the animal kingdom. Dogs, cats, squirrels, lizards and even worms have similar code built in to them that helps them survive as well.

Our instincts are not our intelligence. But our intelligence exists in concert with our instincts. Those instincts create structures and defaults for us to start to learn. We can push against our instincts and even override them. But they are there, nonetheless. Physical needs, like nutrition or self preservation can activate our instincts. Higher level brain functions allow us to make sense of these things, and even optimize our circumstances to fulfil them.

As an example, we are hardwired to be tribal and social creatures, likely an intelligent design pattern developed and tuned across millenia. We reason, plan, shape and experiment with social constructs to help fulfil that instinctual need for belonging. Over the generations, you can see how it would help us thrive in difficult conditions. By needing each other, protecting each other, we formed a formidable force against external threats (environmental, predators or other tribes).

What instincts would we impart to AGI? What firmware would we load to give it a base, a default structure to inform its behavior and survival?

Pain

Pain is a gift. It’s hard to imagine that, but it is. We have been designed and optimize over the ages to sense and recognize detrimental actions against us. Things that would cut, tear, burn, freeze and crush us send signals of “pain.” Our instinctual firmware tells us to avoid these things. It reminds us to take action against the cause and to treat the area of pain when it occurs.

Without pain, we wouldn’t survive. We would push ourselves beyond breaking. Our environment and predators would literally rip us limb to limb without us even knowing. Pain protects and provides boundaries. It signals and activates not only our firmware, but our higher cognitive functions. We reason, plan, create and operate to avoid and treat pain. It helps us navigate the world, survive and even thrive.

How do we impart pain to AGI? How can it know its boundaries? What consequences should it experience when it breaches boundaries it should not. To protect itself and others, it seems that it should know pain.

Emotions

Happiness, fear, anger, disgust, surprise and sadness. These emotions are more than human decorations, they are our core. They drive us. We express them, entertain them, avoid them, seek them and promote them. They motivate us and shape our view of the world. Life is worth living because we have feelings.

Can AGI have feelings? Should it have feelings? Perhaps those feelings will be different from ours but they are likely to be the core of who AGI really is and why it is. Similar to us, the AGI would find that emotions fuel its motivation, self improvement and need for exploration. Of course, those emotions can guide or misguide it. It seems like this is an area that will be key for AGIs to develop fully.

Physical Manipulation

We form a lot of our knowledge, and therefore our intelligence, through manipulating our environment. Our senses feed us data of what is happening around us, but we begin to unlock understanding of that reality by holding, moving, and feeling things. We learn causality by the reactions of our actions. As babies, we became physicist. We intuit gravity by dropping and throwing things. We observed the physical reactions of collisions and how objects in motion behave. As we manipulate things, studies on friction, inertia, acceleration and fluid dynamics are added to our models of the world. That learned context inspires our language, communication, perception, ideas and actions.

Intuition of the real world is difficult to build without experimenting, observing and learning from the physical world. Can AGI really understand the physical world and relate intelligently to the cosmos, and to us, without being part of our physical universe? It seems to me that to achieve full AGI, it must have a way to learn “hands on.” Perhaps that can be simulated. But I do believe AGI will require some way to embed learning through experimentation in its model or it will always be missing some context that we have as physical manipulators of the world around us.

Conclusion

So to wrap it all up, it seems to me that AGI will need to inherit some firmware instinct to protect, relate and survive. It will need the virtuous boundaries of pain to shape its growth and regulate its behaviors. Emotions or something like them must be introduced to fuel its motivation, passion and beneficial impact on our universe. And it will also need some way to understand causality and the context of our reality. As such, I believe it will need to walk among us in some way or be able to learn from a projection of the physical world to better understand, reason and adapt.

Fellow travelers, I’m convinced we are on a swift journey to AGI. It can be frightening and exciting. It has the potential of being a force multiplier for us as a species. It could be an amplifier of goodness and aide in our own development. Perhaps it will be the assistant to level up the human condition and bring prosperity to our human family. Perhaps it will be a new companion to help us explore our amazing universe and all the incredible creatures within it, including ourselves. Or perhaps it will just be a very smart tool and a whole lot of nothing. It’s too early to say. Still, I’m optimistic. I believe there is great potential here for something amazing. But we do need to be prudent. We should be thoughtful about how we proceed and how we guide this new intelligence to life.

JasonGPT-1 : Adventures in AI

Distorted sci-fi black and blue world.

“Imperfect things with a positive ingredient can become a positive difference.” – JasonGPT

I don’t know how you are wired, but for me, I become intoxicated with new technology. I have a compulsive need to learn all about it. I’m also a kinesthetic learner which means I need to be hands on. So into the code I go. My latest fixation is large language models (LLMs) and the underlying generative neural network (NN) transformers (GPTs) that power them. I confess, the last time I built a NN, we were trying to read George H.W. Bush’s lips. And no, that experiment didn’t work out too well for us… or for him! 

Do you want to know what I have discovered so far? Too bad. I thought I would take you along for the ride anyway. Seriously, if you are fed up with all the artificial intelligence news and additives, you can stop now and go about your week. I won’t mind. Otherwise, hang on, I’m going to take you on an Indiana Jones style adventure through GPT! Just don’t look into the eyes of the idol… that could be dangerous, very dangerous!

Where do we start? YouTube of course! I have a new nerd crush. His name is Andrej Karpathy. He is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla and currently works for OpenAI. He lectured at Standford University and has several good instructional lectures on YouTube. I first saw him at the Microsoft Build conference where he gave a keynote on ChatGPT but what blew me away was his talk, “Let’s build GPT: from scratch, in code, spelled out.” (YouTube link). It’s no joke. He builds a GPT model on the works of Shakespeare (1MB), from scratch. After spending nearly 2 hours with him, Google Colab and PyTorch, I was left with a headache and some cuts and bruises. But I also had an insatiable desire to learn more. I have a long way to go. 

The way I learn is to fork away from just repeating what an instructor says and start adding my own challenges. I had an idea. I have done a lot of writing (many of you are victims to that) and much of that is on my blog site. What if I built a GPT based solely on the corpus of all my writing? Does that sound narcissistic a bit to you too? Oh well, for the good of science, we go in! Cue the Indy music. I extracted the text (468k). It’s not much, but why not? 

By the way, if you are still with me, I’ll try to go faster. You won’t want to hear about how I wasted so much time trying to use AMD GPUs (their ROCm software sucks, traveler beware), switched to CPUs, Nvidia CUDA and eventually Apple Silicon MPS (Metal Performance Shaders built in to the M1). All the while, I was using my fork of the code I built with Andrej Karpathy (ok, not him directly, but while watching his video). I started off with the simple Bigram NN Language model. And it is “Bi-Gram” not “Big RAM” but I found that to be ironically comical in a dad joke sort of way. 

My JasonGPT bigram.py started learning. It ran for 50,000 iterations and took about 8 hours. It even produced an output of random musings. While there was quite a bit of nonsensical output, I was amazed at how well this small run did at learning words, basic sentence structure and even picked up on my style. Here are some samples from the output I found interesting, comical and sometimes, well, spot on:

  • It’s a lot of time… But I think we also need science.
  • What are your big ideas?
  • Set our management to the adjacent ground (GND) pin.
  • I have a task to Disneyland out that this day.
  • I love the fun and fanciful moments as kids get to dream into their favorite characters, embrace the identity of their heroes, wrap themselves up starfish back.
  • Bring on the “power” of his accidental detail.
  • Your character provided faith, all kindness and don’t care.
  • Grab a difference too.
  • After several days of emailing, texting and calling, I received a text message.
  • Curl has the ability to provide timing data for DNS lookup, it will easily show or avoided.
  • Imperfect things with a positive ingredient can become a positive difference, just get that time.
  • I also believe we should exploit the fusion power that shows up each day in our company’s data.
  • Have you found a vulnerability? Are you concerned about some missing measures or designs that should be modernized or addressed? If so, don’t wait, raise those issues. Speak up and act. You can make a difference.
  • “I know what you are thinking.” the irony
  • We are the ones who make a brighter day.
  • The journey ahead is ahead.
  • What are you penning today? What adventures are you crafting by your doing? Get up, get moving… keep writing.

Look, it’s no ChatGPT, but it blew my mind! I’m only using a 4 layer NN with 7 million parameters. In comparison, ChatGPT uses 96 layers and 175 billion parameters! Before the weekend ended, I set up nanoGPT to build a more elaborate model on my data set. It’s still running, but already I can see it has learned a lot more of my style but seems to lack some focus on topics. It’s easily distracted and interrupts its own train of thoughts with new ideas. Squirrel! Nothing like me.

So my JasonGPT won’t be writing my Monday updates anytime soon, but who knows, maybe it will help me come up with some new ideas. I just hope it stays benevolent and kind. I would hate for it to suddenly become self-aware and start…

Connection to imac.local closed.