For many people watching closely where technology is headed, it’s a confusing landscape: there’s a lot of complexity and a lot of uncertainty as we move forward. One thing many of us can agree on, however, is that academia should have a seat at the table. In other words, we need to consider what is happening in classrooms just as we consider what is happening in the stock market or in corporate boardrooms.
So, in part because I recently focused on Jensen Huang’s keynote at CES (and other industry stuff), I’d like to jump into some of our explorations at MIT, in the classroom, where young people begin to learn about AI, sometimes for the first time.
In an MIT Deep Learning classroom, Ava and Alexander Amini run a curriculum for the students who will become the next generation of applied AI experts. In a recent session, we had input from two experienced researchers.
Peter Grabowski leads a Gemini Applied Research group at Google, and that’s no small feat, as the company prepares to offer a version of Gemini on consumer smartphones. Grabowski spoke to the class about different elements of LLM research and why they matter. Then Maxime Labonne, a machine learning scientist, author, blogger, and developer, added some insight into what’s happening with these systems right now.
I will review some of the duo’s remarks and what we have covered in this class.
Forecasting Powerful LLM Systems
Peter Grabowski has a lot of experience with LLMs. He talked to the class about how these systems are evolving: for example, he explained what parameters are, why it is useful to have systems with billions of them, and how that scale advances the science behind applied AI.
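To make that scale concrete, here’s a back-of-the-envelope sketch (my own illustration, not from the lecture) of how parameter counts explode as models get wider. The widths are illustrative, not any specific model’s:

```python
# Parameter count for one transformer-style feed-forward block:
# two weight matrices of shape (d_model, 4*d_model) and
# (4*d_model, d_model), plus their bias vectors.
def ffn_params(d_model: int, expansion: int = 4) -> int:
    hidden = expansion * d_model
    return (d_model * hidden + hidden) + (hidden * d_model + d_model)

for width in (512, 4096, 12288):  # toy, 7B-class, and GPT-3-class widths
    print(f"width {width:>6}: {ffn_params(width):>15,} params per block")
```

Multiply by dozens of layers, add the attention weights, and billions of parameters arrive very quickly.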
Using an example from Dickens (“It was the best of times, it was the worst of times”), he talked about how the context window needs to be large enough for an LLM to work properly. I thought this was a pretty good example: when the model sees only a very limited window of data with a repeated word, it can get stuck in a loop. That alone shows the need for variety in the data.
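To see why a too-short context gets stuck, here’s a toy sketch of my own (not Grabowski’s code, and not an LLM, just an n-gram model) trained on the Dickens line. With a one-word context, the repeated words make generation loop indefinitely; with a four-word window, the model can reproduce the whole sentence and stop:

```python
import random

# With a one-word context the model loops through the repeated phrase
# forever and never knows whether "best" or "worst" comes next; with a
# four-word context it can recover the full sentence.
TEXT = "it was the best of times it was the worst of times".split()

def build_model(context_len):
    """Map each context window of `context_len` words to possible next words."""
    model = {}
    for i in range(len(TEXT) - context_len):
        ctx = tuple(TEXT[i:i + context_len])
        model.setdefault(ctx, []).append(TEXT[i + context_len])
    return model

def generate(context_len, max_words=20):
    model = build_model(context_len)
    out = list(TEXT[:context_len])
    while len(out) < max_words:
        ctx = tuple(out[-context_len:])
        if ctx not in model:  # only happens with the longer context
            break
        out.append(random.choice(model[ctx]))
    return " ".join(out)

print(generate(context_len=1))  # loops: it was the ... of times it was the ...
print(generate(context_len=4))  # recovers the sentence, then stops
```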
“If you’re thinking about the number of parameters as a mechanism for understanding and representing information about the world, the more parameters, the more you’re able to do that,” he said. “The other thing that has changed is that the context window … the context length has changed.”
Grabowski also spoke about how prompting is changing and diversifying. For example, he described the zero-shot prompt, where you don’t give the model any context for a question, and suggested that by varying these prompts, you can get LLMs to work better in many ways.
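For readers new to the terminology, a zero-shot prompt asks the question cold, while a few-shot prompt packs worked examples into the context. A minimal sketch (the prompt wording is my own illustration):

```python
# Zero-shot: the model gets the task description and nothing else.
zero_shot = (
    "Classify the sentiment of this review as positive or negative.\n"
    "Review: 'The battery died after two hours.'\n"
    "Sentiment:"
)

# Few-shot: the same task, with in-context examples showing the format.
# Either string would be sent to the model as-is.
few_shot = (
    "Review: 'Absolutely loved it.'\nSentiment: positive\n\n"
    "Review: 'Waste of money.'\nSentiment: negative\n\n"
    "Review: 'The battery died after two hours.'\nSentiment:"
)

print(zero_shot)
print("---")
print(few_shot)
```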
“If we can save researchers the time to digest 1,000 papers … by providing meaningful summaries of those papers, does that speed up basic research?” Grabowski asked. “I think the answer to that is also yes. I saw a really interesting foundation model case where people took a language model framework and applied it to atomic motions, like very low-level chemical interactions the model had never seen before. When it was given sodium and chlorine atoms, the model correctly predicted the structure of the salt crystal.”
Grabowski also talked about the dangers of jailbreaking, where hackers and bad actors may be able to take consumer products and, by bypassing the core safety programming, make them do things their creators never intended.
He also talked a bit about agents in AI, and what these agents will look like, what they will do, and how they will work together to solve problems. I’ve covered contributions from various experts on the state of “agentive” AI and what this means for the advancements that will occur in 2025 and beyond. It’s big news.
Liquid network research
When Maxime Labonne took the stage, he talked about the different stages of LLM development using new liquid networks that can do more with fewer parameters. This is changing the math in the markets and enabling companies to do much more in enterprise IT. Disclaimer: I have been consulting with the team at the MIT CSAIL lab working on liquid networks, and with Liquid AI, the company where Labonne works.
Moving on to the applied science, Labonne talked about the post-training phase, in which engineering teams apply supervised fine-tuning and preference alignment and finalize the model’s capabilities. By adding knowledge at this stage, he suggested, you can tune these models for better accuracy across a variety of samples.
“During supervised fine-tuning,” he explained, “what we do is … we give the model instructions and answers as input, we ask the question, and we teach the model to answer the question, and we do that a lot. We teach the model a structure, a conversation template that we’ll talk about later. … Then it’s a model that’s able to follow instructions, a model that’s able to answer questions.”
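To make the “conversation template” idea concrete, here’s a small sketch assuming the Hugging Face transformers library; the model name is just an example, and any chat model with a built-in template would work. The template turns an instruction-answer pair into the exact string the model trains on:

```python
from transformers import AutoTokenizer

# Model name is illustrative; any chat model with a template works.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# One supervised fine-tuning example: an instruction and its answer.
example = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

# The chat template wraps the pair in the special tokens the model
# expects, producing the exact training string.
print(tokenizer.apply_chat_template(example, tokenize=False))
```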
Regarding preference alignment:
“During preference alignment, we give preferences to the model,” Labonne explained. “We give not only one question and one answer, we give two answers. One is the chosen answer. This is how we want the model to behave, and the other is the rejected response.”
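In data terms, a preference-alignment example is just a prompt with two candidate answers. Here’s a sketch of the record format; the field names follow the convention used by libraries like TRL for methods such as DPO, and the content is invented for illustration:

```python
# One preference pair, as Labonne describes: a prompt, a chosen answer
# (how we want the model to behave), and a rejected answer.
preference_example = {
    "prompt": "Explain photosynthesis to a ten-year-old.",
    "chosen": (
        "Plants are like tiny solar-powered kitchens: they use "
        "sunlight, water, and air to make their own food."
    ),
    "rejected": (
        "Photosynthesis is the process by which autotrophic organisms "
        "convert photons into chemical energy via chlorophyll-mediated "
        "electron transport."
    ),
}
# Training nudges the model toward the chosen style and away from the
# rejected one, without needing an absolute score for either answer.
```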
He also talked about data formats, such as using real-life conversations as a dataset, and why this is so valuable and important. Speaking of post-training tools and automated benchmarks, he also had a few things to say about scoring and the role of human bias.
“People are also incredibly biased,” he said. “We like to think we’re kind of the definitive (source) for evaluation, but we’re really not.”
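Automated benchmarks are one response to that bias: at their simplest, they reduce evaluation to a scored loop. A bare-bones sketch (the eval set and the stand-in model are invented for illustration; real harnesses are far richer):

```python
# A minimal automated benchmark: exact-match scoring over a tiny eval set.
eval_set = [
    {"question": "2 + 2 = ?", "reference": "4"},
    {"question": "Capital of Japan?", "reference": "Tokyo"},
]

def model_answer(question: str) -> str:
    """Stand-in for an actual model call."""
    return {"2 + 2 = ?": "4", "Capital of Japan?": "Kyoto"}[question]

# Core loop: question in, answer out, score against a reference.
correct = sum(
    model_answer(e["question"]).strip().lower() == e["reference"].lower()
    for e in eval_set
)
print(f"accuracy: {correct}/{len(eval_set)}")
```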
I appreciated both of these professionals visiting the classroom: it all illuminates what researchers are doing to make their work on LLM systems more meaningful, and to add value to the process of bringing products and services to market.
So if you’re in the enterprise, trying to understand the context of AI right now, maybe you can benefit from these ideas coming out of the classroom. I will bring you more as we go through the semester.