Building in AI

Tactical, practitioner-focused content for people building AI products — model selection, infrastructure, prompt engineering, and what actually works in production.

100 clips · 48 podcasts
Listen to the Playlist

Curated by Ridealong from the best podcast clips.

Top Podcast Clips

Super Data Science: ML & AI Podcast with Jon Krohn
“does model selection really matter? I mean, are your clients like picking, oh, okay, I'm going to use this Llama model of this size or Qwen, or do they make those decisions? Or is this something that's kind of like handled automatically by the autonomous intelligence system? So model selection matters, but it's also exhausting. So funny thing. There are two interesting phenomena that are so unique to AI. One is the model depreciation is very fast. As you can observe, every couple of weeks, there's …”
Ridealong summary
Model selection in AI is not just important; it's exhausting due to rapid changes in technology. With new models and hardware launching almost weekly, developers struggle to choose the best fit for their evolving use cases. This dual challenge of model and hardware depreciation creates a significant headache for application developers trying to keep up.
Super Data Science: ML & AI Podcast with Jon Krohn · 971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It · Mar 03, 2026
I've Had It
“Jen, you just came back from Mexico. You were talking about your dogs. There's a lot of new paperwork you have to have. And so there's this story going viral of this woman that went to an airport with her dog. And then I'll let the video play. It'll explain it. We're back with a dog left alone at the airport in Las Vegas. Video shows the woman leaving the Golden Doodle Mini Poodle at a JetBlue ticket counter earlier this month. She didn't have the right documents to travel with it as a service …”
Ridealong summary
A woman recently abandoned her Goldendoodle–miniature poodle at a JetBlue ticket counter in Las Vegas because she lacked the documents required to travel with it. The dog was rescued and adopted by the officer who found him, but the incident raises questions about irresponsible pet ownership and societal trends, such as performative hydration with trendy cups. The story underscores the emotional toll of such neglect and the importance of responsible pet care.
I've Had It · Are We Great Yet? · Mar 05, 2026
Machine Learning Street Talk (MLST)
“And one of the extremely controversial things we did was we felt that we should focus on fine-tuning existing models because we thought fine-tuning was important. Some other folks were doing work contemporaneously with that. So Jason Yosinski did some really great research, I think it was during his PhD, on how to fine-tune models and how good they can be, and some other folks in the computer vision world. We were, you know, amongst the first. There was a bunch of us kind of really investing in …”
Ridealong summary
Fine-tuning existing models can drastically improve performance, especially when using discriminative learning rates. Researchers discovered that training only the last few layers significantly speeds up the process, challenging previous assumptions about learning rates. This approach, combined with the necessity of fine-tuning batch normalization layers, has transformed how we think about transfer learning.
Machine Learning Street Talk (MLST) · "Vibe Coding is a Slot Machine" - Jeremy Howard · Mar 03, 2026
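The discriminative-learning-rate idea from this clip can be sketched in a few lines: the task head trains at the full learning rate, while earlier, more general layers move geometrically more slowly. The layer names, base rate, and the 2.6 divisor (a value popularized by ULMFiT-style fine-tuning) are illustrative assumptions, not details from the episode.

```python
# Sketch of discriminative learning rates for fine-tuning: earlier layers
# get geometrically smaller learning rates than later ones, so pretrained
# low-level features move slowly while the task head adapts quickly.
def discriminative_lrs(layer_names, base_lr=1e-3, factor=2.6):
    """Map each layer (ordered input -> output) to its learning rate.

    The last layer trains at base_lr; each earlier layer's rate is
    divided by `factor` once per step of depth.
    """
    n = len(layer_names)
    return {
        name: base_lr / factor ** (n - 1 - i)
        for i, name in enumerate(layer_names)
    }

lrs = discriminative_lrs(["embed", "block1", "block2", "head"])
# "head" gets the full base rate; "embed" gets base_lr / 2.6**3.
```

In a real framework these per-layer rates would feed into per-parameter-group optimizer settings rather than a plain dict.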
Super Data Science: ML & AI Podcast with Jon Krohn
“holding multiple possibilities in parallel, backtracking when needed, and converging on solutions that satisfy all rules simultaneously. These are precisely the skills needed for countless real challenges in medicine, law, operations, planning, and many other domains where you're balancing competing constraints under uncertainty. A system that can reason through these spaces natively, rather than forcing everything into a text-based chain of thought, could eventually do more than summarize …”
Ridealong summary
An architecture called BDH achieves a striking 97.4% success rate on Sudoku puzzles that traditional transformer models solve essentially 0% of the time. The result highlights BDH's potential to push AI beyond summarizing information toward native reasoning over competing constraints in complex fields like medicine and law, and suggests the transformer's dominance may not be permanent.
Super Data Science: ML & AI Podcast with Jon Krohn · A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%) · Mar 27, 2026
The a16z Show
“The biggest moat is going to be which companies understand something that's super hard for other people to understand. And if your answer to that is, I don't know, then you maybe could get vibe coded away. Block was one of the first to make a pretty drastic decision in cutting 40% of the workforce. What led up to that decision? There's been this correlation between the number of folks at a company and the output from the company for decades and decades. I think that basically broke. And what …”
Ridealong summary
Block dramatically cut 40% of its workforce to test a radical new approach: restructuring around small teams and AI agents. This shift has shown that a few engineers can now achieve productivity levels previously thought impossible, reshaping the company's future. With tools like BuilderBot, Block is redefining what it means to build software efficiently.
The a16z Show · What Happens When a Public Company Goes All In on AI · Apr 01, 2026
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
“But you've mentioned a few times the communication. I know you're a big voice user, so I'd love to hear what your voice setup looks like, or if there's anything special that you've learned that you think people should follow your example on. And then you mentioned Slack too. And I'm curious about even just such practical details. Is it like one-to-one chats with five open calls? Or is it one channel where you like tag which one you want to assign things to, but they can all see what's going on. …”
Ridealong summary
Using AI agents effectively requires a deliberate communication setup. The speaker runs agents through Slack, with a channel structure that keeps team coordination and agent interaction visible in one place. The approach simplifies communication and keeps tasks managed through context-aware prompts.
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis · Try this at Home: Jesse Genet on OpenClaw Agents for Homeschool & How to Live Your Best AI Life · Mar 08, 2026
Super Data Science: ML & AI Podcast with Jon Krohn
“Let's shift gears now a little bit, Chris, to talking about inference specifically. You know, we talked earlier in this episode about how inference is most of what's happening at scale with AI models. Training is an important part, but inference is most of what we're doing with GPUs these days. And so you talk in the book about multi-node inference and disaggregated architectures. What the heck are those? Yeah. Welcome to another chapter that was supposed to be a single chapter or a topic …”
Ridealong summary
AI inference is more complex than it seems, with intricate machinery behind the quick responses of models like ChatGPT. While a simple forward pass sounds straightforward, serving at scale requires sophisticated caching strategies to avoid recomputing the same work for every token. This is why understanding multi-node inference and the KV cache is crucial for improving AI performance.
Super Data Science: ML & AI Podcast with Jon Krohn · 973: AI Systems Performance Engineering, with Chris Fregly · Mar 10, 2026
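The caching strategy this summary alludes to can be illustrated with a toy KV cache: during autoregressive decoding, keys and values for past tokens are stored and reused, so each step projects only the newest token instead of reprocessing the whole prefix. The `key`/`value` functions here are stand-in placeholders, not a real attention implementation.

```python
# Toy sketch of a KV cache. Each decoding step computes K/V only for the
# newest token; the prefix's keys and values are reused from the cache.
def key(tok):    # placeholder "key projection"
    return tok * 2

def value(tok):  # placeholder "value projection"
    return tok + 1

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # how many K/V computations we performed

    def step(self, token):
        # Only the newest token is projected; everything else is cached.
        self.keys.append(key(token))
        self.values.append(value(token))
        self.projections += 1
        # Attention would consume the full (key, value) context here.
        return list(zip(self.keys, self.values))

cache = KVCache()
for tok in [3, 1, 4, 1, 5]:
    ctx = cache.step(tok)
# 5 tokens decoded with 5 K/V computations; without a cache, the prefix
# would be re-projected every step (1 + 2 + 3 + 4 + 5 = 15 computations).
```

The quadratic-vs-linear gap in projection work is exactly why serving stacks treat KV-cache memory as a first-class resource.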
Latent Space: The AI Engineer Podcast
“They basically said, hey, what if we gave it more experts? So we're going to use more memory capacity, but we keep the amount of activated experts the same. We increase the expert sparsity. So we have fewer experts. The ratio of experts activated to number of experts is smaller. And we decrease the number of attention heads. And kind of for context, what we had been seeing was you make models sparser instead. So no one was really touching heads. You're just having. Well, they implicitly made it …”
Ridealong summary
Increasing expert sparsity in model training can enhance efficiency without sacrificing performance. By adjusting the number of activated experts and attention heads, researchers are discovering new ways to optimize models for specific tasks. This approach highlights the importance of hardware and model co-design in achieving better outcomes.
Latent Space: The AI Engineer Podcast · NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo) · Mar 10, 2026
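The expert-sparsity trade described in this clip is easy to see with back-of-envelope numbers: grow the total expert count (more parameters resident in memory) while holding the activated experts fixed, so per-token compute stays flat and the activation ratio shrinks. All the figures below are made-up illustrations, not numbers from the episode.

```python
# Back-of-envelope profile of a mixture-of-experts layer: memory scales
# with total experts, per-token compute with activated experts.
def moe_profile(total_experts, active_experts, params_per_expert):
    return {
        "memory_params": total_experts * params_per_expert,
        "active_params_per_token": active_experts * params_per_expert,
        "activation_ratio": active_experts / total_experts,
    }

baseline = moe_profile(total_experts=8, active_experts=2,
                       params_per_expert=10_000_000)
sparser  = moe_profile(total_experts=64, active_experts=2,
                       params_per_expert=10_000_000)
# Same compute per token, 8x the memory footprint, 1/8 the activation
# ratio -- the "increase expert sparsity" move described in the clip.
```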
The a16z Show
“Here's my pitch from a computer science point of view. Pretty rare if people ask me this question. That is, if you're working at a vertically integrated company that has an end product for, let's say, for chatbots, for a system, you are working on the vertical slice of the problem. At Inferact, you will be working on an abstraction of the horizontal layer. And this is similar to operating systems, databases, and different kinds of abstraction that people have built over the years. Operating system, …”
Ridealong summary
The future of machine learning infrastructure lies in a universal inference layer that abstracts away the complexities of accelerated computing hardware. The approach parallels the evolution of operating systems and databases, long-lived horizontal abstractions that let developers build efficient systems without owning the full stack. Inferact is pioneering this layer, pushing the boundaries of inference technology.
The a16z Show · Inferact: Building the Infrastructure That Runs Modern AI · Jan 22, 2026
TBPN
“This is great news. To form a joint venture that will distribute enterprise products across the firm's portfolio companies and beyond. the proposed deal as a free money valuation. AI is coming to Fogo de Chao. Bain Capital owns Fogo de Chao, the Brazilian steakhouse. Maybe now they can take Apple Pay. Oh yeah, we got cooked on that. Maybe they deliberately don't. I wonder if Apple Pay is expensive for them and this is actually a cost consideration. Fiji Simo said, this news came out a little …”
Ridealong summary
OpenAI is launching a joint venture to distribute enterprise AI products, a move fueled by a skyrocketing demand from over a million businesses. This initiative, which includes embedding engineers into companies, aims to help enterprises effectively deploy AI solutions. With API usage surging and a dedicated deployment arm in the works, OpenAI is sprinting to meet the urgent needs of the market.
TBPN · OpenAI Ends Side Quests, SF Housing Market is Back, Kalshi’s $1B Prize | Diet TBPN · Mar 17, 2026
Last Week in AI
“If you fuse too aggressively, you make the kernel too complex, and that blows your register and memory budget, and ironically it makes it slower. So the AI has to look at your whole graph and plan your computations to actually execute this properly. And anyway, there's a whole bunch of challenges associated with this. The art of knowing how to let data just flow through your system, and how to spin up kernels that do what you want efficiently, is just really, …”
Ridealong summary
A groundbreaking approach uses reinforcement learning to optimize CUDA kernels, potentially surpassing traditional methods. By automatically generating 6,000 training problems from PyTorch, the AI learns to write, compile, and improve CUDA code efficiently. This innovative method could change how developers tackle kernel optimization challenges in AI research.
Last Week in AI · #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!! · Mar 16, 2026
TBPN
“Take me through DepreciationGate. How did you process that, and where do we stand now with the fear that GPUs will depreciate precipitously? And H100s will be worthless in 6 to 12 months. It's totally not a problem right now. Like CoreWeave has talked about these things are lasting 5 to 6 years. And they're getting like almost 90, 95% of the pricing. So it could potentially be a problem if this is a bubble. I don't think it's a bubble. But if it's a bubble 2, 3 years from now and there's a …”
Ridealong summary
Currently, fears of GPUs depreciating drastically in value are unfounded, as demand for AI compute is outpacing supply. Companies like CoreWeave report that even older GPUs are still in high demand, with rental prices remaining strong. However, if a compute glut occurs in the next few years, it could change the landscape dramatically.
TBPN · The Lawyer Who Beat Meta and Google, Revisiting The Jetsons, Japan Twitter | Tae Kim, Logan Bartlett, Sam Stephenson, Ben Broca, Brett Adcock, Andrei Serban · Mar 30, 2026
Dwarkesh Podcast
“Is there enough, uh, Victory Giant? Is there enough PCB? This is like one of the largest suppliers of PCBs to Nvidia, and they're a Chinese company. All the PCBs come from China, sort of, from them, um, or many of them. And anyway, they're like, do you have enough PCB capacity? Great. Oh hey, uh, memory vendors, who has all the memory capacity? Okay, Nvidia does. Great. Um, so when you look at, sort of, in the same way, you know, who is AGI-pilled enough to buy compute and long timelines at levels that seem …”
Ridealong summary
NVIDIA is outpacing competitors like Google and Amazon in the AI chip market due to its aggressive data center expansion and better supply chain management. While Google struggles to deploy enough TPUs, NVIDIA is capitalizing on the growing demand for GPUs, positioning itself as a leader in accelerated computing. This dynamic highlights the critical role of semiconductor supply chains in the race for AI dominance.
Dwarkesh Podcast · Dylan Patel — Deep Dive on the 3 Big Bottlenecks to Scaling AI Compute · Mar 13, 2026
Last Week in AI
“And anyway, a lot to say from the first hand experience of using those models, but for the sake of speed, we should probably move on. And next model release coming from Google, also fast model, Gemini 3.1 Flash Lite, getting an improvement in both cost and speed. They say this is 2.5x faster time to first token. And first token time, by the way, when you're using fast models, typically you want them for shorter tasks. At least often that's the case. You want to have a quick output with a short …”
Ridealong summary
Google’s Gemini 3.1 Flash Lite model boasts a staggering 2.5x faster time to first token, making it a game-changer for quick tasks. With a 45% increase in overall output speed, it delivers 360 tokens per second, creating an almost instantaneous experience. This leap in speed not only enhances user perception but also significantly improves operational efficiency, making it a vital update for interactive products.
Last Week in AI · #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk · Mar 12, 2026
Syntax - Tasty Web Development Treats
“Speaking of AI stuff, I noticed you all have both an MCP and a llms.txt, which is just the docs in a text format. What's the MCP doing for you all? It's just doc search. I believe it's using the Cloudflare AI gateway and then Autorag pointed at the doc site itself. So just an easier way, you know, I think if you just like pointed at that server and tell it to install Varlock and do the onboarding, it'll do a pretty good job. But I probably need to retest that because it's been a few weeks. …”
Ridealong summary
The MCP server provides doc search, leveraging the Cloudflare AI gateway with AutoRAG pointed at the doc site, making documentation easier to access and manage. Pointing an agent at the server can even handle installing Varlock and running onboarding. The integration with GitHub Actions also streamlines environment-variable management, a common pain point for developers.
Syntax - Tasty Web Development Treats · 985: Stop putting secrets in .env · Mar 09, 2026
Elon Musk Podcast
“And they are making physical changes to the booster to help with that exact recovery process, right? Yes. Booster 19 has new hardware. They installed three grid fins that are 50% larger than the previous generation. 50% larger? That's a massive aerodynamic change. It is. And they are mounted lower on the hull and equipped with specific lifting pins. Those lifting pins are the contact points for the mechanical arms. Correct. The entire system is being optimized for rapid capture and turnaround. …”
Ridealong summary
Booster 19 features 50% larger grid fins designed to enhance its recovery process, aiming for a safer long-term reusability. However, engineers will only attempt land catches after two successful ocean landings, prioritizing safety over speed. This careful approach minimizes risks while optimizing the entire system for rapid turnaround.
Elon Musk Podcast · Starship V3 Launch update · Mar 11, 2026
The Growth Podcast
“So while the PRD is generating, can we take a look at that CLAUDE.md file? Yep. Let me pull it up for you now. And that trick, guys, that he's doing there to open preview, make sure you do that, because it's a pain to look at markdown files with all of those pound symbols. Annoying. Cursor, it's actually on their forums. You'll see me on the forums whinging about it every month. I don't know why they can't fix it, but there used to be a setting that by default it would open up in preview anyway but …”
Ridealong summary
To enhance a Product Requirements Document (PRD), AI tools like Claude Code can streamline the process and improve efficiency. Using a structured CLAUDE.md file and session-start hooks, product managers can enforce best practices while keeping everything under version control in GitHub. The approach tidies up documentation and maximizes the potential of AI in product management.
The Growth Podcast · This CPO Uses Claude Code to Run his Entire Work Life | Dave Killeen, Field CPO @ Pendo · Mar 11, 2026
Brad & Will Made a Tech Pod.
“Things were a bit crass back then. Also, speaking of crass, this editorial at the beginning from John sure hits a little different in this modern era of hardware being like wildly unaffordable. And like, here he is going like, hey, make games that require this brand new $500 graphics card. Like, stop supporting the entire rest of the market. Well, but it's, hey... I get it. I mean... Like this was before Crysis, right? We didn't know how bad it would be if you released a game that wouldn't run …”
Ridealong summary
The GeForce 3 graphics card revolutionized game development when it launched in 2001, paving the way for advanced technologies like ray tracing. At the time, its $500 price tag was a gamble that many developers took, not knowing how it would affect the industry's future. This rapid evolution in graphics technology led to a compressed timeline of innovations that shaped the gaming landscape we know today.
Brad & Will Made a Tech Pod. · 329: A Plaid Decade · Mar 08, 2026
TFTC: A Bitcoin Podcast
“Or that could be like a Claude skill or something. It could be anything. Yeah. Anything that's writing down steps. Skills can be tools, though, too. I'm less familiar with how Claude does the skills. But then the last piece is memory, right? So memory right now is in the form of files. So whenever you talk to your Claude bot or whatever you're calling these things, it's just writing down these markdown files and reading them. So the analogy people use is that movie Memento, where the guy wakes up …”
Ridealong summary
AI agents are struggling with memory retention, akin to Alzheimer's patients missing chunks of their past. As these models rely on text files for memory, they face limitations in processing long sequences, leading to mistakes and lost information. Exploring alternatives like graph databases could be the key to enhancing AI memory capabilities.
TFTC: A Bitcoin Podcast · #726: Mapping The Mind Of The Machine with Brian Murray & Paul Itoi · Mar 14, 2026
Behind the Craft
“I do think the cursors and the UI make a huge difference. Because if I use Claude Code, I can theoretically do this too, but it's just generally a bunch of code that I can't see. But being able to see this happen is like a game changer. It's actually really hard. I mean, sure, a lot of these platforms that are now going into parallel mode, you know? Yeah. They're like assigning different roles to different agents and so on. But what I wanted really to do is to have parallelism with the same …”
Ridealong summary
Parallelism in AI design agents can drastically speed up processes by allowing multiple agents to work simultaneously on the same task without conflict. This approach not only enhances efficiency but also provides visibility into each agent's contributions, making the workflow easier to manage. Imagine instructing AI agents to build something while you grab coffee, all while knowing they have a clear plan in front of you.
Behind the Craft · I Watched 6 AI Agents Design an App In Real Time And It Blew My Mind | Tom Krcha · Mar 08, 2026

Top Podcasts in This Playlist

The a16z Show
5 clips
Latent Space: The AI Engineer Podcast
5 clips
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
4 clips
TBPN
4 clips
Super Data Science: ML & AI Podcast with Jon Krohn
3 clips
Last Week in AI
3 clips
Elon Musk Podcast
3 clips
TFTC: A Bitcoin Podcast
3 clips

Stories in This Playlist

Top Podcasts on AI Agents & Productivity
AI agents are being integrated into workflows across multiple industries, leading to significant productivity improvements. These AI-driven solutions automate routine tasks and provide advanced data analysis, allowing businesses to optimize operations and focus on strategic initiatives. This trend highlights the growing importance of AI in transforming traditional business processes.
Mar 29, 2026 · 12 clips · 8 podcasts
Best Podcasts on Nvidia's GB300 & DLSS 5
At the GTC 2026 conference, Nvidia unveiled its new GB300 chip and DLSS 5 technology, projecting a trillion-dollar revenue forecast. This highlights Nvidia's continued innovation and dominance in the AI and graphics sectors, impacting the broader tech industry.
Mar 18, 2026 · 40 clips · 17 podcasts
Top Podcasts on AI Agents & Workforce Changes
The rapid development and deployment of AI agents, particularly tools like Claude Code and OpenAI's enterprise focus, are transforming software development and knowledge work. While promising massive productivity gains and enabling non-technical users to build software, this shift is also raising concerns about job displacement, especially in entry-level white-collar roles, and the need for new security and governance frameworks.
Mar 17, 2026 · 24 clips · 14 podcasts
Best Podcast Episodes on AI's Impact on Jobs
Artificial intelligence continues to be a dominant topic, with podcasts exploring its profound effects on the labor market and the broader economy. Discussions range from the potential for AI to displace white-collar jobs and create new opportunities, to the ethical implications of AI-generated content and the emergence of an 'AI bubble.' The conversation also covers how AI agents are changing workflows and the race among tech giants like OpenAI and Google.
Mar 14, 2026 · 32 clips · 17 podcasts