I Built a Pipeline That Turns 40 YouTube Cooking Videos into Obsidian Recipe Notes
My YouTube “watch and cook later” playlist had ballooned to 43 videos. I kept saving cooking videos and never actually using them — every time I stood in the kitchen I’d end up pulling out my phone and reopening YouTube, wondering “wait, how did that recipe go again?” I’ve lost count of how many times I’ve done that. I already manage my recipes in Obsidian, so if I could just convert the contents of these videos into Obsidian notes, the problem would go away. I gave it a shot with Cowork (the desktop version of Claude), and the process turned out to have more twists and turns than I expected — worth writing down.
First, I tried one video by hand
I started simple: I handed Cowork a video URL and asked it to “put this recipe into Obsidian.” It was a video of chef Masahiro Kasahara making homemade nametake (simmered mushrooms). Cowork can open YouTube through a Chrome extension, so it read the description, searched the web for supplementary details, and somehow managed to produce a recipe from that one video.
But this approach has a fatal flaw: the token cost per video is brutal. Browser operations, page fetches, web searches, text formatting: do that 43 times and you’ll hit the session limit long before the playlist is done. I needed a different approach.
Transcribing locally with Whisper
The part burning through all those tokens is “understanding what’s in the video.” If I could handle that locally, all I’d need to hand Claude would be a short block of text.
I wasn’t sure my MacBook Neo (A18 Pro / 8GB) could handle it, but Whisper’s medium model (750MB) turned out to run comfortably. Because mlx-whisper runs on the Apple Silicon GPU through MLX, a 12-minute video transcribes in one to three minutes.
I used two tools: yt-dlp (to pull just the audio from YouTube) and mlx-whisper (a Whisper implementation optimized for Apple Silicon).
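As a rough illustration of the "audio only" step, here is a minimal sketch that builds a yt-dlp command line from Python. The function name, the `audio` output folder, and the choice of mp3 are assumptions made for this example, not details from my actual script; the yt-dlp options themselves (`--extract-audio`, `--audio-format`, `--output`) are standard ones.

```python
from pathlib import Path


def build_audio_download_command(url: str, out_dir: Path) -> list[str]:
    """Return a yt-dlp argv that keeps only the audio track of each video."""
    return [
        "yt-dlp",
        "--extract-audio",         # discard the video stream
        "--audio-format", "mp3",   # conversion is delegated to ffmpeg
        "--output", str(out_dir / "%(id)s.%(ext)s"),  # one file per video id
        url,                       # a single video or a whole playlist
    ]
```

Passing the resulting list to `subprocess.run(..., check=True)` would run the download and raise if yt-dlp exits with an error.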
The setup wasn’t exactly smooth, though. First, yt-dlp needs ffmpeg to extract and convert the audio, so I installed that too. Then, trying to install mlx-whisper, I found pip was outdated and needed an upgrade first. On top of that, the system Python was 3.9, and mlx-whisper requires 3.10+. Installing Python 3.12 via Homebrew fixed the version, but pip then ran straight into PEP 668’s “externally-managed-environment” restriction, which refuses installs into the Homebrew-managed interpreter. In the end, setting up a venv was what actually solved it.
brew install yt-dlp ffmpeg python@3.12
python3.12 -m venv ~/scripts/.venv
source ~/scripts/.venv/bin/activate
pip install mlx-whisper
As always, the time sink is the environment setup you have to get through before you even reach the actual task.
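A small preflight check would have caught both of my Python problems up front. This is a minimal sketch, not something from my scripts: the function name and messages are made up for illustration, and the 3.10 floor is the mlx-whisper requirement mentioned above.

```python
import sys


def environment_problems(
    version_info=sys.version_info,
    prefix=sys.prefix,
    base_prefix=sys.base_prefix,
) -> list[str]:
    """List reasons the current interpreter is unsuitable for mlx-whisper."""
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    if prefix == base_prefix:
        problems.append("not running inside a virtual environment")
    return problems
```

Calling `environment_problems()` with no arguments inspects the running interpreter; an empty list means both checks passed.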
Hitting the wall of Cowork’s sandbox
At first I tried running both yt-dlp and Whisper inside Cowork’s Linux sandbox, but access to YouTube got blocked by the proxy. Apparently the sandbox isn’t allowed to freely reach external sites. So I ended up running the audio download and transcription locally on the Mac, and only passing the resulting text to Cowork.
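For reference, the local transcription step can be sketched roughly like this. It is not my actual transcribe.py, just an outline assuming the audio files are already sitting in a folder as mp3. The `mlx-community/whisper-medium-mlx` repo id and the helper names are assumptions; `mlx_whisper.transcribe` with `path_or_hf_repo` is the library's documented entry point.

```python
from pathlib import Path

# Assumption: Hugging Face repo id of MLX-converted Whisper medium weights.
MODEL_REPO = "mlx-community/whisper-medium-mlx"


def transcript_path(audio_file: Path, out_dir: Path) -> Path:
    """Map audio/abc.mp3 to <out_dir>/abc.txt."""
    return out_dir / (audio_file.stem + ".txt")


def transcribe_all(audio_dir: Path, out_dir: Path) -> int:
    """Transcribe every mp3 in audio_dir and return how many were processed."""
    import mlx_whisper  # imported lazily: only installable on Apple Silicon

    out_dir.mkdir(parents=True, exist_ok=True)
    audio_files = sorted(audio_dir.glob("*.mp3"))
    for i, audio in enumerate(audio_files, start=1):
        print(f"[{i}/{len(audio_files)}] {audio.name}")  # progress counter
        result = mlx_whisper.transcribe(str(audio), path_or_hf_repo=MODEL_REPO)
        transcript_path(audio, out_dir).write_text(result["text"], encoding="utf-8")
    return len(audio_files)
```

Only the plain-text files written here need to leave the Mac, which is what keeps the hand-off to Cowork small.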
I tried Ollama, then gave up on it
I wanted to keep the conversion from transcript to recipe format local too, so I tried running gemma3:4b through Ollama. It technically worked, but the accuracy of pulling ingredients and steps out of a cooking transcript wasn’t great — especially around catching spoken quantities and separating actual recipe content from small talk.
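For anyone who wants to repeat the experiment, this is roughly what a call to a local Ollama server looks like. It is a minimal sketch: the prompt wording and function names are hypothetical, and it assumes Ollama is listening on its default port with gemma3:4b already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(transcript: str, model: str = "gemma3:4b") -> urllib.request.Request:
    """Build a non-streaming generate request asking for ingredients and steps."""
    prompt = (
        "Extract the ingredients (with quantities) and the steps from this "
        "cooking video transcript. Ignore small talk.\n\n" + transcript
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_recipe(transcript: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_ollama_request(transcript)) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to false, the server returns a single JSON object, so the generated text can be read from its `response` field in one go.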
In the end, I settled on a division of labor: local Whisper for the heavy lifting (transcription), and Claude for the intelligent part (formatting into a recipe). Leave it to the specialists.
Turning it into a Cowork skill
Cowork has a mechanism called “skills.” You write out how to do a task in Markdown, and from then on you just say something like “turn this into a recipe” and it follows that skill.
The skill I built here is simple: it reads the transcript text sitting in _transcripts/, converts it to match the format of my existing recipes (frontmatter + ingredient list + steps), and saves it into the Obsidian folder. The skill also has real examples of existing recipes embedded in it, which keeps the output format consistent.
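To make the target format concrete, here is a sketch of a note with frontmatter, an ingredient list, and numbered steps, rendered from Python. The frontmatter keys (`title`, `source`, `tags`) and section headings are placeholders I picked for this example; the real notes follow whatever my existing recipes use, and in the actual pipeline Claude writes them rather than a template function.

```python
import json


def render_recipe_note(title: str, source_url: str,
                       ingredients: list[str], steps: list[str]) -> str:
    """Render an Obsidian-style Markdown note: frontmatter, ingredients, steps."""
    lines = [
        "---",
        f"title: {json.dumps(title, ensure_ascii=False)}",  # quoted so colons are safe
        f"source: {source_url}",
        "tags: [recipe]",
        "---",
        "",
        "## Ingredients",
    ]
    lines += [f"- {item}" for item in ingredients]
    lines += ["", "## Steps"]
    lines += [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"
```

Embedding one or two finished notes of this shape in the skill is what gives Claude a fixed pattern to copy.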
Fully automating it with the Claude CLI
The last piece was the Claude Code CLI. Since you can pass it a prompt from the terminal with claude -p, I chained the whole thing together in a shell script:
#!/bin/bash
URL="${1:-https://www.youtube.com/playlist?list=PLh4huXs8Qi9HkMIXptPbw2gIYUDNtqNic}"
# Step 1: Transcribe locally (progress is visible)
if ! "$HOME/scripts/.venv/bin/python3" "$HOME/scripts/transcribe.py" "$URL"; then
  echo "Transcription failed. Aborting."
  exit 1
fi
# Step 2: Turn it into recipes with the Claude CLI
claude -p "Convert the .txt files in _transcripts into Obsidian recipe format and save them"
In Step 1, Whisper’s progress shows up as [1/43], [2/43], and so on, and in Step 2 the Claude CLI batch-converts everything into recipes. If transcription fails across the board, it aborts before reaching Step 2. Early on I hit a trap where the Claude CLI would still run even after an error, burning API costs for nothing. Now a single run of ~/scripts/recipe turns the entire playlist into Obsidian recipe notes.
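The same "don’t call Claude for nothing" idea can be pushed one step further by checking that there is actually something to convert. The sketch below is an assumption-laden variant, not part of my script: the function names are invented, and it treats empty .txt files as failed transcriptions.

```python
import subprocess
from pathlib import Path

PROMPT = "Convert the .txt files in _transcripts into Obsidian recipe format and save them"


def pending_transcripts(transcript_dir: Path) -> list[Path]:
    """Return the non-empty .txt transcripts, or [] if the folder is missing."""
    if not transcript_dir.is_dir():
        return []
    return sorted(p for p in transcript_dir.glob("*.txt") if p.stat().st_size > 0)


def convert_if_needed(transcript_dir: Path) -> bool:
    """Run the Claude CLI only when there is at least one transcript to convert."""
    if not pending_transcripts(transcript_dir):
        print("No transcripts found. Skipping the Claude step.")
        return False
    subprocess.run(["claude", "-p", PROMPT], check=True)  # raises on CLI failure
    return True
```

Like the shell version, this relies on `claude -p` being run from the directory that contains `_transcripts`.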
The final architecture
The heavy lifting (audio to text) runs locally on the Mac through MLX, and the intelligent work (text to structured recipe) runs on Claude, each handling what it’s best at.
Takeaways
What started as a simple wish — “I want to get YouTube recipes into Obsidian” — turned out to run into a string of obstacles once I actually tried it: Cowork’s sandbox has network restrictions, YouTube’s caption API isn’t straightforward to pull from, and local LLM accuracy falls short depending on the task. On top of that, I stepped on all the usual environment-setup landmines, like Python version conflicts and needing a venv. In the end, I landed on a combination of local Whisper plus the Claude CLI.
What’s interesting is that this whole process of trial and error unfolded as a conversation with Cowork. Whenever an error came up, I’d paste the terminal output, Cowork would suggest the next move, and I’d try again. As a debugging pair-programming partner, it was genuinely solid. Now that the 43-video playlist is sorted, maybe I’ll end up cooking more. Or maybe I won’t.