OP here. Just a quick breakdown of the technical architecture for those interested.
I realized early on that using "One Giant System Prompt" was like hiring one chef to make Sushi, Pizza, and Pastries simultaneously. The result was always mediocre.
So, I tore down the v1 architecture and rebuilt the backend using Next.js with what I call a "Kitchen Brigade" Agentic Workflow:
1. The "Maître D'" (The Router Layer): Before any generation happens, a lightweight model intercepts the request. It doesn't generate the answer. It classifies the intent: Is this a Logic problem? A Creative writing task? Or an emotional complaint? This routing step is crucial for latency vs. quality trade-offs.
2. Dynamic Prompt Assembly (The Assembler): Instead of a static system prompt, I assemble the "Order Ticket" dynamically based on the User Context (see the assembler sketch below).
Input: User's selected "Depth Level" (ELI5 vs. PhD) + Intent Class.
Output: A constructed prompt that instructs the model on how to think, not just what to say.
3. The Model Matrix (The Specialists): The backend routes to different models based on the Maître D's classification (lookup table sketched below):
DeepSeek/o1: For heavy logic, math, and "First Principles" derivations.
Claude 3.5/Gemini: For high-EQ responses and nuanced creative explanations.
GPT-4o-mini: For quick, routine routing tasks to keep latency low.
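
To make the Maître D' concrete, here's a minimal sketch of the router layer, assuming an OpenAI-compatible client and using gpt-4o-mini as the cheap classifier. The intent labels are my placeholder names, not the exact taxonomy I run in prod:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export type IntentClass = "logic" | "creative" | "emotional";

export async function classifyIntent(userMessage: string): Promise<IntentClass> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini", // the lightweight router model; never generates the answer
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "Classify the user's request as exactly one of: logic, creative, emotional. " +
          "Reply with the single word only.",
      },
      { role: "user", content: userMessage },
    ],
  });

  const label = (res.choices[0].message.content ?? "").trim().toLowerCase();
  const valid: IntentClass[] = ["logic", "creative", "emotional"];
  // Fall back to "logic" if the router says anything unexpected.
  return valid.includes(label as IntentClass) ? (label as IntentClass) : "logic";
}
```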
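The Assembler is just deterministic string-building on top of that classification. Here's roughly the shape; the instruction wording is illustrative, not my actual prompts, and the DepthLevel values are the two ends of the slider:

```ts
type IntentClass = "logic" | "creative" | "emotional"; // same union as the router sketch

export type DepthLevel = "eli5" | "phd";

const intentInstructions: Record<IntentClass, string> = {
  logic: "Work from first principles. Show each derivation step before stating the answer.",
  creative: "Prioritize voice and nuance. Favor vivid, concrete language over lists.",
  emotional: "Acknowledge the feeling first. Be warm and concise; do not lecture.",
};

const depthInstructions: Record<DepthLevel, string> = {
  eli5: "Explain as if to a curious twelve-year-old. No jargon without a plain-word gloss.",
  phd: "Assume graduate-level background. Skip the basics; be precise with terminology.",
};

export function assembleOrderTicket(intent: IntentClass, depth: DepthLevel): string {
  // The ticket tells the model HOW to think (method + register),
  // not just what to say.
  return [intentInstructions[intent], depthInstructions[depth]].join("\n\n");
}
```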
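And the Model Matrix itself is nothing fancy, just a lookup from intent class to specialist. Model IDs below are illustrative placeholders; swap in whatever your providers actually expose:

```ts
type IntentClass = "logic" | "creative" | "emotional"; // same union as above

const modelMatrix: Record<IntentClass, { provider: string; model: string }> = {
  logic:     { provider: "deepseek",  model: "deepseek-reasoner" },        // or o1
  creative:  { provider: "anthropic", model: "claude-3-5-sonnet-latest" }, // or Gemini
  emotional: { provider: "anthropic", model: "claude-3-5-sonnet-latest" },
};

export function pickSpecialist(intent: IntentClass) {
  return modelMatrix[intent];
}
```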
It’s all deployed on Vercel.
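For context, the glue is roughly a Next.js route handler like this. The "@/lib/*" paths and callModel are stand-ins for my actual modules and provider clients, so treat this as a sketch of the wiring, not the real file:

```ts
// app/api/chat/route.ts — hypothetical glue: classify -> assemble -> route.
import { NextResponse } from "next/server";
import { classifyIntent } from "@/lib/router";
import { assembleOrderTicket } from "@/lib/assembler";
import { pickSpecialist } from "@/lib/matrix";

// Stand-in for whatever provider SDK call you actually make.
declare function callModel(
  provider: string,
  model: string,
  system: string,
  user: string
): Promise<string>;

export async function POST(req: Request) {
  const { message, depth } = await req.json();

  const intent = await classifyIntent(message);       // 1. Maître D'
  const system = assembleOrderTicket(intent, depth);  // 2. Order Ticket
  const { provider, model } = pickSpecialist(intent); // 3. Model Matrix

  const answer = await callModel(provider, model, system, message);
  return NextResponse.json({ intent, model, answer });
}
```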
I'm curious: For those of you building Agents, do you prefer this kind of "Hard Routing" (explicit logic) or do you trust the newer models to "Self-Route" via tool use? Would love to hear your thoughts on the architecture.