🛠️ BuildAIers Talking #3: MLOps on $0 Budget, Agentic AI, RAG in Prod, K8s vs. Slurm, and Sensitive Data Detection
Community Insights #3: From budget MLOps stacks and RAG deployments to the agentic AI debate—here’s what AI builders are talking about this week inside MLOps Community.
The discussions and insights in this post originally appeared in the MLOps Community Newsletter as part of our weekly column contributions for the community (published every Tuesday and Thursday).
Want more deep dives into MLOps conversations, debates, and insights? Subscribe to the MLOps Community Newsletter (20,000+ builders)—delivered straight to your inbox every Tuesday and Thursday! 🚀
Every Thursday, we surface the most thought-provoking, practical, and occasionally spicy conversations happening inside the AI community so you never miss out—even when life gets busy.
In this edition of 🗣️ BuildAIers Talking 🗣️, we’re spotlighting five conversations you won’t want to miss from the MLOps Community Slack. Get practical tips on zero-budget MLOps, defining agentic AI, RAG stacks in production, Kubernetes vs. Slurm for model quality, and the best tools for detecting sensitive data.
Ready? Jump right in! 🚀

🔧 MLOps 101: Can You Run MLflow on a $0 Budget?
How can we set up a free or nearly free MLOps stack for MLflow, artifact storage, and a publicly accessible database without breaking the bank?
Claus Agerskov is running an MLOps 101 course on a tight budget and is looking for ways to host MLflow’s database and artifacts with public access on free-tier infrastructure.
His current stack: MLflow in GitHub Codespaces, storing artifacts on S3.

Key Insights:
Médéric Hurier recommended using SQLite locally for small-scale demos, while noting that GitHub Actions (CI) still needs a publicly accessible store to write artifacts. Médéric also shared his open-source MLOps coding course.
Shubham Gandhi pointed out that DagsHub offers free hosted MLflow servers with experiment tracking and artifact management.
David suggested Supabase and Modal as options for the database and infrastructure, though setup and usage constraints vary, so check their docs to confirm the features you need stay within the free tier.
Takeaway:
If you’re running a small-scale or educational setup, mixing a local DB (e.g., SQLite) with free-tier cloud services or community platforms like DagsHub can jumpstart your MLOps pipeline without upfront costs (see the sketch below)—just be mindful of public access and ingress/egress costs.
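To make the zero-cost setup concrete, here’s a minimal sketch of MLflow tracking against a local SQLite backend. The experiment name, parameters, and artifact file are placeholder assumptions; the commented server command shows how the same backend could be exposed publicly with S3 artifacts.

```python
# Minimal sketch: MLflow tracking on a $0 budget with a local SQLite backend.
# Experiment name, params, and the artifact file are illustrative assumptions.
import mlflow

# SQLite backend store: a single local file, no database server required.
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("mlops-101-demo")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.92)

    # Artifacts land in the local ./mlruns directory by default.
    with open("notes.txt", "w") as f:
        f.write("demo artifact")
    mlflow.log_artifact("notes.txt")

# For a shared/public setup (e.g., so CI can log runs), you would instead run:
#   mlflow server --backend-store-uri sqlite:///mlflow.db \
#                 --default-artifact-root s3://<your-bucket>/mlruns
# and point MLFLOW_TRACKING_URI at that server.
```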
🔗 Join the conversation in MLOps Community →
🕵️‍♂️ Presidio, AWS Comprehend, or Macie: Which Tool Is Best for Detecting Sensitive Data?
When detecting or removing PII from documents, do you pick Amazon Macie, Amazon Comprehend, or Microsoft Presidio—and when is it worth using more than one in tandem?
In another thread in #mlops-questions-answered, Adam Becker wanted to know how teams balance cost and effectiveness when it comes to PII and sensitive-data detection.

Key Insights:
Médéric Hurier shared his team’s dual approach: using Presidio for high-volume batch processing and AWS Comprehend for low-volume tasks. His take:
Microsoft Presidio: Good for larger data batches; open source, but may need more setup.
Amazon Comprehend: Convenient for low-volume text processing (10–20 pages daily) and on-demand usage.
Amazon Macie: An ideal cost-saving alternative if you can batch your data, though it’s specialized for S3-based workflows.
While no formal benchmarks were available, Médéric’s take helps teams balance cost, volume, and inference mode.
Trade-offs: It often comes down to cost, batch vs. real-time processing, and how many documents you handle per day.
Takeaway:
Small daily volumes may suit a managed-service approach (Comprehend).
Open-source solutions like Presidio, or a managed service like Macie, can suit large-scale or varied workloads.
Choose a service based on data volume, cost considerations, and team familiarity—sometimes combining solutions optimizes cost and coverage (a minimal Presidio example follows).
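To make the open-source option concrete, here is a minimal sketch of batch PII detection and redaction with Presidio. The sample texts and entity list are assumptions for illustration; a real pipeline would need tuning for your document types and languages.

```python
# Minimal sketch: batch PII detection and redaction with Microsoft Presidio.
# The sample texts and entity list are illustrative assumptions.
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()       # loads the default NLP model and recognizers
anonymizer = AnonymizerEngine()

documents = [
    "Contact Jane Doe at jane.doe@example.com or +1-202-555-0173.",
    "Invoice sent to John Smith, reachable at john@example.org.",
]

for doc in documents:
    # Detect entities such as names, emails, and phone numbers.
    findings = analyzer.analyze(
        text=doc,
        entities=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
        language="en",
    )
    # Replace each detected span with a placeholder like <PERSON>.
    redacted = anonymizer.anonymize(text=doc, analyzer_results=findings)
    print(redacted.text)

# Low-volume, managed alternative (AWS Comprehend via boto3):
#   boto3.client("comprehend").detect_pii_entities(Text=doc, LanguageCode="en")
```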
🧠 Is Your AI App Truly “Agentic” or Just a Workflow in Disguise?
How do you define “agentic AI,” and where do you draw the line between simple, deterministic workflows and autonomous decision-making agents?
Médéric Hurier opened a discussion around Google's definitions of "Agentic Workflow" and "AI Agents."
The thread drew plenty of responses on where deterministic workflows end and genuinely autonomous, decision-making AI begins.

Key Insights:
Alex Strick van Linschoten pointed out that many AI "agents" in production are actually more like structured workflows and aren't as autonomous as people think.
Igor Mameshin and Amy Bachir agreed that “agentic” is best seen as a spectrum—the more autonomy and decision-making an AI exhibits, the higher its agentic quality.
The consensus: a blend of fixed and dynamic, autonomous actions is key, with agentic workflows often relying on the ReAct pattern to combine reasoning and action (sketched after the takeaway below).
Takeaway:
“Agentic” isn’t all-or-nothing—your AI can become more agentic as it gains autonomy and tool access, but robust guardrails are crucial.
Evaluate how much autonomy your system truly needs. A “semi-agentic” approach, which mixes set workflows with AI-powered choices, is usually the best fit for real-world deployments.
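For readers new to ReAct, the sketch below shows the bare reason-act-observe loop. The call_model function is a canned stand-in for a real LLM call (an assumption so the example is self-contained); in practice you would prompt a model with the running history and parse its reply into a thought and an action.

```python
# Schematic ReAct loop: reason (Thought), act (Action via a tool), observe,
# repeat. call_model() is a canned stand-in for a real LLM call so the sketch
# runs on its own; the tool and the answers are illustrative assumptions.
TOOLS = {
    "search_docs": lambda query: f"3 documents matched '{query}'",
}

def call_model(history):
    # A real agent would prompt an LLM with `history` and parse its output.
    if not any(line.startswith("Observation:") for line in history):
        return {"thought": "I should search the docs first.",
                "action": ("search_docs", "improving retrieval quality")}
    return {"thought": "I have enough context to answer.",
            "action": ("finish", "Use hybrid search plus a re-ranker.")}

def react(question, max_steps=5):
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = call_model(history)
        history.append(f"Thought: {step['thought']}")
        tool_name, tool_input = step["action"]
        if tool_name == "finish":          # the agent decides it is done
            history.append(f"Answer: {tool_input}")
            return tool_input, history
        observation = TOOLS[tool_name](tool_input)
        history.append(f"Action: {tool_name}({tool_input!r})")
        history.append(f"Observation: {observation}")
    return None, history

answer, trace = react("How do I improve retrieval quality?")
print("\n".join(trace))
```

The guardrails discussed above live at the loop boundaries: cap max_steps, keep TOOLS to an allow-list, and validate tool inputs before executing them.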
🧪 Kubernetes vs. Slurm: Does Your Training Framework Really Impact LLM Quality?
Do SOTA (state-of-the-art) models trained on Slurm (e.g., Stable Diffusion) get better results than those trained on Kubernetes (e.g., DBRX), and does the choice even matter?
In #discussions, Demetrios cited examples—Stable Diffusion’s Slurm-based setup vs. massive K8s clusters at TikTok and Bloomberg.
Is K8s only for mega-corporations, or can smaller teams really make it work without hitting GPU or networking roadblocks?

Key Insights:
Patrick Barker explained that Kubernetes isn’t naturally optimized for the high-throughput, low-latency networking needed by LLMs and struggles with scaling across regions.
Prasad Paravatha pointed out that while big companies like OpenAI and TikTok use Kubernetes, smaller companies may struggle with limitations like GPU scarcity and the cost of running Kubernetes, and might opt for specialized HPC (high-performance computing) solutions like Slurm instead.
Trade-off: K8s offers flexibility and easier deployment at scale but may add complexity for large-scale GPU training.
Takeaway:
For smaller organizations, HPC-focused solutions (Slurm) or managed HPC cloud services might be more cost-effective and performant than large K8s clusters—especially when GPUs are scarce or distributed.
K8s can handle big loads but often requires bigger budgets and deeper expertise.
⚡️ Oh, hi! Since you are here, see also: Designing an NVIDIA GPU Architecture from Scratch.
🧱 What's Your Go-To Stack for Production-Ready RAG Systems?
Which retrieval-augmented generation architecture do you trust for real-world deployments, and how do you ensure security, accuracy, and performance?
Over in #llmops, Samuel Taiwo is building a RAG system with LangChain, Azure OpenAI, and security guardrails, but he’s looking for ways to improve it.
Folks suggested adding a vector database (Pinecone, Elasticsearch, or Postgres), re-ranking strategies (BGE-m3, Cohere), and stronger security measures.
🪣 See our comparison blog: Top 6 AI Vector Databases Compared (2025): Which One Should You Choose as an AI Builder?
How do you build a production-grade RAG system?

Key Insights:
Richa suggested hybrid search (a vector database like Azure AI Search or Pinecone plus keyword search), re-ranking strategies (e.g., BGE-m3 or Cohere Reranker), and caching mechanisms, such as storing previous query-response pairs in Redis or a local database, to reduce latency and avoid duplicate computation (a sketch of the pgvector + Redis caching path follows the takeaway below).
mark54g, who works with Elasticsearch, suggested it for its broad feature set (search and embedding flexibility) compared to Pinecone.
William Han recommended Fiddler Guardrails as a free, enterprise-grade option for content safety, including moderation and protection against prompt injection.
PhilWinder recommended PostgreSQL solutions like pgvector for solid reliability.
Apurva Misra shared her YouTube talk on evaluating and improving RAG systems.
Igor Mameshin mentioned online/offline evaluations, document-level permissions, and accuracy improvement techniques.
Valdimar Eggertsson recommended trying different embeddings, like Voyage AI or top performers on the Hugging Face embedding leaderboard, and evaluating your results.
Takeaway:
Combining short text chunks, quality embeddings, and re-ranking can drastically improve your RAG stack. And don’t forget robust guardrails for privacy, security, and user trust!
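As a rough illustration of two of the suggestions above, here’s a hedged sketch of the retrieval path: pgvector for vector similarity search, with a Redis cache for repeated query-response pairs. Table and column names, connection details, and the embed()/generate() stubs are assumptions for illustration, not a reference architecture.

```python
# Hedged sketch: pgvector similarity search with a Redis query cache.
# Table/column names, connection details, and the embed()/generate() stubs
# are illustrative assumptions.
import hashlib

import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379)

def embed(text: str) -> list[float]:
    # Placeholder: call your embedding model here (a hosted API or a model
    # from the Hugging Face embedding leaderboard).
    raise NotImplementedError

def generate(question: str, context: list[str]) -> str:
    # Placeholder: call your LLM (e.g., Azure OpenAI) with the retrieved context.
    raise NotImplementedError

def answer(question: str, ttl_seconds: int = 3600) -> str:
    # Serve repeated questions from the cache to cut latency and token spend.
    key = "rag:" + hashlib.sha256(question.encode()).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()

    # Vector leg of hybrid search: nearest neighbors by pgvector distance.
    query_vector = "[" + ",".join(str(x) for x in embed(question)) + "]"
    with psycopg2.connect("dbname=rag") as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <-> %s::vector LIMIT 5",
            (query_vector,),
        )
        context = [row[0] for row in cur.fetchall()]

    response = generate(question, context)
    cache.set(key, response, ex=ttl_seconds)
    return response
```

A production version would add the keyword leg of hybrid search, a re-ranking pass over the candidates, and document-level permission filters before anything reaches the model.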
🔗 See how people are building →
⬇️ Reply to this post and let us know your experience!
🎉 That’s all for this edition! Know someone who would love this? Share it with your AI builder friends!
We created 🗣️ BuildAIers Talking: Community Insights 🗣️ to be the pulse of the AI builder community—a space where we share and amplify honest conversations, lessons, and insights from real builders. Every Thursday!
📌 Key Goals for Every Post:
✅ Bring the most valuable discussions from AI communities to you.
✅ Build a sense of peer learning and collaboration.
✅ Give you and other AI builders a platform to share your voices and insights.