Blog

Thoughts and insights about reliability, performance, observability, and AI.

EN · AI · LLM

Mastering LLMs: How Strategic Prompting Transforms Technical Outputs

Learn fundamental prompt engineering techniques including Zero-shot, Few-shot, Chain-of-Thought, and role-specific prompting to achieve professional-grade AI outputs.

4 min read
EN · AI · DORA

Beyond the Hype: How AI Integration Impacts DORA Metrics and Software Performance

Explore how AI adoption affects DORA metrics, the new fifth metric (Deployment Rework Rate), and the seven organizational capabilities needed to turn AI into a performance amplifier rather than a bottleneck.

5 min read
EN · AI · RAG

The Architect's Guide to Hybrid Search, RRF, and RAG in the AI Era

Traditional search engines excel at exact matches but fail to grasp user intent. Learn how hybrid search combines lexical and vector methods with RRF to build accurate, context-aware retrieval systems.

5 min read
EN · AI · Agents

From Chatbots to Autonomous Agents: The 7 Patterns of Agentic AI Evolution

Software development is transforming as natural language becomes the primary programming interface. Learn seven AI patterns from simple loops to autonomous agent-to-agent systems and Model Context Protocol.

5 min read
EN · AI · GenAI

The Reality of Enterprise AI: Avoiding Common Pitfalls When Scaling GenAI

Moving from a Generative AI pilot to an enterprise-wide rollout is not a linear process. Learn what technical leaders at Liberty Mutual discovered while deploying GenAI at scale, including adoption strategies, cost management, and RAG complexity.

5 min read
EN · Observability · LGTM

Observing the Stochastic: Tuning the LGTM Stack for AI Infrastructure

LGTM (Loki, Grafana, Tempo, Mimir) has earned its place in production environments. This article explores how well the stack holds up when pushed into one of the most hostile observability environments: LLM and ML production systems.

6 min read
EN · AWS · AI

The AWS AI Stack: Moving Beyond Proof-of-Concept to Five-Nines Reliability

AWS can absolutely support Tier-1 AI systems, but only if you treat AI workloads as first-class distributed systems, not experiments wrapped in SDKs. This is an evaluation of what actually holds up when you push beyond a demo, beyond a PoC, and toward five-nines expectations.

6 min read
EN · Azure · AI

Azure for AI: An SRE's Guide to Provisioning for High Availability

Azure can run LLM-backed systems with uptime expectations comparable to Tier-1 services, but the path runs through quotas, networking, and observability, not prompts and SDKs. This is an evaluation of what actually holds up under load, failure, and budget scrutiny.

6 min read
EN · AI · LLM

From PoC to Production: What Breaks When You Ship LLM-Based Systems

The gap between a Proof of Concept and a production system is not primarily about model quality. It's about everything that happens after the first successful response. LLM systems don't fail because the prompt is wrong. They fail because production is hostile to assumptions.

6 min read