<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Chris' Web Log]]></title><description><![CDATA[Chris' Web Log]]></description><link>https://blog.coutinho.io</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 15:43:47 GMT</lastBuildDate><atom:link href="https://blog.coutinho.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[One Person, 34 Services: How AI Tooling Changed the Economics of Running a Production-Grade Homelab]]></title><description><![CDATA[A platform engineer's honest account of using Claude Code, MCP integrations, and GitOps to manage a Kubernetes cluster that has no business being run by a single person.
The Absurd Premise
Here are the numbers. Three Proxmox bare-metal Dell Optiplex ...]]></description><link>https://blog.coutinho.io/how-ai-tooling-changed-the-economics-of-running-a-production-grade-homelab</link><guid isPermaLink="true">https://blog.coutinho.io/how-ai-tooling-changed-the-economics-of-running-a-production-grade-homelab</guid><category><![CDATA[Homelab]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[claude-code]]></category><category><![CDATA[observability]]></category><category><![CDATA[renovate]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Mon, 09 Feb 2026 06:47:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/ISG-rUel0Uw/upload/5acbe54085b356bb78a80b5a79cd162d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>A platform engineer's honest account of using Claude Code, MCP integrations, and GitOps to manage a Kubernetes cluster that has no business being run by a single person.</em></p>
<h2 id="heading-the-absurd-premise">The Absurd Premise</h2>
<p>Here are the numbers. Three bare-metal Dell OptiPlex 3070 Micro nodes running Proxmox. Nine virtual machines running MicroK8s. Twenty-five Helm charts pulling from 33 upstream chart dependencies. Eighteen ArgoCD Applications orchestrating continuous deployment. Nearly four thousand git commits spanning several years of incremental improvement. And one person keeping it all running.</p>
<p>The service list reads like a mid-size startup's infrastructure: OpenEBS for distributed block storage across nodes, dual NGINX ingress controllers separating public and internal traffic, External Secrets Operator wired to AWS Secrets Manager, Velero for cluster backup, a full observability stack (Prometheus, Grafana, Loki, Tempo, OpenTelemetry Collector and Operator, Thanos for long-term metric retention), Harbor as a private container registry, Nextcloud for file sharing and knowledge management, Argo Workflows for batch processing, Kepler for energy monitoring, Goldpinger for network health, a PostgreSQL operator managing multiple database clusters, and a handful of application workloads on top.</p>
<p>This is not a sensible setup for a single operator. The question was never whether one person could <em>build</em> all of this--homelabbers build ambitious things all the time. The question is whether one person can <em>keep it running</em> without it becoming a second full-time job. Before AI tooling entered the picture, the honest answer was: barely, and with a growing maintenance backlog. Now the answer is: yes, sustainably, with caveats worth being honest about.</p>
<h2 id="heading-the-dependency-treadmill">The Dependency Treadmill</h2>
<p>The foundation of this setup's sustainability is Renovate, a dependency update bot that watches every <code>Chart.yaml</code> and container image reference in the repository. When an upstream project releases a new Helm chart version or container image tag, Renovate opens a pull request automatically. On an average week, that means somewhere between three and ten PRs.</p>
<p>Staying current matters more than most people realize. Security patches are the obvious reason, but the subtler one is migration path preservation. Falling three minor versions behind on a Helm chart is annoying. Falling three <em>major</em> versions behind can mean days of manual migration work, breaking schema changes, and deprecated APIs. Renovate keeps the distance between your running version and the latest version as small as possible, which keeps each individual upgrade trivial.</p>
<p>But there is a failure mode baked into this automation. The happy path looks like this: Renovate opens a PR, CI runs <code>helm lint</code> and <code>helm template</code> to validate syntax, the PR is reviewed and merged, and ArgoCD picks up the new commit and syncs the changes to the cluster. The problem is that linting catches syntax errors, not behavioral changes. A chart can template perfectly and still break your monitoring, change a default that matters, or introduce an incompatibility that only manifests at runtime.</p>
<p>This is exactly what happened in early February 2026. A Renovate PR bumped the OpenTelemetry operator chart from 0.102.0 to 0.105.0. It linted clean. It templated clean. It merged, ArgoCD synced, and within minutes a <code>TargetDown</code> alert fired--the operator's metrics endpoint had gone dark.</p>
<p>The guardrails in the repository--version pins on critical services, <code>ignorePaths</code> for components like the mediaserver stack that are too fragile for automated updates--are scars from previous incidents like this one. They help, but they cannot catch everything. Renovate is a force multiplier, but it requires monitoring to back it up. Automation without observability is just faster failure.</p>
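<p>Those guardrails live in Renovate's configuration. A minimal sketch of what such pins and exclusions can look like--the paths and package names here are illustrative, not the repository's actual config:</p>

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "ignorePaths": ["charts/mediaserver/**"],
  "packageRules": [
    {
      "description": "Pin a critical chart below its next major version",
      "matchPackageNames": ["opentelemetry-operator"],
      "allowedVersions": "<1.0.0"
    }
  ]
}
```

<p><code>ignorePaths</code> stops Renovate from opening PRs against fragile components entirely, while <code>packageRules</code> with <code>allowedVersions</code> lets safe minor bumps through while holding back risky majors.</p>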
<h2 id="heading-anatomy-of-an-upstream-bug">Anatomy of an Upstream Bug</h2>
<p>This incident deserves a close look because it illustrates a failure mode that is both common and genuinely difficult to catch, with or without AI tooling.</p>
<p>The Renovate PR upgraded the OpenTelemetry operator Helm chart from version 0.102.0 to 0.105.0, which bumped the operator binary from v0.141.0 to v0.144.0. The chart templated cleanly. No schema changes, no removed fields, no deprecation warnings. ArgoCD synced the new manifests and rolled out the updated operator pod.</p>
<p>The symptom was immediate: the Prometheus <code>up</code> metric for the <code>observability-opentelemetry-operator</code> job dropped to 0, and within fifteen minutes the <code>TargetDown</code> alert fired. Prometheus could no longer scrape the operator's metrics endpoint.</p>
<p>The root cause was a transitive default change buried two levels deep. The operator binary itself--not the Helm chart, but the Go binary packaged inside the container image--introduced a <code>--metrics-secure</code> flag in v0.142.0 that <strong>defaults to</strong> <code>true</code>. This caused the manager process to serve its metrics endpoint over HTTPS using self-signed certificates, while the Helm chart's ServiceMonitor still told Prometheus to scrape over plain HTTP. Every thirty seconds, Prometheus attempted an HTTP GET, the operator responded with a TLS handshake, and the scrape failed.</p>
<p>What makes this particularly insidious is the gap between the component that changed and the component that configures it. The Helm chart's <code>values.yaml</code> schema had no field for <code>--metrics-secure</code>. The chart's own changelog made no mention of the change. You had to read the <em>operator binary's</em> release notes--a different repository, a different release cadence--to discover that a default had flipped. The upstream maintainer later confirmed in <a target="_blank" href="https://github.com/open-telemetry/opentelemetry-helm-charts/issues/2063">issue #2063</a> that "secure serving is enabled by default, despite the description in the changelog," and a fix was already in progress in <a target="_blank" href="https://github.com/open-telemetry/opentelemetry-helm-charts/pull/2004">PR #2004</a>.</p>
<p>This is where open source shows its strength--and where it demands responsible stewardship. The upstream issue existed because the project is open and transparent; anyone can read the discussion, confirm the root cause, and verify the fix. But that openness is not free. Maintainers of projects like the OpenTelemetry Helm charts are fielding issues from thousands of downstream consumers, most of whom are running configurations the maintainers have never seen. The least a downstream operator can do is file well-researched issues: include the exact versions, the specific behavior change, the logs that prove the failure mechanism, and ideally a link to the commit that introduced it. AI tooling makes this kind of thorough issue reporting much easier--the same investigation that finds your fix also produces the evidence an upstream maintainer needs to confirm and prioritize the bug. Dumping a vague "it broke after upgrading" on an open-source issue tracker is not contributing to the ecosystem; it is adding to the maintainer's workload. The power of open source requires that its users invest in the quality of their participation.</p>
<p>The fix itself was two lines: pass <code>--metrics-secure=false</code> via the chart's <code>manager.extraArgs</code> field. The <em>investigation</em> to arrive at those two lines was the hard part.</p>
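<p>In the repository's layout, that fix might look like the following fragment of the observability chart's <code>values.yaml</code>. The <code>opentelemetry-operator</code> parent key is an assumption about how the subchart is named; <code>manager.extraArgs</code> is the chart field the post describes:</p>

```yaml
opentelemetry-operator:
  manager:
    extraArgs:
      # Restore plain-HTTP metrics serving; the operator binary flipped
      # this default to true in v0.142.0. Remove once the upstream fix ships.
      - --metrics-secure=false
```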
<h2 id="heading-the-investigation-step-by-step">The Investigation, Step by Step</h2>
<p>Here is how the debugging actually unfolded, using Claude Code with MCP (Model Context Protocol) integrations to Grafana, Kubernetes, and Nextcloud. MCP is a protocol that gives AI tools structured access to external systems--instead of copying and pasting from browser tabs, the AI can query Prometheus, read pod logs, and inspect Kubernetes resources directly.</p>
<p>The first step was alert discovery. Through the Grafana MCP integration, Claude Code queried the <code>ALERTS{alertstate="firing"}</code> metric to get a structured view of every firing alert in the cluster. This returned not just the OTel <code>TargetDown</code> alert but also a <code>TempoHighLiveTraces</code> alert and several known-benign alerts (the Watchdog dead man's switch, CPUThrottling on OpenEBS io-engine pods, and <code>openipmi.service</code> failures on Proxmox nodes). Having the full alert landscape immediately distinguished the new problem from background noise.</p>
<p>Next, metric correlation. A Prometheus query for <code>up{job="observability-opentelemetry-operator"}</code> over the previous 24 hours pinpointed the exact moment the metric dropped to zero, which aligned with the Renovate PR merge and subsequent ArgoCD sync. This established causation, not just correlation.</p>
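<p>For reference, the two Prometheus queries behind these first steps, as one would type them into Grafana Explore:</p>

```promql
# Step 1: structured view of every firing alert
ALERTS{alertstate="firing"}

# Step 2: scrape health for the operator, graphed over the last 24 hours
up{job="observability-opentelemetry-operator"}
```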
<p>Then, log inspection. Through the Kubernetes MCP integration, Claude Code pulled logs from the operator manager pod and found it repeating every thirty seconds: <code>http: TLS handshake error from &lt;prometheus-ip&gt;: client sent an HTTP request to an HTTPS server</code>. This single log line identified the exact failure mechanism--a protocol mismatch between server and client.</p>
<p>Spec inspection followed. Examining the pod's container spec via Kubernetes MCP revealed the command-line arguments being passed to the operator binary, confirming that <code>--metrics-secure</code> was not explicitly set, meaning the new default of <code>true</code> was in effect. Comparing this with the ServiceMonitor resource confirmed it was configured for plain HTTP scraping.</p>
<p>With the failure mechanism understood, root cause analysis moved to the local filesystem. The upstream Helm chart repository, cloned locally, was searched for references to <code>metrics-secure</code>. The operator's release notes confirmed the flag was introduced in v0.142.0 with a default of <code>true</code>. A GitHub CLI search surfaced the upstream issue and fix PR, confirming this was a known problem with a pending resolution.</p>
<p>The fix was then implemented: two lines added to the observability chart's <code>values.yaml</code>. During the same session, the investigation of firing alerts had revealed that the <code>TempoHighLiveTraces</code> threshold was stale--set at 500 when the steady-state live-trace count had climbed to roughly that level, causing the alert to fire continuously. This was corrected by raising the threshold to 800 to provide meaningful headroom. Both changes were validated with <code>helm template</code> and <code>pre-commit run</code>, committed, and pushed as a single PR.</p>
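<p>For the Tempo side of that PR, the adjusted rule might look like the following PrometheusRule fragment; the metric name, duration, and severity label are assumptions rather than a copy of the repository's actual rule:</p>

```yaml
- alert: TempoHighLiveTraces
  # Raised from 500, which the steady-state baseline had caught up to
  expr: sum(tempo_ingester_live_traces) > 800
  for: 15m
  labels:
    severity: warning
```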
<p>After ArgoCD synced the merged PR, verification via Prometheus MCP confirmed the <code>up</code> metric had returned to 1 and both alerts had cleared. The full investigation was then documented in Nextcloud as a searchable note with the scope, root cause chain, metrics used, and follow-up actions (remove the workaround once the upstream fix is released).</p>
<p>Total wall-clock time: approximately one hour. Estimated time without AI tooling: two to four hours, primarily because the investigation would have involved switching between a browser with Grafana dashboards, a terminal with kubectl, the GitHub web UI for upstream issues, and an editor for the fix--each context switch carrying a cognitive tax. The key insight is that the savings do not come from the AI being "smarter" than the operator. They come from the AI holding context across all of these systems simultaneously, eliminating the serial context-switching that dominates incident response for a single operator.</p>
<h2 id="heading-the-prerequisites-that-make-this-work">The Prerequisites That Make This Work</h2>
<p>It would be easy to read the previous section and conclude that AI tooling is a silver bullet for infrastructure operations. It is not. The tooling is powerful, but it is powerful in the way a force multiplier is powerful: it multiplies whatever you already have, including zero.</p>
<p><strong>Observability is not optional.</strong> Claude Code can query Prometheus only if you have Prometheus. It can correlate metrics with logs only if you are collecting both. The observability stack in this cluster--Prometheus, Grafana, Loki, Tempo, OpenTelemetry Collector, Thanos--is configured across 2,352 lines of Helm values representing months of iterative tuning. Alert rules, recording rules, ServiceMonitor configurations, retention policies, and resource limits have all been hand-tuned to balance signal quality against resource cost. If none of that existed, the AI would have nothing to query and the investigation would not have been possible.</p>
<p><strong>GitOps makes AI-assisted operations safe.</strong> Every change Claude Code proposes goes through a pull request. It gets linted by pre-commit hooks, reviewed by a human, and deployed by ArgoCD. The AI cannot <code>kubectl apply</code> directly to the cluster. This is not a limitation--it is the most important safety property of the entire workflow. When an AI suggests a fix, you see the exact diff, you review it in the context of version-controlled history, and you can revert it with a single git operation. Without GitOps, AI-assisted operations would be terrifying rather than empowering.</p>
<p><strong>MCP integrations are the bridge between chat and operations.</strong> Without MCP servers for Grafana, Kubernetes, and Nextcloud, Claude Code is a conversational interface that can read and edit local files. Useful, but not transformative for operations work. The MCP integrations are what allow it to query live metrics, inspect running pods, and document findings in a persistent knowledge base. Setting these up is not trivial--the project's integration instructions alone span hundreds of lines of structured guidance covering query generation workflows, discovery patterns, and context-pollution prevention strategies.</p>
<p><strong>Institutional knowledge compounds.</strong> The Nextcloud knowledge base now contains 61 investigation notes spanning over a year of operations. Each note includes the scope, symptoms, root cause, resolution, and the exact metrics and queries used. When a new incident occurs, semantic search surfaces past investigations with similar patterns. The OTel investigation drew on the existing understanding that CPUThrottling alerts on OpenEBS io-engine pods are benign--context documented in a previous investigation. Without that documented knowledge, the alert triage step would have taken longer and risked a false lead.</p>
<p>The punchline: AI amplifies existing operational maturity. No monitoring plus AI equals faster hallucinations, not faster resolutions.</p>
<h2 id="heading-broader-patterns-from-ai-assisted-operations">Broader Patterns from AI-Assisted Operations</h2>
<p>Beyond the OTel incident, several patterns have emerged from months of using AI tooling for cluster operations.</p>
<p><strong>Resource right-sizing becomes data-driven.</strong> One of the most tedious tasks in Kubernetes operations is setting appropriate CPU and memory requests and limits. The AI can query actual resource utilization from Prometheus, compare it against the values configured in Helm charts, and suggest adjustments with specific numbers. This turns a task that requires bouncing between Grafana dashboards and YAML files into a single conversation.</p>
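<p>A single query can surface over-provisioned containers. This sketch divides observed peak memory by the configured request; the metric names come from cAdvisor and kube-state-metrics, and the label matching may need adjustment for a given cluster:</p>

```promql
# Ratio of 7-day peak working-set memory to the memory request.
# Values well below 1 suggest the request can be lowered.
max_over_time(container_memory_working_set_bytes{container!=""}[7d])
  / on (namespace, pod, container)
kube_pod_container_resource_requests{resource="memory"}
```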
<p><strong>Multi-system investigations are where the value concentrates.</strong> The most impactful sessions are the ones that cross-cut multiple services and data sources. An alert in one namespace leads to a log entry in another, which correlates with a metric from a third. These investigations are where human context-switching costs are highest and where holding everything in a single conversation context pays off the most.</p>
<p><strong>Incidental discoveries are a real benefit.</strong> The Tempo alert threshold fix was not the goal of the investigation--it was discovered incidentally while triaging the full list of firing alerts. A human operator deep in a focused debugging session might have noted the stale alert and filed it away for later. Having an AI that can address both issues in the same session, with the same context, reduces the backlog of "I'll fix that later" items that accumulate in any system.</p>
<p><strong>Iteration is fast but not free.</strong> Not every fix lands on the first attempt. Under-documented upstream components, especially those with transitive dependencies or implicit defaults, sometimes require multiple rounds of investigation. The AI accelerates each round, but it does not eliminate them. Honesty about this matters: the tooling is not magic, and setting expectations accordingly prevents disillusionment.</p>
<h2 id="heading-what-the-numbers-say-and-dont-say">What the Numbers Say (and Don't Say)</h2>
<p>The claims above are qualitative. Here is what the git history and investigation records actually show.</p>
<p><strong>The throughput story.</strong> Over the past twelve months, Renovate has merged between 80 and 140 pull requests per month--roughly a thousand upstream dependency updates per year. Of those, exactly three resulted in documented incidents requiring investigation: a Loki ruler config regression in December 2025, the OpenEBS etcd subchart label change in February 2026, and the OTel operator metrics flag described above. That is a 99.7% success rate for automated updates. But the 0.3% that fail are disproportionately expensive--each one demands cross-system investigation, upstream research, and careful remediation. The automation's value is not that it eliminates incidents; it is that it compresses the surface area where incidents can hide.</p>
<p><strong>The adoption curve.</strong> Claude Code entered the workflow in late October 2025. Usage peaked immediately: 79 co-authored commits in November 2025, covering a burst of proactive work--resource right-sizing across the cluster, Thanos setup for long-term metric retention, and alert rule tuning. By January 2026, that had settled to roughly ten co-authored commits per month, each targeting a specific incident or operational task. The pattern is not "AI does everything now." It is "AI enabled a one-time paydown of accumulated operational debt, and now assists with the incidents that remain."</p>
<p><strong>The honest MTTR picture.</strong> It would be satisfying to report a clean before-and-after comparison of mean time to resolution. The data does not support one. Structured investigation notes did not exist before AI tooling was introduced, so there is no documented baseline to compare against. What exists is the resolution timeline for the three Renovate-triggered incidents: the Loki regression was caught and fixed same-day, the OTel incident was investigated in roughly an hour, and the OpenEBS upgrade took two days to detect and four hours to resolve once investigated--requiring significant manual intervention that AI could not shortcut. Three data points do not make a trend. MTTR depends on the nature of the failure, not just the tooling available to investigate it.</p>
<p><strong>The documentation dividend.</strong> The most measurable change is not speed--it is the 61 investigation notes that now exist in the knowledge base. Each preserves the exact queries, timestamps, and reasoning chain from an investigation session. This institutional memory is what makes future MTTR improvements possible: when the OTel <code>TargetDown</code> alert fired, the investigation started by triaging it against known-benign alerts like OpenEBS CPUThrottling--context documented months earlier. The next time an upstream dependency breaks something, the investigation starts with context instead of from scratch. The compounding value of this documentation is harder to quantify than a response time metric, but it is arguably more important.</p>
<h2 id="heading-the-new-economics-of-small-teams">The New Economics of Small Teams</h2>
<p>Platform engineering used to have a minimum viable team size. Below a certain threshold of services and complexity, the operational overhead was manageable by one person. Above it, you needed a team--not because the work was intellectually beyond one person, but because the context-switching and interrupt-driven nature of operations work exceeded one person's bandwidth.</p>
<p>AI tooling lowers that threshold. It does not eliminate it, but it meaningfully shifts where the line falls. A cluster with 25 Helm charts and 33 upstream dependencies generates a steady stream of updates, alerts, and subtle breakages--roughly a thousand Renovate-driven updates per year, with only a handful requiring human intervention. The sustainable operations loop now looks like this: Renovate automates dependency updates, ArgoCD automates deployment, the observability stack automates detection, and Claude Code with MCP integrations accelerates investigation and resolution. Each link in the chain reduces the human time required per incident.</p>
<p>What changed is not any single capability but the closing of a loop. Before MCP integrations, the AI could help you write Helm values but could not check whether the change actually worked. Before the observability stack, there was nothing to check against. Before GitOps, there was no safe way to let an AI propose changes. Before Renovate, staying current was itself a full-time job. Each piece existed independently; the economic shift comes from connecting them into a cycle where the output of each stage feeds the next.</p>
<p>The key word is "sustainable," not "effortless." You still need to build the observability stack. You still need to review every pull request. You still need to understand your systems deeply enough to evaluate the AI's suggestions critically. The operational maturity cannot be outsourced. But for a single operator who has already invested in that maturity, AI tooling is the difference between a homelab that slowly drowns in maintenance debt and one that stays current, well-monitored, and--against all reasonable expectations--actually production-grade.</p>
<p>The real unlock is not speed. It is continuity. Context is preserved across investigations, operational patterns are reinforced through documentation, and mean time to resolution stays bounded even as the system grows. For a team of one, that is the margin between sustainable and unsustainable.</p>
<hr />
<p><em>The upstream issue referenced in this post is tracked at</em> <a target="_blank" href="https://github.com/open-telemetry/opentelemetry-helm-charts/issues/2063"><em>open-telemetry/opentelemetry-helm-charts#2063</em></a><em>, with a fix in</em> <a target="_blank" href="https://github.com/open-telemetry/opentelemetry-helm-charts/pull/2004"><em>PR #2004</em></a><em>.</em></p>
<p><em>This blog post was written with the help of AI and reviewed by a human.</em></p>
]]></content:encoded></item><item><title><![CDATA[Building an EU-Only AI Stack: Nextcloud MCP on Leaf.cloud]]></title><description><![CDATA[A journey through self-hosted LLMs, MCP integration challenges, and cost-effective observability

The promise is compelling: connect your personal knowledge base to AI assistants while keeping everything within EU borders. No data leaving the contine...]]></description><link>https://blog.coutinho.io/eu-only-ai-stack-nextcloud-leafcloud</link><guid isPermaLink="true">https://blog.coutinho.io/eu-only-ai-stack-nextcloud-leafcloud</guid><category><![CDATA[Nextcloud]]></category><category><![CDATA[ollama]]></category><category><![CDATA[openwebui]]></category><category><![CDATA[eu ai act]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Mon, 02 Feb 2026 09:56:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/AZAlS3OnstM/upload/ec6b22c2c408b516bafa9f382c2675fb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>A journey through self-hosted LLMs, MCP integration challenges, and cost-effective observability</em></p>
<hr />
<p>The promise is compelling: connect your personal knowledge base to AI assistants while keeping everything within EU borders. No data leaving the continent. Full control over your infrastructure. GDPR compliance by design.</p>
<p>We set out to build exactly this—a private AI stack running on EU-only infrastructure, integrating with Nextcloud for notes, files, and project management. Here's what we learned.</p>
<h2 id="heading-the-infrastructure">The Infrastructure</h2>
<p><a target="_blank" href="http://Leaf.cloud">Leaf.cloud</a> caught our attention as an EU-only cloud provider running managed Kubernetes via Gardener. They offer a two-week free tier for evaluation, which gave us time to properly test GPU workloads without upfront commitment.</p>
<p>Our test cluster:</p>
<ul>
<li><p><strong>2 worker nodes</strong> running <code>eg1.v100x1.2xlarge</code></p>
</li>
<li><p><strong>8 vCPU, 16GB RAM, 1x Nvidia V100 GPU (16GB VRAM)</strong> per node</p>
</li>
<li><p>Managed Kubernetes with automatic updates and built-in DNS/TLS via Gardener</p>
</li>
</ul>
<p>The pricing is competitive for GPU instances:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Instance</th><th>GPU</th><th>$/hr</th><th>$/month</th></tr>
</thead>
<tbody>
<tr>
<td>eg1.v100x1.2xlarge</td><td>V100 16GB</td><td>$1.22</td><td>~$890</td></tr>
<tr>
<td>eg1.a100x1.V12-84</td><td>A100 80GB</td><td>$1.61</td><td>~$1,174</td></tr>
<tr>
<td>eg1.h100x1.V24_96</td><td>H100</td><td>$4.12</td><td>~$3,006</td></tr>
</tbody>
</table>
</div><p>For our 2-node V100 cluster: approximately $1,780/month at full utilization.</p>
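<p>As a sanity check on that figure, the arithmetic is simply nodes times hourly rate times hours in a month (using 730 hours as the usual monthly approximation):</p>

```python
# Full-utilization monthly cost for the 2-node V100 test cluster
nodes = 2
rate_per_hour = 1.22   # USD/hr for eg1.v100x1.2xlarge (from the table above)
hours_per_month = 730  # ~= 365 * 24 / 12

monthly = nodes * rate_per_hour * hours_per_month
print(f"${monthly:,.0f}/month")  # → $1,781/month
```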
<h2 id="heading-the-stack">The Stack</h2>
<p>Our architecture connects several components:</p>
<pre><code class="lang-mermaid">flowchart LR
    subgraph leafcloud["☁️ Leaf.cloud (Amsterdam)"]
        subgraph k8s["⎈ Kubernetes Cluster"]
            webui["Open WebUI&lt;br/&gt;(Chat)"] --&gt; ollama["Ollama&lt;br/&gt;(GPU LLM)"]
            webui --&gt; mcp["Nextcloud&lt;br/&gt;MCP Server"]
            alloy["Grafana Alloy&lt;br/&gt;(Telemetry)"]
        end
    end

    subgraph external["External Services"]
        nextcloud["Nextcloud&lt;br/&gt;(Hetzner)"]
        grafana["Grafana Cloud&lt;br/&gt;(Free Tier)"]
    end

    mcp --&gt; nextcloud
    alloy --&gt; grafana
</code></pre>
<p><strong>Open WebUI</strong> serves as our chat interface, chosen for its MCP client support and clean UI. It connects to <strong>Ollama</strong> running on the GPU nodes for local model inference.</p>
<p>The <strong>Nextcloud MCP Server</strong> bridges the gap between LLMs and Nextcloud APIs—exposing Deck boards, Notes, and WebDAV file operations as MCP tools that AI assistants can invoke.</p>
<h2 id="heading-the-mcp-server">The MCP Server</h2>
<p>The <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server">Nextcloud MCP Server</a> exposes several Nextcloud apps as MCP tools:</p>
<ul>
<li><p><strong>Deck</strong> - Kanban boards for project management</p>
</li>
<li><p><strong>Notes</strong> - Markdown note-taking with categories</p>
</li>
<li><p><strong>WebDAV</strong> - Full file system operations</p>
</li>
<li><p><strong>Calendar</strong> - Event management (available but not enabled in our test)</p>
</li>
</ul>
<p>Deployment is straightforward:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">nextcloud-mcp</span>
  <span class="hljs-attr">namespace:</span> <span class="hljs-string">ai</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">1</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">mcp</span>
          <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/cbcoutinho/nextcloud-mcp-server:latest</span>
          <span class="hljs-attr">command:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">/app/.venv/bin/nextcloud-mcp-server</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">run</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">--host</span>
            <span class="hljs-bullet">-</span> <span class="hljs-number">0.0</span><span class="hljs-number">.0</span><span class="hljs-number">.0</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">--enable-app</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">deck</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">--enable-app</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">webdav</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">--transport</span>
            <span class="hljs-bullet">-</span> <span class="hljs-string">streamable-http</span>
          <span class="hljs-attr">envFrom:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">secretRef:</span>
                <span class="hljs-attr">name:</span> <span class="hljs-string">nextcloud-mcp-secret</span>
</code></pre>
<p>The server uses streamable HTTP transport, making it accessible to MCP clients over the network.</p>
<h3 id="heading-the-context-overhead-problem">The Context Overhead Problem</h3>
<p>Here's where reality diverges from the ideal. With all tools enabled, the MCP server presents approximately <strong>20,000 tokens</strong> of tool definitions to the LLM. This includes detailed schemas for every Deck operation (create board, create card, assign labels, move cards between stacks), every WebDAV operation (list, read, write, copy, move, search), and all Notes functionality.</p>
<p>For cloud LLMs with 100k+ context windows, this overhead is negligible. For local models running on a V100 with 16GB VRAM, it's a significant constraint.</p>
<h2 id="heading-model-performance-reality">Model Performance Reality</h2>
<p>We tested a range of models through Ollama:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>Size</td><td>Tool Use Reliability</td></tr>
</thead>
<tbody>
<tr>
<td>mistral:7b</td><td>7B</td><td>Unreliable with 20k context overhead</td></tr>
<tr>
<td>deepseek-r1:8b</td><td>8B</td><td>Inconsistent tool selection</td></tr>
<tr>
<td>qwen2.5:14b</td><td>14B</td><td>Better but still misses tool calls</td></tr>
<tr>
<td>deepseek-r1:14b</td><td>14B</td><td>Moderate success rate</td></tr>
<tr>
<td>ministral-3:14b</td><td>14B</td><td>Similar to qwen2.5</td></tr>
<tr>
<td>gpt-oss:20b</td><td>20B</td><td>Improved but not reliable</td></tr>
<tr>
<td>deepseek-r1:32b</td><td>32B</td><td>Best local option, still imperfect</td></tr>
</tbody>
</table>
</div><p><strong>Key findings:</strong></p>
<ol>
<li><p><strong>Small models (7B-14B)</strong> struggle with the cognitive load of 60+ tool definitions. They often hallucinate tool names, miss required parameters, or fail to recognize when a tool should be used at all.</p>
</li>
<li><p><strong>Larger models (32B+)</strong> perform better but still show inconsistency. The V100's 16GB VRAM limits which models we can run effectively—an A100 80GB would significantly expand our options.</p>
</li>
<li><p><strong>Cloud LLMs (Claude, Mistral AI)</strong> handle the tool definitions without issue. They correctly identify when to use tools, select the right ones, and structure arguments properly.</p>
</li>
</ol>
<p>This isn't a criticism of local models—they're impressive for their size. But MCP's design assumes LLMs can handle large tool catalogs gracefully, which is currently only reliable with frontier models.</p>
<h2 id="heading-mcp-client-limitations">MCP Client Limitations</h2>
<p>Open WebUI supports MCP connections, but with significant limitations:</p>
<ol>
<li><p><strong>No MCP Sampling Support</strong> - The MCP specification includes a "sampling" feature that lets servers request LLM completions for sub-tasks. Open WebUI doesn't implement this, nor do other MCP clients like <code>claude-code</code> and <code>gemini-cli</code>, meaning the MCP server can only provide tools, not leverage the LLM for intelligent operations.</p>
</li>
<li><p><strong>Static Tool Listing</strong> - Tools are loaded once when the connection is established. There's no dynamic tool registration based on context or user needs.</p>
</li>
<li><p><strong>No Tool Filtering</strong> - You can't selectively enable/disable tools per conversation or assistant.</p>
</li>
</ol>
<h3 id="heading-the-app-expert-workaround">The "App Expert" Workaround</h3>
<p>To reduce context overhead and improve reliability, we found success with an <strong>App Expert pattern</strong>:</p>
<p>Instead of one assistant with all tools, create multiple specialized assistants:</p>
<ul>
<li><p><strong>Deck Expert</strong> - Only Deck tools enabled</p>
</li>
<li><p><strong>Notes Expert</strong> - Only Notes tools enabled</p>
</li>
<li><p><strong>Files Expert</strong> - Only WebDAV tools enabled</p>
</li>
</ul>
<p>Each expert has a smaller tool set (~5-8k tokens instead of 20k), which smaller models handle more reliably. Users switch between experts based on their current task.</p>
<p>This works, but it's a workaround for what should be a protocol-level feature. The MCP specification supports dynamic tool sets, but clients need to implement it.</p>
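<p>In the meantime, the pattern is straightforward to express as a per-assistant allow-list over the tool catalog. A minimal sketch (the tool names below are illustrative, not the server's actual tool names):</p>
<pre><code class="lang-python"># Hypothetical "expert" profiles: each assistant sees only a slice of
# the full MCP tool catalog.
EXPERT_PROFILES = {
    "deck":  {"deck_create_card", "deck_move_card", "deck_assign_label"},
    "notes": {"notes_create", "notes_update", "notes_search"},
    "files": {"webdav_list", "webdav_read", "webdav_write"},
}

def tools_for_expert(expert, catalog):
    """Filter a full tool catalog down to one expert's allow-list."""
    allowed = EXPERT_PROFILES[expert]
    return [tool for tool in catalog if tool["name"] in allowed]
</code></pre>
<p>Each filtered catalog is what gets presented to the model, which is where the ~5-8k token footprint comes from.</p>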
<h2 id="heading-observability-on-a-budget">Observability on a Budget</h2>
<p>Grafana Cloud's free tier provides:</p>
<ul>
<li><p><strong>1,500 samples/second</strong> ingestion rate</p>
</li>
<li><p><strong>15,000 sample burst</strong> limit</p>
</li>
<li><p>Prometheus metrics, Loki logs, and basic dashboards</p>
</li>
</ul>
<p>The challenge: a Kubernetes cluster generates thousands of metrics per scrape. Without filtering, we'd exceed the free tier immediately.</p>
<p>Our solution uses <strong>Grafana Alloy</strong> with aggressive metric filtering:</p>
<pre><code class="lang-yaml"><span class="hljs-string">prometheus.relabel</span> <span class="hljs-string">"cadvisor_filter"</span> {
  <span class="hljs-comment"># Drop all histogram buckets (huge cardinality)</span>
  <span class="hljs-string">rule</span> {
    <span class="hljs-string">source_labels</span> <span class="hljs-string">=</span> [<span class="hljs-string">"__name__"</span>]
    <span class="hljs-string">regex</span> <span class="hljs-string">=</span> <span class="hljs-string">".*_bucket"</span>
    <span class="hljs-string">action</span> <span class="hljs-string">=</span> <span class="hljs-string">"drop"</span>
  }
  <span class="hljs-comment"># Keep only essential container metrics</span>
  <span class="hljs-string">rule</span> {
    <span class="hljs-string">source_labels</span> <span class="hljs-string">=</span> [<span class="hljs-string">"__name__"</span>]
    <span class="hljs-string">regex</span> <span class="hljs-string">=</span> <span class="hljs-string">"container_(cpu_usage_seconds_total|memory_working_set_bytes|memory_usage_bytes|network_receive_bytes_total|network_transmit_bytes_total|fs_usage_bytes|fs_limit_bytes)|machine_(cpu_cores|memory_bytes)"</span>
    <span class="hljs-string">action</span> <span class="hljs-string">=</span> <span class="hljs-string">"keep"</span>
  }
  <span class="hljs-comment"># Drop kube-system containers to reduce noise</span>
  <span class="hljs-string">rule</span> {
    <span class="hljs-string">source_labels</span> <span class="hljs-string">=</span> [<span class="hljs-string">"namespace"</span>]
    <span class="hljs-string">regex</span> <span class="hljs-string">=</span> <span class="hljs-string">"kube-system"</span>
    <span class="hljs-string">action</span> <span class="hljs-string">=</span> <span class="hljs-string">"drop"</span>
  }
}
</code></pre>
<p>We apply similar filtering to node exporter, kubelet, and DCGM (GPU) metrics. The result: comprehensive visibility into what matters while staying within free tier limits.</p>
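<p>Because Prometheus-style relabel regexes are fully anchored, a keep rule like the one above can be sanity-checked locally before shipping a config change:</p>
<pre><code class="lang-python">import re

# Same pattern as the Alloy "keep" rule; fullmatch mirrors the anchored
# matching that relabel rules apply.
KEEP = re.compile(
    r"container_(cpu_usage_seconds_total|memory_working_set_bytes"
    r"|memory_usage_bytes|network_receive_bytes_total"
    r"|network_transmit_bytes_total|fs_usage_bytes|fs_limit_bytes)"
    r"|machine_(cpu_cores|memory_bytes)"
)

def is_kept(metric_name):
    return KEEP.fullmatch(metric_name) is not None
</code></pre>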
<p>Key metrics we kept:</p>
<ul>
<li><p><strong>GPU</strong>: utilization, memory usage, temperature, power consumption</p>
</li>
<li><p><strong>Containers</strong>: CPU, memory, network I/O for our workloads</p>
</li>
<li><p><strong>Nodes</strong>: CPU, memory, disk, network at the host level</p>
</li>
<li><p><strong>MCP Server</strong>: Request rates and latencies</p>
</li>
</ul>
<h3 id="heading-what-the-metrics-revealed">What the Metrics Revealed</h3>
<p>We ran the POC over five working days, with the cluster auto-hibernating overnight and over weekends. This gave us clean data on actual usage patterns versus idle overhead.</p>
<p><strong>Cluster Activity Windows:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Day</td><td>Active Hours (CET)</td><td>Duration</td></tr>
</thead>
<tbody>
<tr>
<td>Jan 26-30</td><td>08:25 - 16:55</td><td>~8.5 hrs/day</td></tr>
</tbody>
</table>
</div><p><strong>Resource Utilization Summary:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Idle</td><td>Peak (during inference)</td></tr>
</thead>
<tbody>
<tr>
<td>GPU Utilization</td><td>0%</td><td><strong>85%</strong></td></tr>
<tr>
<td>GPU Power</td><td>26-27W</td><td><strong>145.5W</strong></td></tr>
<tr>
<td>GPU Temperature</td><td>35°C</td><td><strong>69°C</strong></td></tr>
<tr>
<td>Total AI Namespace Memory</td><td>~1 GB</td><td><strong>7.1 GB</strong></td></tr>
<tr>
<td>Ollama Memory (model loaded)</td><td>15 MB</td><td><strong>5.3 GB</strong></td></tr>
</tbody>
</table>
</div><p><strong>MCP Server Performance:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>Value</td></tr>
</thead>
<tbody>
<tr>
<td>Median latency (GET)</td><td>175-212 ms</td></tr>
<tr>
<td>Median latency (POST/PUT)</td><td>~175 ms</td></tr>
<tr>
<td>P95 latency</td><td>244-470 ms</td></tr>
<tr>
<td>Error rate</td><td><strong>0%</strong></td></tr>
</tbody>
</table>
</div><p>The V100 GPU was genuinely utilized—85% utilization during inference with power draw jumping from 27W idle to 145W. This confirms we weren't just burning GPU hours on CPU-bound work.</p>
<p><strong>The Honest Assessment:</strong></p>
<p>The infrastructure performed well. Zero errors across the five-day POC, sub-500ms API latencies, and efficient auto-hibernation. However, the observability data confirmed what we suspected from qualitative testing: <strong>smaller models struggled with MCP tool interactions due to context constraints</strong>.</p>
<p>With 20,000+ tokens of tool definitions competing for context space, models in the 7B-14B range frequently:</p>
<ul>
<li><p>Failed to recognize when tools should be invoked</p>
</li>
<li><p>Hallucinated tool names or parameters</p>
</li>
<li><p>Lost track of multi-step operations</p>
</li>
</ul>
<p>The 32B models showed improvement but still exhibited inconsistency. The V100's 16GB VRAM ceiling limits us to these smaller models—running a 70B parameter model that might handle the full tool catalog reliably would require an A100 80GB or H100.</p>
<p><strong>Future Investigation:</strong></p>
<p>A follow-up evaluation with an A100 instance ($1.61/hr vs $1.22/hr for the V100) would let us test whether larger models like <code>deepseek-r1:70b</code> or <code>qwen2.5:72b</code> can reliably handle the full MCP tool catalog. The 5x VRAM increase (80GB vs 16GB) opens up model sizes that may cross the threshold from "sometimes works" to "reliably works."</p>
<p>For now, the <strong>App Expert pattern</strong> (specialized assistants with reduced tool sets) remains the practical path for self-hosted deployments on V100-class hardware.</p>
<h2 id="heading-lessons-learned">Lessons Learned</h2>
<h3 id="heading-1-mcp-specification-vs-reality">1. MCP Specification vs. Reality</h3>
<p>The MCP specification is thoughtful and comprehensive. Client implementations are still catching up. Features like sampling, dynamic tools, and resource subscriptions exist in the spec but are rare in practice.</p>
<p><strong>Recommendation for MCP server developers</strong>: Design for the lowest common denominator. Provide fewer, more focused tools rather than comprehensive coverage. Consider offering multiple tool "profiles" that clients can select.</p>
<h3 id="heading-2-context-reduction-strategies">2. Context Reduction Strategies</h3>
<p>If you're building MCP servers:</p>
<ul>
<li><p><strong>Minimize tool descriptions</strong> - Every token counts for small models</p>
</li>
<li><p><strong>Consolidate related operations</strong> - One <code>manage_card</code> tool with an <code>action</code> parameter beats five separate tools</p>
</li>
<li><p><strong>Make parameters optional</strong> with sensible defaults</p>
</li>
<li><p><strong>Consider tool "tiers"</strong> - Basic tools always available, advanced tools on request</p>
</li>
</ul>
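<p>The consolidation idea can be sketched as a single dispatcher with an <code>action</code> parameter and defaulted optional arguments (a hypothetical illustration, not the server's actual tool):</p>
<pre><code class="lang-python"># One tool definition instead of five: the model only has to learn a
# single name plus an enum of actions.
def manage_card(action, board_id, card_id=None, title=None, stack_id=None):
    handlers = {
        "create": lambda: {"op": "create", "board": board_id,
                           "title": title or "Untitled"},
        "move":   lambda: {"op": "move", "card": card_id, "stack": stack_id},
        "delete": lambda: {"op": "delete", "card": card_id},
    }
    if action not in handlers:
        raise ValueError(f"unknown action: {action!r}")
    return handlers[action]()
</code></pre>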
<h3 id="heading-3-gpu-memory-is-the-constraint">3. GPU Memory is the Constraint</h3>
<p>For local LLM deployments, GPU VRAM determines what's possible more than compute. The V100's 16GB limits us to models that fit with room for context. The A100 80GB at only $0.40/hr more would dramatically expand model options.</p>
<h3 id="heading-4-eu-infrastructure-is-viable">4. EU Infrastructure is Viable</h3>
<p><a target="_blank" href="http://Leaf.cloud">Leaf.cloud</a> proved capable for this workload. Gardener-based Kubernetes "just works"—automated TLS via cert-manager, DNS management, and straightforward GPU scheduling. The two-week free trial is genuinely useful for evaluation.</p>
<h2 id="heading-where-this-goes-next">Where This Goes Next</h2>
<p>The pieces are almost there. We need:</p>
<ol>
<li><p><strong>Better MCP client implementations</strong> - Sampling support, dynamic tools, tool filtering</p>
</li>
<li><p><strong>Smarter tool presentation</strong> - Lazy-load tool definitions based on conversation context</p>
</li>
<li><p><strong>Smaller, more capable models</strong> - The gap between 14B and 70B models is closing</p>
</li>
<li><p><strong>Quantization improvements</strong> - Running larger models in less VRAM</p>
</li>
</ol>
<p>The dream of a private AI assistant that knows your notes, manages your projects, and respects your data sovereignty is achievable today—with the right model and some workarounds. It'll be seamless within a year or two.</p>
<h2 id="heading-try-it-yourself">Try It Yourself</h2>
<p>The stack we tested:</p>
<ul>
<li><p><a target="_blank" href="http://Leaf.cloud">Leaf.cloud</a> - EU Kubernetes with GPU instances</p>
</li>
<li><p><a target="_blank" href="https://github.com/open-webui/open-webui">Open WebUI</a> - Chat interface with MCP support</p>
</li>
<li><p><a target="_blank" href="https://ollama.ai/">Ollama</a> - Local model serving</p>
</li>
<li><p><a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server">Nextcloud MCP Server</a> - MCP bridge to Nextcloud</p>
</li>
<li><p><a target="_blank" href="https://grafana.com/docs/alloy/">Grafana Alloy</a> - Observability pipeline</p>
</li>
</ul>
<p>Start with cloud LLMs (Claude, Mistral) for reliable tool use, then experiment with local models once your MCP server is working. And if you're building MCP clients or servers—please prioritize the sampling specification. The ecosystem needs it.</p>
<hr />
<p><em>Questions or experiences to share? The Nextcloud MCP server is open source and welcomes contributions.</em></p>
]]></content:encoded></item><item><title><![CDATA[Introducing Astrolabe: Navigate Your Data Universe in Nextcloud]]></title><description><![CDATA[Your Nextcloud instance holds years of notes, projects, recipes, contacts, and documents. But when you need to find something, you're stuck typing exact keywords and hoping for the best. Search "car repair" and miss that note titled "Vehicle maintena...]]></description><link>https://blog.coutinho.io/introducing-astrolabe-navigate-your-data-universe-in-nextcloud</link><guid isPermaLink="true">https://blog.coutinho.io/introducing-astrolabe-navigate-your-data-universe-in-nextcloud</guid><category><![CDATA[Nextcloud]]></category><category><![CDATA[semantic search]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Fri, 30 Jan 2026 19:28:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/yBgC-qVCxMg/upload/a93879158581de360e2763f0c521ac2a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Your Nextcloud instance holds years of notes, projects, recipes, contacts, and documents. But when you need to find something, you're stuck typing exact keywords and hoping for the best. Search "car repair" and miss that note titled "Vehicle maintenance tips." Search "meeting agenda" and overlook the calendar event called "Team sync." Traditional keyword search demands that you remember exactly how you wrote things down.</p>
<p>What if your search could understand what you <em>mean</em>, not just what you type?</p>
<p>Meet <strong>Astrolabe</strong>—a Nextcloud app that brings AI-powered semantic search to your self-hosted cloud. Named after the ancient navigational instrument that helped travelers chart courses by the stars, Astrolabe helps you navigate your personal knowledge by mapping the semantic connections between your documents.</p>
<h2 id="heading-the-astrolabe-metaphor">The Astrolabe Metaphor</h2>
<p>The astrolabe was one of humanity's most elegant scientific instruments—an analog computer for solving problems related to time and the position of celestial bodies. Its theoretical foundation traces back to <strong>Hipparchus of Nicaea</strong> (c. 190–120 BCE), who discovered the stereographic projection that allows a three-dimensional celestial sphere to be represented on a flat surface. Later Greek scholars like <strong>Theon of Alexandria</strong> and his daughter <strong>Hypatia</strong> refined it into a practical instrument, and during the Islamic Golden Age, astronomers in Baghdad, Damascus, and Cordoba perfected its design and applications.</p>
<p>For nearly two millennia, astrolabes served astronomers, navigators, scholars, and religious officials across the Greek, Byzantine, Islamic, and medieval European worlds. These instruments allowed users to determine time, find celestial positions, calculate daylight hours, identify constellations, and even determine the direction of Mecca for prayer—all without complex calculations. The astrolabe made the vast complexity of the heavens understandable and navigable.</p>
<p><strong>Astrolabe</strong> (the app) does the same for your data. Every document, note, and calendar event becomes a point of light in your personal data universe. The app maps their semantic relationships—their meaning, not just their words—and suddenly the connections become visible. Documents cluster by topic, related ideas sit nearby, and you can navigate this landscape as naturally as medieval scholars once read the stars. Where the original astrolabe projected the celestial sphere onto brass, this one projects your knowledge into explorable semantic space.</p>
<h2 id="heading-semantic-search-find-meaning-not-just-keywords">Semantic Search: Find Meaning, Not Just Keywords</h2>
<p>The core feature of Astrolabe is semantic search. Instead of matching exact keywords, it understands the concepts in your query and finds related content.</p>
<p><strong>What this looks like in practice:</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>You Search For</td><td>Traditional Search Finds</td><td>Astrolabe Also Finds</td></tr>
</thead>
<tbody>
<tr>
<td>"car repair"</td><td>Documents containing "car repair"</td><td>Notes about "vehicle maintenance," "fixing the truck"</td></tr>
<tr>
<td>"team planning"</td><td>Documents with "team planning"</td><td>Calendar events titled "Q2 kickoff," Deck cards about "project roadmap"</td></tr>
<tr>
<td>"pasta recipes"</td><td>Documents with "pasta recipes"</td><td>Notes about "Italian cooking," "homemade noodles," "carbonara tips"</td></tr>
</tbody>
</table>
</div><p>This works across multiple Nextcloud apps: Notes, Files (including PDFs with OCR), Deck cards, Calendar events, Contacts, and News/RSS items. One search bar, all your content, understood by meaning.</p>
<h3 id="heading-hybrid-search-best-of-both-worlds">Hybrid Search: Best of Both Worlds</h3>
<p>Sometimes you want exact matches ("PROJ-2024-001"), sometimes you want semantic understanding ("that project from last year about authentication"). Astrolabe's hybrid search combines both approaches:</p>
<ul>
<li><p><strong>Semantic search</strong> uses embeddings to find conceptually related content</p>
</li>
<li><p><strong>BM25 keyword search</strong> finds exact matches and important terms</p>
</li>
<li><p><strong>Reciprocal Rank Fusion (RRF)</strong> intelligently merges the results</p>
</li>
</ul>
<p>You can adjust the balance or switch modes entirely depending on your needs.</p>
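<p>Reciprocal Rank Fusion itself is simple: every document earns <code>1/(k + rank)</code> from each ranking it appears in, and the sums decide the merged order. A minimal sketch of the merge step (the in-app implementation also carries scores and per-mode weights):</p>
<pre><code class="lang-python">def rrf_merge(semantic_ranking, keyword_ranking, k=60):
    """Merge two ranked lists of document ids with Reciprocal Rank Fusion."""
    fused = {}
    for ranking in (semantic_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents found by both modes accumulate score from each list.
    return sorted(fused, key=fused.get, reverse=True)
</code></pre>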
<p><img src="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/third_party/astrolabe/screenshots/01-unified-search-astrolabe.png?raw=1" alt="Unified Search Integration" /></p>
<p><em>Astrolabe results appear alongside traditional search in Nextcloud's unified search bar</em></p>
<h2 id="heading-visualize-your-data-universe">Visualize Your Data Universe</h2>
<p>Beyond search, Astrolabe includes an interactive 3D visualization that shows your documents positioned in semantic space. Similar documents cluster together. Topics form constellations. You can rotate, zoom, and explore.</p>
<p>This isn't just eye candy—it's a practical tool for knowledge discovery:</p>
<ul>
<li><p><strong>Find forgotten connections</strong>: Search for your current project and watch as related documents from months ago light up nearby</p>
</li>
<li><p><strong>Spot topic clusters</strong>: See how your notes naturally group by subject</p>
</li>
<li><p><strong>Explore the unknown</strong>: Click on points near your search results to discover content you didn't know was related</p>
</li>
</ul>
<p>The visualization uses Principal Component Analysis (PCA) to project high-dimensional embeddings (768 dimensions) down to 3D space while preserving the relationships between documents. We implemented a lightweight, custom PCA specifically for this—no heavyweight ML libraries required.</p>
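<p>The projection amounts to centering the embeddings and keeping the three largest eigenvectors of their covariance. A numpy sketch of the idea (the app ships its own dependency-free eigendecomposition, so this is only illustrative):</p>
<pre><code class="lang-python">import numpy as np

def project_to_3d(embeddings):
    """Project an (n, d) embedding matrix down to (n, 3) via PCA."""
    X = embeddings - embeddings.mean(axis=0)   # center the data
    cov = np.cov(X, rowvar=False)              # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top3 = eigvecs[:, ::-1][:, :3]             # three largest components
    return X @ top3
</code></pre>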
<p><img src="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/third_party/astrolabe/screenshots/02-semantic-search-with-plot.png?raw=1" alt="3D Vector Visualization" /></p>
<p><em>Documents cluster by semantic similarity. The query point (red) shows your search, and related documents cluster nearby</em></p>
<h2 id="heading-power-your-ai-agents">Power Your AI Agents</h2>
<p>Astrolabe isn't just for humans—it's for your AI assistants too.</p>
<p>The backend runs a <strong>Model Context Protocol (MCP)</strong> server, which means AI tools like Claude Desktop, Cursor, or custom agents can connect directly to your Nextcloud data. Your AI assistant can:</p>
<ul>
<li><p>Search your notes semantically ("Find everything related to the Kubernetes migration")</p>
</li>
<li><p>Retrieve document content for context</p>
</li>
<li><p>Get AI-generated answers with citations from your documents (RAG)</p>
</li>
</ul>
<p>The critical point: <strong>your data never leaves your infrastructure</strong>. The MCP server runs on your hardware. Your AI assistant sends queries, the server returns results, and you maintain full control. No documents uploaded to third-party services.</p>
<h3 id="heading-retrieval-augmented-generation-rag">Retrieval-Augmented Generation (RAG)</h3>
<p>Ask a question, and Astrolabe can retrieve relevant documents and have your AI synthesize an answer—complete with citations:</p>
<pre><code class="lang-plaintext">You: "What were the main issues we had deploying to production last month?"

Astrolabe finds: 3 relevant notes, 2 Deck cards, 1 calendar event

AI generates: "Based on your documents, there were three main issues:
1. Database migration timeout (see Note: 'Prod deploy 2024-01-15')
2. SSL certificate renewal (see Deck card: 'Ops Tasks')
3. Resource limits on the new pods (see Note: 'K8s troubleshooting')"
</code></pre>
<p>This uses MCP's sampling capability—the server doesn't run its own LLM. Instead, it asks your client's AI to generate the response. You choose the model, you control the costs.</p>
<h2 id="heading-under-the-hood">Under the Hood</h2>
<p>For the technically curious, here's how Astrolabe works:</p>
<h3 id="heading-embedding-providers">Embedding Providers</h3>
<p>Astrolabe supports multiple backends for generating semantic embeddings:</p>
<ul>
<li><p><strong>Amazon Bedrock</strong>: Enterprise-grade, Titan embeddings</p>
</li>
<li><p><strong>OpenAI</strong>: Direct OpenAI API or compatible endpoints (including GitHub Models)</p>
</li>
<li><p><strong>Ollama</strong>: Self-hosted, privacy-focused, runs entirely on your hardware</p>
</li>
</ul>
<p>The system auto-detects available providers based on environment variables and falls back gracefully. Deploy Ollama on your server for full privacy, or use Bedrock for enterprise scale—same codebase, zero code changes.</p>
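<p>The detection logic amounts to a precedence check over environment variables. A sketch of the shape (the variable names and ordering below are illustrative, not the server's actual configuration keys):</p>
<pre><code class="lang-python">import os

def detect_provider(env=os.environ):
    """Pick the first embedding backend whose credentials are present."""
    if env.get("AWS_ACCESS_KEY_ID"):
        return "bedrock"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("OLLAMA_HOST"):
        return "ollama"
    return None  # no provider configured; indexing is disabled
</code></pre>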
<h3 id="heading-background-indexing">Background Indexing</h3>
<p>Documents are indexed automatically via webhooks. When you create or edit a note, Nextcloud fires an event, and the MCP server processes it in the background. No manual sync required.</p>
<p>The indexing pipeline:</p>
<ol>
<li><p><strong>Scanner</strong> detects changes via ETags and modification timestamps</p>
</li>
<li><p><strong>Queue</strong> manages backpressure (up to 10k pending documents)</p>
</li>
<li><p><strong>Worker pool</strong> processes embeddings concurrently (configurable, default 3 workers)</p>
</li>
<li><p><strong>Qdrant</strong> stores vectors for fast similarity search</p>
</li>
</ol>
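<p>The server implements this with anyio TaskGroups; the same bounded-queue-plus-worker-pool shape can be sketched with stdlib asyncio (the embed/upsert step is stubbed out here):</p>
<pre><code class="lang-python">import asyncio

async def index_documents(docs, num_workers=3, queue_size=10_000):
    # A bounded queue provides backpressure once producers outrun workers.
    queue = asyncio.Queue(maxsize=queue_size)
    indexed = []

    async def worker():
        while True:
            doc = await queue.get()
            indexed.append(f"embedded:{doc}")  # stand-in for embed + Qdrant upsert
            queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(num_workers)]
    for doc in docs:
        await queue.put(doc)
    await queue.join()        # wait until every queued document is processed
    for w in workers:
        w.cancel()
    return indexed
</code></pre>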
<h3 id="heading-lightweight-by-design">Lightweight by Design</h3>
<p>We deliberately avoided heavyweight dependencies:</p>
<ul>
<li><p><strong>Custom PCA</strong>: No scikit-learn, just efficient eigendecomposition</p>
</li>
<li><p><strong>In-process async</strong>: No separate message queues or worker processes—just anyio TaskGroups</p>
</li>
<li><p><strong>Plugin architecture</strong>: New apps (Notes, Calendar, etc.) are simple scanner/processor implementations</p>
</li>
</ul>
<p>This means Astrolabe runs comfortably alongside your Nextcloud on modest hardware.</p>
<pre><code class="lang-plaintext">┌──────────────┐     ┌─────────────┐     ┌───────────┐
│   Nextcloud  │────▶│ MCP Server  │────▶│  Qdrant   │
│ (Astrolabe)  │◀────│  (Python)   │◀────│ (Vectors) │
└──────────────┘     └─────────────┘     └───────────┘
       │                    │
       │ OAuth/Token        │ Embeddings
       ▼                    ▼
   ┌────────┐         ┌──────────┐
   │  User  │         │ Ollama/  │
   │Browser │         │ Bedrock  │
   └────────┘         └──────────┘
</code></pre>
<h2 id="heading-getting-started">Getting Started</h2>
<h3 id="heading-requirements">Requirements</h3>
<ul>
<li><p>Nextcloud 31 or 32</p>
</li>
<li><p>MCP server instance (Docker recommended)</p>
</li>
<li><p>Vector database (Qdrant, included in Docker setup)</p>
</li>
<li><p>Embedding provider (Ollama for self-hosted, or cloud options)</p>
</li>
</ul>
<h3 id="heading-quick-setup">Quick Setup</h3>
<ol>
<li><p><strong>Install the Astrolabe app</strong> from the Nextcloud App Store (or manually)</p>
</li>
<li><p><strong>Start the MCP server</strong> (Docker Compose makes this easy):</p>
<pre><code class="lang-bash"> docker compose up -d mcp qdrant ollama
</code></pre>
</li>
<li><p><strong>Configure the connection</strong> in your Nextcloud <code>config.php</code>:</p>
<pre><code class="lang-php"> <span class="hljs-string">'astrolabe'</span> =&gt; [
     <span class="hljs-string">'mcp_server_url'</span> =&gt; <span class="hljs-string">'http://localhost:8000'</span>,
 ],
</code></pre>
</li>
<li><p><strong>Authorize access</strong> in Settings → Personal → Astrolabe</p>
</li>
<li><p><strong>Start searching</strong> using Nextcloud's unified search bar</p>
</li>
</ol>
<p>For detailed setup instructions, including OAuth configuration and embedding provider options, see the <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server">documentation</a>.</p>
<h2 id="heading-what-can-you-index">What Can You Index?</h2>
<p>Astrolabe currently supports:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>App</td><td>What Gets Indexed</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Notes</strong></td><td>Full text and metadata</td></tr>
<tr>
<td><strong>Files</strong></td><td>PDFs (with OCR), DOCX, text files</td></tr>
<tr>
<td><strong>Deck</strong></td><td>Card titles and descriptions</td></tr>
<tr>
<td><strong>Calendar</strong></td><td>Event titles, descriptions, and details</td></tr>
<tr>
<td><strong>Contacts</strong></td><td>Names, notes, and contact information</td></tr>
<tr>
<td><strong>News</strong></td><td>RSS/Atom feed articles</td></tr>
</tbody>
</table>
</div><p>Each result shows the document type, relevance score, and a direct link to the source. For large documents, it shows which chunk (section) matched.</p>
<p><img src="https://github.com/cbcoutinho/nextcloud-mcp-server/blob/master/third_party/astrolabe/screenshots/03-chunk-viewer-open.png?raw=1" alt="Chunk Viewer" /></p>
<p><em>Click a result to see the matching chunk in context</em></p>
<h2 id="heading-who-is-this-for">Who Is This For?</h2>
<p><strong>Researchers and students</strong>: Find all notes related to your thesis topic, even when you used different terminology across semesters. Discover connections between papers you read months apart.</p>
<p><strong>Teams and organizations</strong>: Surface institutional knowledge that would otherwise stay buried. New team members can search for concepts instead of knowing exactly what to look for.</p>
<p><strong>Developers</strong>: Connect your AI coding assistant to your Nextcloud. Give it access to project notes, meeting records, and documentation without copy-pasting context.</p>
<p><strong>Personal knowledge managers</strong>: Discover forgotten documents related to your current work. Watch your knowledge base evolve over time through the visualization.</p>
<h2 id="heading-try-it-out">Try It Out</h2>
<p>Astrolabe is open source (AGPL) and ready to use. Your data universe has been waiting in the dark—it's time to turn on the lights.</p>
<ul>
<li><p><strong>Install</strong>: <a target="_blank" href="https://apps.nextcloud.com/apps/astrolabe">Nextcloud App Store</a></p>
</li>
<li><p><strong>Source</strong>: <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server">GitHub</a></p>
</li>
<li><p><strong>Documentation</strong>: <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server/tree/master/docs">Setup Guide</a></p>
</li>
<li><p><strong>Issues</strong>: <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server/issues">Report bugs or request features</a></p>
</li>
</ul>
<hr />
<p><em>Astrolabe is maintained by</em> <a target="_blank" href="https://github.com/cbcoutinho"><em>Chris Coutinho</em></a><em>. Contributions welcome.</em></p>
]]></content:encoded></item><item><title><![CDATA[Introducing the Nextcloud MCP Server]]></title><description><![CDATA[Nextcloud MCP Server
The Model Context Protocol (MCP) ecosystem continues to grow, and I’m excited to introduce the new Nextcloud MCP server! This server bridges the gap between your development environment and your Nextcloud instance, specifically t...]]></description><link>https://blog.coutinho.io/introducing-the-nextcloud-mcp-server</link><guid isPermaLink="true">https://blog.coutinho.io/introducing-the-nextcloud-mcp-server</guid><category><![CDATA[Nextcloud]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Mon, 05 May 2025 01:50:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/9BJRGlqoIUk/upload/82211539d22645960efeb3adbe0782a6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-nextcloud-mcp-server">Nextcloud MCP Server</h1>
<p>The Model Context Protocol (MCP) ecosystem continues to grow, and I’m excited to introduce the new Nextcloud MCP server! This server bridges the gap between your development environment and your Nextcloud instance, specifically targeting the powerful Notes app.</p>
<h2 id="heading-what-is-mcp">What is MCP?</h2>
<p>For those unfamiliar, MCP allows AI assistants like Claude Desktop and Cline to interact with local tools and resources securely. It’s becoming the de facto standard for integrating third-party systems into Large Language Models (LLMs). MCP servers enable developers to automate tasks, access data, and integrate various services directly into their AI applications.</p>
<h2 id="heading-nextcloud-notes-integration">Nextcloud Notes Integration</h2>
<p>The Nextcloud MCP server leverages the Nextcloud Notes API to provide tools for managing your notes directly from your development environment. This means you can:</p>
<ul>
<li><p><strong>Create new notes:</strong> Quickly jot down ideas, code snippets, or task lists without leaving your editor.</p>
</li>
<li><p><strong>Read existing notes:</strong> Access information stored in your Nextcloud Notes.</p>
</li>
<li><p><strong>Update notes:</strong> Modify and add content to your notes programmatically.</p>
</li>
<li><p><strong>Delete notes:</strong> Clean up notes that are no longer needed.</p>
</li>
</ul>
<p>The Nextcloud API client uses <code>etags</code> to <a target="_blank" href="https://github.com/nextcloud/notes/blob/main/docs/api/v1.md#preventing-lost-updates-and-conflict-solution">prevent lost updates and resolve conflicts</a>.</p>
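<p>Conceptually this mirrors HTTP conditional requests: the client echoes the note's last-known ETag, and the server rejects the write if the note changed in the meantime. A client-side sketch of that decision (illustrative logic only, not the client's actual code):</p>
<pre><code class="lang-python">def try_update(server_note, known_etag, new_content):
    """Lost-update guard: only write when our cached ETag still matches."""
    if server_note["etag"] != known_etag:
        # 412 Precondition Failed: someone else changed the note; the
        # client should re-fetch and merge before retrying.
        return {"status": 412, "note": server_note}
    updated = dict(server_note, content=new_content,
                   etag=server_note["etag"] + "+1")  # server issues a fresh ETag
    return {"status": 200, "note": updated}
</code></pre>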
<h2 id="heading-example-use-cases">Example Use Cases</h2>
<p>Imagine you're debugging a complex issue. You can use the Nextcloud MCP server to:</p>
<ol>
<li><p>Create a new note titled "Debugging Issue XYZ".</p>
</li>
<li><p>Log findings, error messages, and potential solutions directly into the note content.</p>
</li>
<li><p>Access this note later from any device using the Nextcloud Notes app or web interface.</p>
</li>
</ol>
<p>Or, perhaps you're planning a new feature:</p>
<ol>
<li><p>Create a note outlining the feature requirements and tasks.</p>
</li>
<li><p>Update the note with progress as you work through the implementation.</p>
</li>
</ol>
<p>Cline can be set up to check the notes before and after completing a task, accumulating knowledge that would otherwise have been forgotten.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The Nextcloud MCP server offers a convenient way to integrate note-taking into your development process, keeping your thoughts and project details organized within your trusted Nextcloud environment. We believe this will be a valuable addition for developers using Nextcloud.</p>
<p>For detailed setup instructions and to contribute, please visit the <a target="_blank" href="https://github.com/cbcoutinho/nextcloud-mcp-server">GitHub repository</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Leveling up KafkaOps with Conduktor]]></title><description><![CDATA[As organizations embrace streaming data, Apache Kafka has emerged as the backbone for low-latency, high-throughput event processing. This distributed streaming platform enables companies to build real-time data pipelines and streaming applications. H...]]></description><link>https://blog.coutinho.io/leveling-up-kafkaops-with-conduktor</link><guid isPermaLink="true">https://blog.coutinho.io/leveling-up-kafkaops-with-conduktor</guid><category><![CDATA[conduktor]]></category><category><![CDATA[kafka]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Wed, 06 Nov 2024 14:03:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730901770829/93878cee-a884-49ec-b1ff-6715d27c78f1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As organizations embrace streaming data, Apache Kafka has emerged as the backbone for low-latency, high-throughput event processing. This distributed streaming platform enables companies to build real-time data pipelines and streaming applications. However, as Kafka-based applications scale, teams often grapple with the complexities of debugging and feature development due to Kafka's multi-layered architecture. Traditional observability and debugging tools fall short in ensuring reliable performance at scale, leading to the rise of KafkaOps.</p>
<p>KafkaOps - a growing approach to operationalizing Kafka management - encompasses monitoring, triaging, and troubleshooting Kafka workflows. It's become crucial for maintaining the health and performance of Kafka-based systems as they grow in complexity and scale.</p>
<p>In this blog post we will introduce Conduktor Console along with its primary benefits. Next we’ll show you how you can use Conduktor Console locally to develop a simple <code>ksqldb</code> application. Finally, we’ll reflect on how Conduktor Console can be deployed as a shared service within an organization that is looking for a way to tame larger Kafka deployments.</p>
<h2 id="heading-why-conduktor-console">Why Conduktor Console?</h2>
<p>Conduktor Console stands out by simplifying complex tasks like schema management, message replay, and access control within an intuitive UI. Unlike alternatives such as Confluent's Control Center or Provectus' KafkaUI, Conduktor Console offers unique capabilities that make it particularly valuable for both development and platform teams.</p>
<p>Let's explore three key features that have driven successful adoption within our team and partner organizations:</p>
<h3 id="heading-1-seamless-schema-management">1. Seamless Schema Management</h3>
<p>Schema management is crucial for maintaining data consistency, but it can become complex as systems evolve. Conduktor Console excels here by providing:</p>
<ul>
<li><p>Integration with both AWS Glue and Confluent schema registries</p>
</li>
<li><p>Visual interface for managing schema compatibility</p>
</li>
<li><p>Simplified subject management across multiple registry types</p>
</li>
</ul>
<p>One notable limitation: While Conduktor supports Azure Event Hubs through its Kafka protocol compatibility, it doesn't integrate with Azure's Schema Registry. This means teams using Azure Event Hubs won't be able to leverage schema-related features within Conduktor Console.</p>
<h3 id="heading-2-powerful-message-replay-capabilities">2. Powerful Message Replay Capabilities</h3>
<p>Message replay is perhaps Conduktor's most immediately valuable feature for developers. It enables:</p>
<ul>
<li><p>Testing new code against production-like data</p>
</li>
<li><p>Reproducing specific scenarios for debugging</p>
</li>
<li><p>Simulating message flows without affecting production systems</p>
</li>
</ul>
<p>This capability is particularly powerful for development teams working on new features or investigating production issues, as it allows them to work with realistic data in a controlled environment.</p>
<h3 id="heading-3-enterprise-grade-access-control">3. Enterprise-Grade Access Control</h3>
<p>For platform teams, Conduktor's granular access control (available in the Enterprise Tier) provides:</p>
<ul>
<li><p>Streamlined permissions management</p>
</li>
<li><p>Self-service access for developers</p>
</li>
<li><p>Enhanced security through role-based controls</p>
</li>
<li><p>Single Sign-On (SSO) integration for seamless authentication</p>
</li>
</ul>
<p>While credential management remains a challenge for many IT departments, Conduktor's approach enables platform teams to focus on enabling self-service while maintaining security controls.</p>
<h2 id="heading-implementing-conduktor-console-in-your-workflow">Implementing Conduktor Console in Your Workflow</h2>
<p>Conduktor publishes a number of Docker images to Docker Hub that can be deployed to local container environments. In the following example, we’re going to build on a recent version of the <em>Confluent Community</em> <code>docker-compose.yml</code> file by adding Conduktor Console and its dependencies.</p>
<p>In addition to the Apache Kafka broker, Schema Registry, and ksqldb server, we’re also including a Kafka Connect container that will be used to generate example data. For this example we’ll leverage the Confluent Datagen Source Connector to generate random data for us to experiment with. For a list of predefined Datagen configurations, please see the examples from the <a target="_blank" href="https://github.com/confluentinc/kafka-connect-datagen">kafka-connect-datagen</a> GitHub repository.</p>
<h3 id="heading-local-development-setup">Local Development Setup</h3>
<p>Here's how to get started:</p>
<ol>
<li><p>Pull the latest Conduktor Console image from Docker Hub</p>
</li>
<li><p>Create a <code>docker-compose.yml</code> file, building on the Confluent Community version</p>
</li>
<li><p>Add Conduktor Console and its dependencies to the composition</p>
</li>
<li><p>Launch your local Kafka environment with Conduktor Console</p>
</li>
</ol>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="33d8fb0b8f25befe8d84f205c42b1b07"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/cbcoutinho/33d8fb0b8f25befe8d84f205c42b1b07" class="embed-card">https://gist.github.com/cbcoutinho/33d8fb0b8f25befe8d84f205c42b1b07</a></div><p> </p>
<p>After spinning up the Docker Compose environment, you can navigate to Conduktor Console, which will be available at <a target="_blank" href="http://localhost:8080">http://localhost:8080</a>. The Console username/password are defined in the <code>docker-compose.yml</code> file above via the environment variables <code>CDK_ADMIN_EMAIL</code> and <code>CDK_ADMIN_PASSWORD</code> - please update or override these values before deploying this Docker Compose stack anywhere untrusted.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730829192771/aea3a571-14bf-4f7a-b2e9-78799e3f3f07.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730830138400/040612e0-0670-4589-8292-f40555e2f134.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730830150772/3002cc17-6c81-49c0-828a-0cb7cfd2a7f0.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730830193590/460133ac-eade-47aa-8dce-55944c1b3dea.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730830186037/c17d0390-6086-4dab-8b8a-50faafcbaba1.png" alt class="image--center mx-auto" /></p>
<p>After configuring the Kafka cluster, associated Schema Registry, Kafka Connect workers, as well as the <code>ksqldb</code> server, we can start executing <code>ksqldb</code> statements in Conduktor Console and perform stream processing jobs. The following shows a push query emitting the total revenue generated for various item types within a session window. To learn more about <code>ksqldb</code> windowing options, please see the documentation on <a target="_blank" href="https://docs.ksqldb.io/en/0.10.2-ksqldb/concepts/time-and-windows-in-ksqldb-queries/">Time and Windows</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730831256794/04a6eda4-455f-4f0f-8bfb-6b082eaffeaf.png" alt class="image--center mx-auto" /></p>
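<p>Under the hood, statements like this are submitted to the ksqldb server's REST API, which the Console is talking to for you. As a sketch of issuing the same push query directly - the server URL, stream name, and columns below are hypothetical stand-ins for the Datagen data:</p>

```python
import json
import urllib.request

# Hypothetical push query mirroring the windowed-revenue example above.
PUSH_QUERY = """
SELECT itemtype, SUM(price) AS total_revenue
FROM orders
WINDOW SESSION (30 SECONDS)
GROUP BY itemtype
EMIT CHANGES;
"""


def build_query_payload(sql, offset_reset="earliest"):
    """Body for ksqldb's /query endpoint: the statement plus streams properties."""
    return {
        "ksql": sql.strip(),
        "streamsProperties": {"ksql.streams.auto.offset.reset": offset_reset},
    }


def build_query_request(ksqldb_url, sql):
    body = json.dumps(build_query_payload(sql)).encode("utf-8")
    req = urllib.request.Request(f"{ksqldb_url}/query", data=body, method="POST")
    req.add_header("Content-Type", "application/vnd.ksql.v1+json")
    return req


if __name__ == "__main__":
    req = build_query_request("http://localhost:8088", PUSH_QUERY)
    print(json.dumps(build_query_payload(PUSH_QUERY), indent=2))
    # With a running ksqldb server, a push query streams results until
    # cancelled, so iterate the chunked response line by line:
    #   with urllib.request.urlopen(req) as resp:
    #       for line in resp:
    #           print(line.decode().strip())
```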
<h3 id="heading-enterprise-deployment">Enterprise Deployment</h3>
<p>For larger organizations, deploying Conduktor Console as a shared service can significantly enhance Kafka operations:</p>
<ol>
<li><p>Centralized Management: Streamline workflows and reduce latency by having a single point of control</p>
</li>
<li><p>Consistent Monitoring: Ensure uniform monitoring and management across all Kafka clusters</p>
</li>
<li><p>Enhanced Collaboration: Enable teams to work together more effectively on Kafka-based projects</p>
</li>
</ol>
<p>For AWS users, Conduktor Console's integration with services like MSK and Glue Schema Registry further enhances its functionality. Leveraging AWS IAM roles and policies facilitates secure, credential-less authentication, simplifying user management and reducing security risks.</p>
<h2 id="heading-deploying-in-a-shared-environment">Deploying in a Shared Environment</h2>
<p>Centralizing Kafka management tools within your infrastructure can greatly enhance operational efficiency. By streamlining workflows and reducing latency, teams can collaborate more effectively and resolve issues faster. This centralized approach ensures consistent monitoring and management of Kafka clusters, leading to improved performance and reliability.</p>
<p>Additional requirements such as networking and security may restrict which resources Conduktor Console can access when deployed locally. For this reason, organizations may want to host Conduktor Console on their own infrastructure. For our team’s internal use, we took extensive inspiration from the documentation on <a target="_blank" href="https://docs.conduktor.io/platform/get-started/installation/get-started/AWS/">Deployment on AWS</a> based on ECS and RDS. This can be expanded upon to include SSL certificates via ACM and routing requests through a custom domain registered in Route53.</p>
<p>For organizations using AWS Managed Streaming for Apache Kafka (MSK), Conduktor Console natively supports using IAM roles to authenticate with MSK and the Glue Schema Registry. AWS IAM roles and policies facilitate secure, credential-less authentication, simplifying user management and reducing the risk of credential exposure via long-lived API keys.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Conduktor Console represents a significant leap forward in Kafka management, addressing the real-world challenges of operating Kafka at scale. By simplifying complex operations, enabling self-service capabilities, and providing robust security features, it allows teams to focus on building and maintaining robust streaming applications rather than wrestling with operational complexities.</p>
<p>Whether you're a platform team managing multiple clusters or a development team building Kafka-based applications, Conduktor Console provides the features and workflows needed to work effectively with Apache Kafka in a modern enterprise environment.</p>
<p>We encourage you to explore Conduktor Console for your Kafka operations. Start with a local setup to experience its benefits firsthand, and consider how it might enhance your team's productivity and your organization's Kafka management at scale.</p>
]]></content:encoded></item><item><title><![CDATA[Enabling OAuth2 with Python streaming applications]]></title><description><![CDATA[This post demonstrates how to utilize OAuth2 in Faust Streaming applications as an alternative authorization flow compared to using static API credentials.
Faust is a streaming library for Python. It provides stream/event processing primitives a la K...]]></description><link>https://blog.coutinho.io/enabling-oauth2-with-python-streaming-applications</link><guid isPermaLink="true">https://blog.coutinho.io/enabling-oauth2-with-python-streaming-applications</guid><category><![CDATA[Python]]></category><category><![CDATA[kafka]]></category><category><![CDATA[OAuth2]]></category><dc:creator><![CDATA[Chris Coutinho]]></dc:creator><pubDate>Mon, 10 Jul 2023 10:00:00 GMT</pubDate><content:encoded><![CDATA[<p>This post demonstrates how to utilize OAuth2 in <a target="_blank" href="https://faust-streaming.github.io/faust">Faust Streaming</a> applications as an alternative authorization flow compared to using static API credentials.</p>
<p>Faust is a streaming library for Python. It provides stream/event processing primitives <em>a la</em> Kafka Streams to process Kafka messages in Python.</p>
<p>Organizations are utilizing OAuth2 to manage federated identities across service boundaries in a centralized manner. With the introduction of the <code>OAUTHBEARER</code> SASL mechanism in Kafka 2.0.0, both brokers and clients can be configured to use an external identity provider for authentication, making it easier to manage identities that span multiple systems.</p>
<h1 id="heading-authorization-in-apache-kafka">Authorization in Apache Kafka</h1>
<p>Apache Kafka provides an Authorization system based on Access Control Lists (ACLs). Kafka ACLs are defined in the general format of "Principal P is [Allowed/Denied] Operation O From Host H On Resource R". You can read more about the ACL structure on <a target="_blank" href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface">KIP-11</a>.</p>
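<p>The ACL sentence above maps naturally onto a small record type. As an illustrative sketch of that grammar (this is not the Kafka AdminClient API, just a model of one ACL binding):</p>

```python
from dataclasses import dataclass
from enum import Enum


class Permission(Enum):
    ALLOW = "Allowed"
    DENY = "Denied"


@dataclass(frozen=True)
class Acl:
    """One Kafka ACL, per KIP-11:
    "Principal P is Allowed/Denied Operation O From Host H On Resource R"."""

    principal: str        # e.g. "User:alice"
    permission: Permission
    operation: str        # e.g. "Read", "Write", "Describe"
    host: str             # "*" matches any host
    resource: str         # e.g. "Topic:orders"

    def sentence(self) -> str:
        # Render the ACL in the canonical sentence form.
        return (
            f"Principal {self.principal} is {self.permission.value} "
            f"Operation {self.operation} From Host {self.host} "
            f"On Resource {self.resource}"
        )


if __name__ == "__main__":
    acl = Acl("User:alice", Permission.ALLOW, "Read", "*", "Topic:orders")
    print(acl.sentence())
```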
<p>In addition to SSL encryption, Kafka supports multiple <em>authentication mechanisms</em> via the Simple Authentication and Security Layer (SASL), enabling authentication against third-party servers. This allows Kafka clusters to utilize industry-standard identity providers for all broker and client authentication requests.</p>
<p>Kafka supports the following SASL mechanisms:</p>
<ul>
<li><p><code>GSSAPI</code> (Kerberos)</p>
</li>
<li><p><code>PLAIN</code> (Username/Password)</p>
</li>
<li><p><code>SCRAM-SHA</code> (Zookeeper)</p>
</li>
<li><p><code>OAUTHBEARER</code> (OAuth server)</p>
</li>
</ul>
<p>See the Confluent documentation on <a target="_blank" href="https://developer.confluent.io/learn-kafka/security/authentication-ssl-and-sasl-ssl/#enabling-sasl-ssl-for-kafka">Enabling SASL SSL for Kafka</a> for more information on the different authentication mechanisms supported.</p>
<p>The <code>OAUTHBEARER</code> security mechanism enables a Kafka cluster to utilize a third-party identity provider for authentication. In the case of Confluent Cloud, setting up an external identity provider is straightforward, assuming you're using an OIDC-compliant identity provider (e.g. Azure AD, Okta, Keycloak). See the <a target="_blank" href="https://docs.confluent.io/cloud/current/access-management/authenticate/oauth/identity-providers.html">documentation</a> for more information.</p>
<h1 id="heading-authorization-in-faust-streaming">Authorization in Faust Streaming</h1>
<p>Here's an example of a streaming application demonstrating how to connect to a Kafka broker over <code>PLAINTEXT</code>, essentially anonymous and unencrypted. Our client assumes all messages adhere to the <code>Order</code> schema.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> faust

app = faust.App(<span class="hljs-string">"myapp"</span>, broker=<span class="hljs-string">"kafka://localhost"</span>)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Order</span>(<span class="hljs-params">faust.Record</span>):</span>
    account_id: str
    amount: int

<span class="hljs-meta">@app.agent(value_type=Order)</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">order</span>(<span class="hljs-params">orders</span>):</span>
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">for</span> order <span class="hljs-keyword">in</span> orders:
        print(<span class="hljs-string">f"Order for <span class="hljs-subst">{order.account_id}</span>: <span class="hljs-subst">{order.amount}</span>"</span>)
</code></pre>
<p>Faust Streaming introduced support for <code>OAUTHBEARER</code> authentication in <code>v1.5.0</code>, enabling Faust workers to authenticate to a Kafka broker configured with an identity provider using OAuth2 bearer tokens.</p>
<p>Using <code>OAUTHBEARER</code> broker credentials requires that we set up at least a default SSL context and provide an instance of <code>AbstractTokenProvider</code> to <code>faust.App</code> during configuration. The new <code>faust.OAuthCredentials</code> class supports a single <code>oauth_cb</code> attribute for an instance of <code>AbstractTokenProvider</code>, a class with a single asynchronous method for retrieving the bearer token. Clients are responsible for managing the entire token life cycle, such as refreshing tokens before they expire.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> datetime <span class="hljs-keyword">as</span> dt
<span class="hljs-keyword">import</span> logging
<span class="hljs-keyword">import</span> time
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional

<span class="hljs-keyword">import</span> faust
<span class="hljs-keyword">from</span> aiokafka.conn <span class="hljs-keyword">import</span> AbstractTokenProvider
<span class="hljs-keyword">from</span> aiokafka.helpers <span class="hljs-keyword">import</span> create_ssl_context
<span class="hljs-keyword">from</span> azure.core.credentials <span class="hljs-keyword">import</span> AccessToken
<span class="hljs-keyword">from</span> azure.identity.aio <span class="hljs-keyword">import</span> DefaultAzureCredential

<span class="hljs-comment"># `settings` is the application's config object (broker URL, OAuth2 scope,</span>
<span class="hljs-comment"># Confluent cluster and identity-pool IDs)</span>
logger = logging.getLogger(__name__)

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CustomTokenProvider</span>(<span class="hljs-params">AbstractTokenProvider</span>):</span>
    _token: Optional[AccessToken] = <span class="hljs-literal">None</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, cluster_id: str, pool_id: str</span>):</span>
        self.cluster_id = cluster_id
        self.pool_id = pool_id

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">token</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> self.get_token()

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_token</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">if</span> self._token <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span> <span class="hljs-keyword">or</span> (self._token.expires_on - time.time()) &lt; <span class="hljs-number">5</span> * <span class="hljs-number">60</span>:
            logger.info(<span class="hljs-string">"token is expired, refreshing..."</span>)
            self._token = <span class="hljs-keyword">await</span> self._get_token()
        <span class="hljs-keyword">else</span>:
            logger.info(<span class="hljs-string">"Current bearer token is valid"</span>)

        <span class="hljs-keyword">return</span> self._token.token

    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_get_token</span>(<span class="hljs-params">self</span>):</span>
        logger.info(<span class="hljs-string">"Generating token"</span>)
        credential = DefaultAzureCredential()
        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> credential <span class="hljs-keyword">as</span> cred:
            token = <span class="hljs-keyword">await</span> cred.get_token(settings.KAFKA_OAUTH2_SCOPE)
        logger.info(<span class="hljs-string">"Token expires: %s"</span>, dt.datetime.fromtimestamp(token.expires_on))
        <span class="hljs-keyword">return</span> token

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">extensions</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"logicalCluster"</span>: self.cluster_id,
            <span class="hljs-string">"identityPoolId"</span>: self.pool_id,
        }


broker_credentials = faust.OAuthCredentials(
    oauth_cb=CustomTokenProvider(    
        cluster_id=settings.KAFKA_CLUSTER_ID,
        pool_id=settings.KAFKA_POOL_ID,
    ),
    ssl_context=create_ssl_context()
)


app = faust.App(
    <span class="hljs-string">"myapp"</span>,
    broker=KAFKA_BROKER,
    broker_credentials=broker_credentials,
)

<span class="hljs-comment"># Setting up Model and Agents same as above.</span>
<span class="hljs-comment"># ...</span>
</code></pre>
]]></content:encoded></item></channel></rss>