PullMD Help

This page explains what PullMD does, how the cache works, and how to wire it into Claude Code or any other AI agent.

New in v3

In v3, PullMD grew from a web-page reader into a general anything-to-Markdown service. Everything beyond plain web extraction is optional and enabled individually:

Clean body - by default the Markdown body is now just # Title + content; the source URL, date and metadata live in the YAML frontmatter instead of being repeated in the text (saves tokens).
Documents - PDF, Word, PowerPoint, Excel, EPUB and more, by URL or upload (drag-and-drop in the web UI).
PDF tables (OCR) - an optional high-quality PDF path with ?pdf=ocr for clean tables.
Images & audio - optional image captioning and audio transcription.
YouTube - title, description and transcript with clickable timecodes, no API key.

What PullMD does

PullMD fetches any URL and returns it as clean Markdown. It picks the right extraction path depending on the source:

Reddit - dedicated pipeline for posts, comments, and subreddit listings (configurable depth and limit).
Hacker News - dedicated pipeline for items, comment permalinks, and listings (front/newest/ask/show/jobs); nested comments with configurable depth.
Cloudflare-Markdown - when the source natively supports Accept: text/markdown, its Markdown response is passed through directly (cleanest output).
Readability + Turndown - fallback for everything else: Mozilla Readability extracts the main content, Turndown converts it to Markdown.

Which path was used shows up in the response header X-Source and next to each entry in the history.
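
A minimal sketch of checking the extraction path from a script, assuming Python's standard library. The helper names build_api_url and fetch_markdown are illustrative and not part of PullMD; only the /api endpoint and the X-Source header come from this page.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

PULLMD_BASE = "https://os-2-pullmd.100223.xyz"  # host used throughout this page


def build_api_url(target_url: str) -> str:
    """Return the /api URL for a page, with the target URL percent-encoded."""
    return f"{PULLMD_BASE}/api?{urlencode({'url': target_url})}"


def fetch_markdown(target_url: str) -> Tuple[str, Optional[str]]:
    """Fetch a page through PullMD; return (markdown, X-Source header value)."""
    with urlopen(build_api_url(target_url), timeout=30) as response:
        body = response.read().decode("utf-8")
        return body, response.headers.get("X-Source")


# Usage (performs a network request):
#   markdown, source = fetch_markdown("https://example.com/")
#   print("extraction path:", source)
```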

Cache & TTL

Every pull is stored in a SQLite database. Two timeouts matter:

What	Value	When does it reset?
Re-fetch from source	`1 hour`	On every successful pull of the same URL, regardless of whether it came through `/api?url=…` or `/s/:id`.
Share link lifetime	`90 days`	On every re-fetch (= cache write). `/s/:id` requests extend it too, since they trigger a re-fetch after 1 h.

How `/s/:id` behaves

Cache < 1 h old → returns the stored version immediately.
Cache ≥ 1 h old → re-fetches the source, writes back, serves the new markdown.
Source unreachable (404, network, …) → falls back to the last stored snapshot. No silent failure.
Cache > 90 days without a re-fetch → entry is pruned on next write; share link returns 404.
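
The four cases above can be sketched as a small decision function. This is an illustration of the documented behavior, not PullMD's actual implementation: the function name, the return labels, and the choice to treat any entry older than 90 days as already gone (the page says pruning happens on the next write) are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REFETCH_AFTER = timedelta(hours=1)   # re-fetch window stated on this page
SHARE_LIFETIME = timedelta(days=90)  # share link lifetime stated on this page


def share_action(last_fetch: Optional[datetime], now: datetime,
                 source_reachable: bool) -> str:
    """Return what a /s/:id request would do for a cache entry (hypothetical labels)."""
    if last_fetch is None:
        return "not_found"            # no such share ID
    age = now - last_fetch
    if age > SHARE_LIFETIME:
        return "not_found"            # simplification: treated as already pruned
    if age < REFETCH_AFTER:
        return "serve_cached"         # fresh enough, no request to the source
    if source_reachable:
        return "refetch"              # re-fetch, write back, serve new Markdown
    return "serve_snapshot"           # source down: fall back to the stored copy
```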

Tip: subreddit as a live feed

Pull a subreddit-listing URL once, remember the share ID — and from then on only call /s/:id. After each hour the next request triggers a fresh fetch, the share ID stays stable, and the content updates. Handy for AI agents that need a fixed endpoint with regularly refreshed content.
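
As a rough sketch of the live-feed pattern, an agent or script only needs the stable share URL. The helper names below are hypothetical, and a3f9c2 is just the example ID from the frontmatter sample on this page.

```python
from urllib.parse import quote
from urllib.request import urlopen

PULLMD_BASE = "https://os-2-pullmd.100223.xyz"  # host used throughout this page


def share_url(share_id: str) -> str:
    """Build the stable /s/<id> URL for a previously pulled page."""
    return f"{PULLMD_BASE}/s/{quote(share_id, safe='')}"


def read_feed(share_id: str) -> str:
    """Return the current Markdown behind a share ID (network request)."""
    with urlopen(share_url(share_id), timeout=30) as response:
        return response.read().decode("utf-8")


# Usage: call read_feed("a3f9c2") on whatever schedule suits the agent;
# the server decides whether to serve the cache or re-fetch the source.
```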

Set up in AI agents

Option 1 — Universal: paste a prompt

Works in any chat-style agent (ChatGPT, Claude.ai, Gemini, Perplexity, …). Copy the block, paste it as a system or custom instruction:

prompt · drop-in

When you need to read a web page, fetch via PullMD instead of raw HTML:

  GET https://os-2-pullmd.100223.xyz/api?url=<URL>

Returns clean Markdown (text/markdown). Optional query params:

  comments=false        skip Reddit comments
  comment_depth=N       comment nesting depth (default 3)
  frontmatter=true      prepend YAML metadata block
  format=text           strip Markdown, return plain text
  nocache=true          bypass the 1h cache and refetch
  lang=de|en            language for the comments section header

Response headers worth checking:
  X-Source       reddit | cloudflare | readability | trafilatura |
                 playwright | markitdown | youtube | pdf-ocr | ...
  X-Quality      0.0-1.0 extraction confidence
  X-Share-Id     8-hex permalink, openable as /s/<id>

Reddit URLs are auto-detected (incl. redd.it short links and /s/ shares).
Use this whenever you would otherwise fetch raw HTML — the markdown is
much cleaner and saves significant context window space.

Option 2 — Claude Code skill

For Claude Code there's a ready-made skill that automatically routes WebFetch through PullMD (with fallback). Download the zip and unpack into ~/.claude/skills/:

Download pullmd.zip

install · shell

curl -O https://os-2-pullmd.100223.xyz/pullmd.zip
mkdir -p ~/.claude/skills
unzip pullmd.zip -d ~/.claude/skills/
# Restart Claude Code; the skill activates on web-reading requests.

Upgrading from pre-v3: the skill used to be called web-reader. The new zip does not replace an existing install — remove the old one first (rm -rf ~/.claude/skills/web-reader), otherwise both skills stay active side by side.

Option 3 — MCP server (remote)

PullMD runs as a remote MCP server at https://os-2-pullmd.100223.xyz/mcp (Streamable-HTTP transport, stateless). Three tools: read_url, get_share, list_recent. Server-side updates reach every client automatically — no local install needed.

Claude Code — paste this prompt and Claude will install it for you:

prompt · claude code

Install the PullMD MCP server in Claude Code (user scope):
- Name: pullmd
- Transport: http
- URL: https://os-2-pullmd.100223.xyz/mcp

Run: claude mcp add --transport http pullmd https://os-2-pullmd.100223.xyz/mcp
Then: claude mcp list to verify.

Claude Code — directly in the terminal:

claude code · cli

claude mcp add --transport http pullmd https://os-2-pullmd.100223.xyz/mcp

Claude Desktop / Cursor / others — JSON config:

mcp config snippet

{
  "mcpServers": {
    "pullmd": {
      "type": "http",
      "url": "https://os-2-pullmd.100223.xyz/mcp"
    }
  }
}

Once registered, the three tools surface natively in the agent — no prompt instructions needed, the LLM picks them up via their schema descriptions.

API parameters (GET /api)

Param	Default	Description
`url`	-	Required. Any public URL.
`comments`	`true`	Include Reddit comments. `false` returns just the post.
`comment_depth`	`3`	Max nesting depth (1–10).
`comment_limit`	-	Optional cap on top-level comments (Reddit returns ~200 by default).
`frontmatter`	`false`	Prepend YAML frontmatter with metadata.
`format`	`md`	`text` = strip Markdown, return plain text. `json` = structured with metadata.
`nocache`	`false`	Bypass the 1-hour cache, always refetch.
`lang`	`de`	Language for the comments header (`de` or `en`).
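
A minimal sketch of turning these parameters into a request URL, assuming Python's standard library. The function build_query and its keyword names are illustrative; the parameter names, defaults, and the 1–10 depth range come from the table above. Values equal to the documented defaults are simply left out of the query string.

```python
from typing import Optional
from urllib.parse import urlencode

PULLMD_BASE = "https://os-2-pullmd.100223.xyz"  # host used throughout this page


def build_query(url: str, *, comments: bool = True, comment_depth: int = 3,
                comment_limit: Optional[int] = None, frontmatter: bool = False,
                output_format: str = "md", nocache: bool = False,
                lang: str = "de") -> str:
    """Build a GET /api URL, validating values against the documented ranges."""
    if not 1 <= comment_depth <= 10:
        raise ValueError("comment_depth must be between 1 and 10")
    if output_format not in {"md", "text", "json"}:
        raise ValueError("format must be md, text or json")
    if lang not in {"de", "en"}:
        raise ValueError("lang must be de or en")

    params = {"url": url}
    if not comments:
        params["comments"] = "false"
    if comment_depth != 3:
        params["comment_depth"] = str(comment_depth)
    if comment_limit is not None:
        params["comment_limit"] = str(comment_limit)
    if frontmatter:
        params["frontmatter"] = "true"
    if output_format != "md":
        params["format"] = output_format
    if nocache:
        params["nocache"] = "true"
    if lang != "de":
        params["lang"] = lang
    return f"{PULLMD_BASE}/api?{urlencode(params)}"
```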

Local HTML files (POST /api/html)

Already-saved pages ("Save Page As", SingleFile exports) can be converted directly: drag-and-drop the .html file onto the PullMD UI (desktop) or tap the dashed hint below the URL field to pick a file (desktop and mobile) - or via the API:

curl -s -X POST --data-binary @page.html \
  -H 'Content-Type: text/html' \
  "https://os-2-pullmd.100223.xyz/api/html?filename=page.html"

Optional parameters: url=… (original URL - enables site recipes and the linked header), format=json|text, frontmatter=true, extractor=readability|trafilatura. Instead of ?filename= you can send the X-Filename header (URI-encoded) - keeping the file name out of access logs. Max 10 MB. For privacy, local files are never cached - no history entry, no share link.
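
The same upload can be sketched in Python with the file name sent in the X-Filename header instead of the query string. The helper names are hypothetical, and reading "10 MB" as 10 MiB is an assumption; the page does not say which is meant.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

PULLMD_BASE = "https://os-2-pullmd.100223.xyz"  # host used throughout this page
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def build_html_request(html: bytes, filename: str) -> Request:
    """Prepare a POST /api/html request with a URI-encoded X-Filename header."""
    if len(html) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the documented 10 MB limit")
    return Request(
        f"{PULLMD_BASE}/api/html",
        data=html,
        method="POST",
        headers={
            "Content-Type": "text/html",
            "X-Filename": quote(filename, safe=""),
        },
    )


def convert_html_file(path: str) -> str:
    """Upload a saved .html file and return the converted Markdown (network request)."""
    file_path = Path(path)
    request = build_html_request(file_path.read_bytes(), file_path.name)
    with urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```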

Frontmatter

With ?frontmatter=true a YAML metadata block is prepended to the content. Empty fields are omitted:

example

---
title: "Why I migrated my side-project from Postgres to SQLite"
url: https://news.ycombinator.com/item?id=42424242
source: readability
fetched: 2026-04-25T13:53:00Z
quality: 0.85
author: kentonv
published: 2026-04-24T18:42:00Z
description: "After two years on managed Postgres..."
language: en
share_id: a3f9c2
---

Base fields: title, url, source, fetched, quality, author, published, modified, description, language, image, site, extractor_reason, share_id. Depending on the source, additionally: subreddit, upvotes (Reddit) · duration, views (YouTube) · image_size, audio_seconds, llm_model, llm_tokens, llm_prompt_tokens, llm_completion_tokens (media) · pdf_pages (PDF OCR). MCP responses add share_url, cached, refreshed, age_ms. PULLMD_FRONTMATTER_FIELDS can trim the selection server-side.
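
On the consuming side, the block can be separated from the body with a few lines of standard-library Python. This is a deliberately naive sketch that only handles flat key: value lines like those in the example; it is not a YAML parser, and split_frontmatter is a hypothetical helper name. A real YAML library is the safer choice for anything beyond simple scalars.

```python
from typing import Dict, Tuple


def split_frontmatter(document: str) -> Tuple[Dict[str, str], str]:
    """Split a Markdown document into (frontmatter fields, body).

    Returns an empty dict and the unchanged text if no frontmatter is present.
    """
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    # Find the closing delimiter of the frontmatter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, document

    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body
```

All values come back as strings, so a field such as quality needs an explicit float() before comparing it against a threshold.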

In the web app the Frontmatter toggle switches the view instantly - the YAML block appears or disappears with no second Pull.

Client detection

Every history entry gets a client badge so you can see where the pull originated:

Browser — regular web UI in a browser tab.
PWA — installed Progressive Web App (standalone mode, Android/iOS/desktop).
Claude — User-Agent contains "Claude" (e.g. via the skill).
API — anything else (curl, scripts, MCP wrappers, …).
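
The badge logic can be pictured roughly as follows. Only the "User-Agent contains Claude" rule and the catch-all API badge are stated on this page; how PullMD tells a browser tab from an installed PWA is not documented here, so the ui_mode hint, the function name, and the order of the checks are assumptions made purely for illustration.

```python
from typing import Optional


def classify_client(user_agent: str, ui_mode: Optional[str] = None) -> str:
    """Return a history badge for a request (illustrative, not PullMD's code).

    ui_mode is a hypothetical hint from the web UI: "standalone" for an
    installed PWA, "browser" for a regular tab, None for non-UI requests.
    """
    if ui_mode == "standalone":
        return "PWA"
    if ui_mode == "browser":
        return "Browser"
    if "Claude" in user_agent:   # rule stated on this page
        return "Claude"
    return "API"                 # curl, scripts, MCP wrappers, ...
```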