
to-markdown

@cognipeer/to-markdown  ·  Utilities
stable

A versatile, TypeScript-first utility for converting PDF, DOCX, HTML, Excel, CSV, and more into clean Markdown — ready for RAG pipelines and LLM context windows.

License  ·  MIT
TypeScript  ·  100%
Audience  ·  Anyone building a RAG pipeline or document workflow
Fits  ·  RAG ingestion  ·  Document normalisation  ·  TypeScript-first
Production install
$ npm install @cognipeer/to-markdown

What's in the box

Built for anyone whose RAG pipeline or document workflow needs to normalise heterogeneous file formats into a single, LLM-friendly representation. Each capability is opt-in: use the parts that fit, leave the rest.

Multi-format support

Converts PDF, DOCX, HTML, Excel, CSV, and other formats into structured Markdown with a single API.

Promise-based API

Simple async interface that returns Markdown strings with predictable structure for downstream parsing.

TypeScript first

Written in TypeScript with full type definitions and zero-config import in modern toolchains.

Customisable conversion

Options to control table handling, image extraction, heading depth, and chunking-friendly output.

RAG-ready output

Output shape is tuned for ingestion: clean headings, stable IDs, and minimal noise from source styling.

Modular & fast

Per-format adapters keep the bundle lean — only load the converters you actually need.
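
One way to picture "only load the converters you actually need" is a registry of lazy loaders, where a format's adapter is fetched the first time it is requested and cached afterwards. The sketch below is an illustration under that assumption, not the library's internals; `AdapterRegistry`, `Adapter`, `Loader`, and the sample CSV loader are all hypothetical names.

```typescript
// Hypothetical sketch: none of these names come from @cognipeer/to-markdown.
type Adapter = (input: string) => string;
type Loader = () => Promise<Adapter>;

class AdapterRegistry {
  private loaders = new Map<string, Loader>();
  private loaded = new Map<string, Promise<Adapter>>();

  // Registering is cheap: the loader is stored but not invoked.
  register(format: string, loader: Loader): void {
    this.loaders.set(format, loader);
  }

  // Resolve an adapter, invoking its loader at most once per format.
  get(format: string): Promise<Adapter> {
    const cached = this.loaded.get(format);
    if (cached) return cached;

    const loader = this.loaders.get(format);
    if (!loader) {
      return Promise.reject(new Error(`No adapter registered for "${format}"`));
    }
    const pending = loader();
    this.loaded.set(format, pending);
    return pending;
  }
}

const registry = new AdapterRegistry();
// In a real bundle the loader would typically be a dynamic import, e.g.
// () => import("./adapters/csv").then((m) => m.default)  (hypothetical path).
registry.register("csv", async () => (input: string) => input.split(",").join(" | "));
```

Caching the promise rather than the resolved adapter means two concurrent requests for the same format share a single load.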

How it runs

A small, modular pipeline: detect the source format, run the right adapter, normalise the structure, and emit clean Markdown — ready to chunk and embed.

  • Step 1 · Detect: format sniff
  • Step 2 · Adapter: PDF · DOCX · HTML …
  • Step 3 · Normalise: headings · tables
  • Step 4 · Emit: Markdown string
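
The four stages can be sketched in a few dozen lines. This is a toy illustration that handles only HTML fragments, simple CSV, and plain text with regular expressions; every function name here (`detectFormat`, `adapters`, `normalise`, `convert`) is hypothetical, and the reading of heading depth as "clamp deeper headings to this level" is an assumption rather than documented behaviour.

```typescript
// Hypothetical sketch of detect -> adapter -> normalise -> emit.
type Format = "html" | "csv" | "text";

// Step 1: a crude content sniff.
function detectFormat(input: string): Format {
  const trimmed = input.trim();
  if (/^<(!doctype html|html|body|h[1-6]|p|div)[\s>]/i.test(trimmed)) return "html";
  const rows = trimmed.split(/\r?\n/).filter((r) => r.length > 0);
  const commas = (r: string) => r.split(",").length - 1;
  if (rows.length > 1 && commas(rows[0]) > 0 && rows.every((r) => commas(r) === commas(rows[0]))) {
    return "csv";
  }
  return "text";
}

// Step 2: one adapter per format, each returning rough Markdown.
const adapters: Record<Format, (input: string) => string> = {
  html: (input) =>
    input
      .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi, (_m, level: string, text: string) =>
        `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
      .replace(/<\/p>/gi, "\n\n")
      .replace(/<[^>]+>/g, ""), // drop any remaining tags
  csv: (input) => {
    const rows = input.trim().split(/\r?\n/).map((r) => r.split(",").map((c) => c.trim()));
    const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
    const [header, ...body] = rows;
    return [line(header), line(header.map(() => "---")), ...body.map(line)].join("\n");
  },
  text: (input) => input,
};

// Step 3: clamp heading depth, strip trailing whitespace, collapse blank runs.
function normalise(markdown: string, headingDepth: number): string {
  return markdown
    .split(/\r?\n/)
    .map((line) => line.replace(/\s+$/, ""))
    .map((line) => line.replace(/^(#{1,6})(?= )/, (hashes) => hashes.slice(0, headingDepth)))
    .join("\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

// Step 4: emit a single Markdown string.
function convert(input: string, headingDepth = 3): string {
  const format = detectFormat(input);
  const raw = adapters[format](input);
  return normalise(raw, headingDepth);
}

console.log(convert("<h1>Title</h1><p>Hello</p>"));
```

Real PDF, DOCX, and Excel inputs are binary and need proper parsers; the regex-based HTML adapter in particular is only good enough to show where an adapter sits in the flow.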

Inputs

  • PDF · DOCX
  • HTML · plain text
  • Excel · CSV
  • Bring-your-own buffer

Options

  • imageMode
  • tableStrategy
  • headingDepth
  • chunkHint

Output

  • Clean Markdown
  • Stable headings
  • Per-doc metadata
  • Streaming-friendly

Quickstart

Install, configure, run. The example below is the smallest piece of code that does something useful in production.

import { toMarkdown } from "@cognipeer/to-markdown";

const md = await toMarkdown({
  source: "./policies/handbook.pdf",
  headingDepth: 3,
  tableStrategy: "preserve",
});

console.log(md.text);  // clean Markdown
console.log(md.meta);  // title, headings, byte count
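
Once the Markdown string is in hand (`md.text` above), a common next step before embedding is to split it at headings. The helper below is a minimal sketch of that idea and is not part of the package; `chunkByHeading` and `Chunk` are hypothetical names, and it operates on any Markdown string.

```typescript
// Hypothetical helper: splits Markdown into one chunk per ATX heading.
interface Chunk {
  heading: string; // heading text without the leading #s; "" for text before the first heading
  level: number;   // 1-6, or 0 for text before the first heading
  body: string;
}

// Built with repeat() so this listing contains no literal fence marker.
const FENCE = "`".repeat(3);

function chunkByHeading(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const lines: string[] = [];
  let current: Chunk = { heading: "", level: 0, body: "" };
  let inFence = false;

  // Close the current chunk; skip an empty preamble.
  const flush = () => {
    current.body = lines.join("\n").trim();
    if (current.level > 0 || current.body.length > 0) chunks.push(current);
    lines.length = 0;
  };

  for (const line of markdown.split(/\r?\n/)) {
    // Lines starting with # inside fenced code blocks are not headings.
    if (line.trim().startsWith(FENCE)) inFence = !inFence;
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      current = { heading: match[2].trim(), level: match[1].length, body: "" };
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}

const chunks = chunkByHeading("intro\n# A\ntext a\n## B\ntext b");
console.log(chunks.map((c) => `${c.level}:${c.heading}`));
```

Keeping the heading level on each chunk makes it possible to rebuild a breadcrumb (for example "A > B") and attach it as metadata at embedding time.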

How it compares

Against the utility options teams most often weigh, with the focus on operational concerns rather than feature inventories.

| Capability | to-markdown | pandoc | unstructured.io | turndown |
| --- | --- | --- | --- | --- |
| PDF | ● native | ◐ partial | ● native | ○ missing |
| DOCX | ● native | ● native | ● native | ○ missing |
| HTML | ● native | ● native | ● native | ● native |
| Excel / CSV | ● native | ◐ partial | ● native | ○ missing |
| TypeScript | ● native | ○ missing | ○ missing | ● native |
| Open source | ● native | ● native | ◐ partial | ● native |

Next steps