A while back I wrote about syncing remote content into markdown files with markdown-magic. The short version: you drop an HTML comment into a .md file, and the library replaces the block underneath it with generated content.
The comment carries options. That little detail is where all the pain lives.
<!-- doc-gen code src="./examples/1_simple.js" lines="1-20" -->
content gets replaced here
<!-- end-doc-gen -->
src and lines are options. Easy enough when it's two strings. But I wanted authors to write options the way they already write React props, including arrays, objects, and nested config:
<!-- doc-gen table columns=['name', 'price'] style={{ align: 'left' }} -->
The problem isn't generating the table. The problem is turning that string between the comment markers into a real JavaScript object, written by a human, in a text file, with no editor, no autocomplete, and no linter telling them they forgot a quote.
That parser became its own package: oparser.
The obvious move is to make people write JSON and call it a day.
<!-- doc-gen table {"columns": ["name", "price"], "style": {"align": "left"}} -->
JSON.parse is strict on purpose, and that strictness is exactly wrong for hand-authored config. JSON forces double quotes on every key and string, forbids trailing commas, and explodes on a single missing brace. Nobody hand-writes JSON correctly inside an HTML comment on the first try.
What people actually type looks more like this:
columns=[name, price]
style={{ align: left }}
enabled
title=Hello world
No quotes on the keys. No quotes on obvious strings. A bare enabled flag with no value. A value with a space in it. Every one of those is a JSON.parse crash, and every one of those is something a reasonable person would expect to just work.
So the parser has to be forgiving. It has to take loose, human input and do the obvious thing.
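To make the strictness concrete, here is a small check that feeds the loose lines above, plus two near-JSON variants, to JSON.parse. The helper name isValidJson is made up for this sketch; the only real API used is JSON.parse.

```javascript
// The loose lines from the example above, plus two "almost JSON" variants.
const looseInputs = [
  'columns=[name, price]',
  'style={{ align: left }}',
  'enabled',
  'title=Hello world',
  "{'columns': ['name', 'price']}", // single quotes instead of double
  '{"columns": ["name", "price"],}', // trailing comma
]

// Hypothetical helper: true when JSON.parse accepts the text, false when it throws.
function isValidJson(text) {
  try {
    JSON.parse(text)
    return true
  } catch (err) {
    return false
  }
}

const results = looseInputs.map((text) => ({ text, ok: isValidJson(text) }))

// None of the hand-written variants survive.
console.log(results.filter((r) => r.ok).length) // 0

// Only the fully quoted form from earlier is accepted.
console.log(isValidJson('{"columns": ["name", "price"], "style": {"align": "left"}}')) // true
```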
oparser exposes a parse() function that turns a loose string into an object:
const { parse } = require('oparser')
parse(`
width={999}
enabled=TRUE
title="Hello world"
tags=[one, "two, too", "three]still text"]
style={{ color: 'red', label: "b{c}" }}
`)
// {
// width: 999,
// enabled: true,
// title: 'Hello world',
// tags: ['one', 'two, too', 'three]still text'],
// style: { color: 'red', label: 'b{c}' }
// }
Look at what it had to figure out without being told:
- width={999} is a number, not the string "999".
- enabled=TRUE is a boolean, case-insensitive.
- title="Hello world" keeps the space because it's quoted.
- tags=[...] is an array, and the comma inside "two, too" is not a delimiter because it's inside quotes.
- "three]still text" contains a ] that is not the end of the array.
- style={{ ... }} is a nested object, and "b{c}" has a { that is not a new object.
The hard part of parsing loose config is knowing when a special character is structural and when it's just a character inside a string. Quotes are the signal, and most naive parsers split on delimiters before they account for quoting, which is why commas-inside-strings break them.
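The structural-versus-literal distinction can be sketched with a bracket matcher that tracks quote state. This is an illustration only, not oparser's code: findClosing is a hypothetical helper, and it assumes backslash is the escape character inside quotes.

```javascript
// Hypothetical helper: return the index of the bracket that closes the one at
// `start`, ignoring any brackets that sit inside single or double quotes.
function findClosing(text, start) {
  const pairs = { '[': ']', '{': '}' }
  const open = text[start]
  const close = pairs[open]
  if (!close) throw new Error(`No opening bracket at index ${start}`)

  let depth = 0
  let quote = null // the quote character we are currently inside, if any
  for (let i = start; i < text.length; i++) {
    const ch = text[i]
    if (quote) {
      if (ch === '\\') i++ // assumption: backslash escapes the next character
      else if (ch === quote) quote = null
      continue
    }
    if (ch === '"' || ch === "'") quote = ch
    else if (ch === open) depth++
    else if (ch === close) depth--
    if (depth === 0) return i
  }
  return -1 // unbalanced input
}

const input = 'tags=[one, "two, too", "three]still text"] next=1'
const start = input.indexOf('[')

// Naive: the first ] wins, even though it sits inside a quoted string.
const naiveEnd = input.indexOf(']')
console.log(input.slice(start, naiveEnd + 1)) // [one, "two, too", "three]

// Quote-aware: the ] inside "three]still text" is skipped.
const realEnd = findClosing(input, start)
console.log(input.slice(start, realEnd + 1)) // [one, "two, too", "three]still text"]
```

The same scan handles the nested style={{ ... }} case, because the { inside "b{c}" never touches the depth counter.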
A few more cases that show the "do the obvious thing" philosophy.
Bare keys become true, so flags work like JSX boolean props:
parse(`disabled isLoading`)
// { disabled: true, isLoading: true }
Unquoted URLs survive intact, brackets, query strings, hashes and all:
parse(`url=https://example.com?ids[]=1&ids[]=2#section`)
// { url: 'https://example.com?ids[]=1&ids[]=2#section' }
Comments outside quotes get stripped, but # and // inside a string are preserved:
parse(`
width=100 // ignored
height=200 # ignored
label="keep # and // inside quotes"
`)
// { width: 100, height: 200, label: 'keep # and // inside quotes' }
And because the original goal was React-like props, JSX and arrow functions inside braces are kept as literal strings instead of being mangled:
parse(`elem={<Component type="text" />}`)
// { elem: '<Component type="text" />' }
parse(`onClick={() => console.log('hi')}`)
// { onClick: "() => console.log('hi')" }
The forgiving behavior isn't magic, it's mostly about respecting quotes before doing anything else. The pipeline looks roughly like this:
[, {, and quote boundaries so the scanner knows when an array or object actually closes.
null, and parse loose object/array syntax.
Step 2 is the whole trick. By neutralizing the contents of quoted strings before tokenizing, a comma inside "two, too" simply isn't visible as a comma when the array gets split. The structure-detection logic only ever sees real structural characters. Then the placeholders get swapped back at the end so the values come out exactly as written.
That ordering is the difference between a parser that handles tags=["a, b", "c"] and one that quietly returns ['"a', 'b"', '"c"'].
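A minimal sketch of that mask-then-split ordering, assuming a made-up placeholder format (__STR_n__) and a hypothetical splitArrayItems helper. It illustrates the idea and is not oparser's actual implementation.

```javascript
// Hypothetical helper: split the inside of an array literal on commas, but
// only after quoted strings have been swapped for delimiter-free placeholders.
function splitArrayItems(body) {
  const stash = []

  // 1. Mask: replace every quoted string with a placeholder.
  const quoted = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/g
  const masked = body.replace(quoted, (match) => {
    stash.push(match.slice(1, -1)) // keep the contents, drop the quotes
    return `__STR_${stash.length - 1}__`
  })

  // 2. Split: only structural commas are still visible.
  // 3. Restore: swap the placeholders back for the original contents.
  return masked
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item !== '')
    .map((item) => item.replace(/__STR_(\d+)__/g, (_, index) => stash[Number(index)]))
}

const body = '"a, b", "c"'

// Splitting first tears the quoted string in half.
console.log(body.split(',').map((s) => s.trim())) // [ '"a', 'b"', '"c"' ]

// Masking first keeps it whole.
console.log(splitArrayItems(body)) // [ 'a, b', 'c' ]
console.log(splitArrayItems('one, "two, too", "three]still text"'))
// [ 'one', 'two, too', 'three]still text' ]
```

One caveat of this sketch: input that already contains text shaped like __STR_0__ would collide with the placeholders, so a real parser needs a token that cannot appear in user input.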
Inside markdown-magic, the block parser pulls the raw option string out of the comment and hands it straight to oparser:
const { parse } = require('oparser')
const paramString = params.trim()
const parsedOptions = paramString ? parse(paramString) : {}
That's the entire integration. The block parser figures out where the options are (everything after the transform name, before the closing -->), and oparser figures out what they mean.
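The "where" half of that split might look something like the following. The regex and the splitOpenComment name are assumptions made for illustration, based only on the doc-gen comment shape shown earlier; markdown-magic's real block parser may work differently.

```javascript
// Assumed shape: <!-- doc-gen TRANSFORM ...options... -->
const OPEN_COMMENT = /^<!--\s*doc-gen\s+([\w-]+)([\s\S]*?)-->$/

// Hypothetical helper: split an opening comment into the transform name and
// the raw, still-unparsed option string.
function splitOpenComment(comment) {
  const match = OPEN_COMMENT.exec(comment.trim())
  if (!match) return null
  return { transform: match[1], params: match[2].trim() }
}

console.log(splitOpenComment('<!-- doc-gen code src="./examples/1_simple.js" lines="1-20" -->'))
// { transform: 'code', params: 'src="./examples/1_simple.js" lines="1-20"' }

console.log(splitOpenComment('<!-- doc-gen table -->'))
// { transform: 'table', params: '' }
```

The params string is what would then be trimmed and handed to parse(), with the empty string short-circuiting to {} as in the snippet above.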
This split is why markdown-magic could move from its old colon-and-ampersand syntax to React-like props without rewriting the core. The legacy syntax looked like this:
<!-- DOCS:START (CODE:src=./path/to/file.js&lines=22-44) -->
The modern syntax reads like JSX:
<!-- doc-gen code src="./path/to/file.js" lines="22-44" -->
Both end up as { src: './path/to/file.js', lines: '22-44' }. The library still detects the old : / ? prefixes and routes them to a legacy parser for backwards compatibility, but everything new flows through oparser. The transform author just receives a clean options object and never thinks about parsing at all.
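For comparison, a legacy-style parser can be very small, since that syntax is just key=value pairs joined by &. The sketch below is hypothetical: parseLegacyOptions and parseLegacyComment are invented names, the regex is an assumption based on the one legacy example above, and pairs without an = are simply skipped.

```javascript
// Hypothetical helper: 'src=./path/to/file.js&lines=22-44' -> plain object.
function parseLegacyOptions(text) {
  const options = {}
  for (const pair of text.split('&')) {
    const eq = pair.indexOf('=')
    if (eq === -1) continue // assumption: ignore anything that is not key=value
    options[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim()
  }
  return options
}

// Hypothetical helper: pull "(NAME:options)" or "(NAME?options)" out of a
// legacy opening comment and parse the option part.
function parseLegacyComment(comment) {
  const match = /\(\s*([A-Za-z0-9_-]+)\s*[:?]\s*([^)]*)\)/.exec(comment)
  if (!match) return null
  return { transform: match[1], options: parseLegacyOptions(match[2]) }
}

console.log(parseLegacyComment('<!-- DOCS:START (CODE:src=./path/to/file.js&lines=22-44) -->'))
// { transform: 'CODE', options: { src: './path/to/file.js', lines: '22-44' } }
```

Every value stays a string here, which is exactly the limitation the props-style syntax removes.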
The "loose key-value text to object" problem shows up far more often than you'd expect once you go looking for it:
Anywhere a human types config into a text field and a strict parser would reject it for a missing quote, a forgiving parser does the obvious thing instead.
The lesson I keep relearning: the format your tool accepts and the format it works with internally don't have to match. Internally markdown-magic wants a clean options object. Externally I want authors to scribble JSX-ish props and have it just work. A small forgiving parser in between is what buys you both.
oparser on GitHub
npm install oparser
If you're building anything that takes hand-written config, try giving people the forgiving version first. The strict parser can always run after.