Writing a Markdown(ish) parser with nom

The posts on this website are written in a style resembling Markdown. Markdown is, at its core, a lightweight wrapper around some basic HTML elements. Its syntax maps almost directly onto HTML paragraphs, links, images, lists, headers, and code snippets. It can also apply basic styles like italics and bold.

There are different flavors of Markdown with slightly different features and syntax. The most common flavors are probably CommonMark, a standardization that tries to stay as close as possible to the original implementation, and GitHub Flavored Markdown, a CommonMark extension that adds things like tables.

On this website, I wrote my own dialect that has a few features I wanted. For example, I use a lot of em dashes — they look like the dash in this sentence. In my Markdown source, I just write two adjacent hyphens to get this character.
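
The double-dash rule is simple enough to sketch in a few lines. The snippet below is only an illustration using Rust's standard library, not the nom-based code from this post, and the function name `replace_double_dashes` is made up for the example.

```rust
/// Replace every pair of adjacent hyphens with an em dash (U+2014).
/// Hypothetical helper for illustration; not part of the post's parser.
fn replace_double_dashes(input: &str) -> String {
    // `str::replace` scans left to right and swaps each non-overlapping "--".
    input.replace("--", "\u{2014}")
}
```

Calling `replace_double_dashes("dashes -- like this")` gives back the same text with the two hyphens collapsed into a single em dash, while a lone hyphen, as in "well-known", is left alone.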

There are some other minor quality-of-life improvements: special syntax for linking to other posts on this website, comments I can insert into the document for future reference, that sort of thing.

There are also a few things in standard Markdown that I omitted, such as numbered lists. I'm just not a fan. I can always add them if I really want them at some point.

Anyway, to accomplish this, I had to write a parser for my syntax. I figured this would be a neat opportunity to do a post with something like literate programming.

Below is most of an implementation — I shortened it for brevity, so it has even fewer features. If you wanted to extend it, you could pretty easily get it up to a full parser.

The code

If you are reading this on mobile, the code should be sized so that turning your device sideways is enough to keep it from overflowing. If it still overflows... sorry. This is the first code post I've made.

If you don't have JS running, you can view the raw gist on GitHub.