Rendering HTML papers with hCite citations

Nick Doty

UC Berkeley, School of Information

January 1, 2015

1 Motivation

I’ve been using Pandoc for my writing lately. It’s great. I can write in Markdown (or in a variety of other formats) and then generate nice HTML (or nice PDFs, via a LaTeX template) automatically; see Kieran Healy’s awesome guide (Healy).

For an academic, citations are one of the most important parts of a paper. Rather than copying and pasting inline citations, keeping a references section up to date, and never remembering how to format them, I just write [@Doty2013], and all the relevant information for that citation key (Doty and Mulligan) is pulled in from my reference manager (I use Mendeley and import through my browser with Zotero). The citation is then formatted (ALA, IEEE, APSA, whatever) using the Citation Style Language.
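To make the citation-key idea concrete, here is a minimal Python sketch of how keys such as [@Doty2013] could be pulled out of Markdown text. It is not how Pandoc itself parses citations: the regular expression is a simplified assumption, the function name citation_keys is made up for illustration, and the key Healy2014 in the usage example is hypothetical.

```python
import re

# Assumed, simplified key pattern: "@" followed by runs of letters, digits,
# or underscores, optionally joined by ":", "." or "-". Pandoc's real
# citation grammar is richer (locators, prefixes, suppressed authors).
CITE_KEY = re.compile(r"(?<![\w@])@([A-Za-z0-9_]+(?:[:.-][A-Za-z0-9_]+)*)")

# Matches one square-bracketed span that contains no nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def citation_keys(markdown_text):
    """Return citation keys found inside square-bracketed spans, in order."""
    keys = []
    for span in BRACKETED.findall(markdown_text):
        # Looking only inside [...] avoids most false positives in prose;
        # the lookbehind in CITE_KEY skips e-mail-like "name@host" text.
        keys.extend(CITE_KEY.findall(span))
    return keys


if __name__ == "__main__":
    text = "Privacy at W3C [@Doty2013] and plain text [see @Healy2014, p. 3]."
    print(citation_keys(text))  # ['Doty2013', 'Healy2014']
```

A citation processor would then look up each key in the bibliography exported from the reference manager and hand the matching record to the chosen CSL style for formatting.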

The CSL spec is great work, but it just describes formatting (parentheses and italics and punctuation, oh my). I want a citation processor that will not only output nicely formatted content but also mark up the semantics. You should be able to read my academic writing on the Web, in your favorite browser. But I also want my papers to interact with other papers and sites on the Web, rather than just being hosted there. Your software should be able to automatically parse my citations using simple microformats markup. Links to Web resources should have rich semantics and support notifications. This is my contribution to the IndieWeb movement: we host our own writing, rather than relying on a large publisher or social media silo.

2 Implementation

Professor MacFarlane maintains an excellent pandoc-citeproc with Pandoc, but it’s written in Haskell, which might as well be Greek. I studied functional programming once, over a decade ago, but I can hardly read this code, much less contribute to it. Instead, I’ve forked citeproc-py, maintained by Brecht Machiels, and am contributing back to it. Isn’t Python a lovely language in which to write?

This page is a demo (to which I committed!). The inline citations included in the paragraphs above are generated using my citeproc-py fork and you can see the full references below, marked up with the hCite microformat.
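As a rough illustration of what "marked up with hCite" can mean in practice, the following Python sketch renders one reference as an HTML fragment. It is not the code in the citeproc-py fork. The function name render_hcite is hypothetical, the class names (h-cite, p-author, p-name, dt-published) are taken from the microformats2 h-cite draft and are an assumption about the output, and the sample reference is invented.

```python
from html import escape


def render_hcite(authors, title, published):
    """Return an HTML fragment for one reference with h-cite style classes.

    authors   -- list of author names (plain strings)
    title     -- title of the cited work
    published -- publication date as a string, e.g. "2015" or "2015-01-01"
    """
    # Every value is escaped so that "&" or "<" in a title cannot break the markup.
    author_html = ", ".join(
        '<span class="p-author">{}</span>'.format(escape(name)) for name in authors
    )
    return (
        '<p class="h-cite">{authors}. '
        '<cite class="p-name">{title}</cite>. '
        '<time class="dt-published" datetime="{date}">{date}</time>.</p>'
    ).format(authors=author_html, title=escape(title), date=escape(published))


if __name__ == "__main__":
    # Invented sample data, not a real reference.
    print(render_hcite(["Ada Example"], "A Hypothetical Paper", "2015"))
```

The visible text still reads like an ordinary formatted reference, while the class names and the datetime attribute carry the semantics for software.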

To “compile” this page, I type make html at the command line. Underneath, the relevant command (using my citeproc-py fork as of today) looks like this:

pandoc demo.mdown -t html5 --filter

If you view source on this page, you can see the hCite markup (including a <cite> tag for the title and <time> for the date) in the References section. Alternatively, a microformats parser will show you the data embedded in this page.
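For readers who want to see how software might read such markup, here is a minimal sketch using only Python's standard html.parser module. It is not a full microformats parser: it simply collects the text of every <cite> element and the datetime attribute of every <time> element. The class CiteTimeExtractor, the helper extract, and the sample fragment are all hypothetical, and the fragment is not copied from this page's source.

```python
from html.parser import HTMLParser


class CiteTimeExtractor(HTMLParser):
    """Collect <cite> text (titles) and <time datetime="..."> values (dates)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self.dates = []
        self._in_cite = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._in_cite = True
            self._buffer = []
        elif tag == "time":
            # Only the machine-readable datetime attribute is read;
            # a <time> element without it is ignored.
            value = dict(attrs).get("datetime")
            if value:
                self.dates.append(value)

    def handle_data(self, data):
        if self._in_cite:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "cite" and self._in_cite:
            self.titles.append("".join(self._buffer).strip())
            self._in_cite = False


def extract(html_text):
    """Return (titles, dates) found in an HTML string."""
    parser = CiteTimeExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.titles, parser.dates


if __name__ == "__main__":
    # Invented fragment for illustration only.
    sample = (
        '<p class="h-cite"><cite class="p-name">A Hypothetical Paper</cite> '
        '<time class="dt-published" datetime="2015-01-01">2015</time></p>'
    )
    print(extract(sample))  # (['A Hypothetical Paper'], ['2015-01-01'])
```

A real microformats parser would also resolve the class-based properties (author, publication, URL) and nest them under each citation, which this sketch does not attempt.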

3 Future Work


Healy, Kieran. “Plain Text, Papers, Pandoc”.

Doty, Nick, and Deirdre K. Mulligan. “Internet Multistakeholder Processes and Techno-Policy Standards: Initial Reflections on Privacy at the World Wide Web Consortium”. Journal on Telecommunications and High Technology Law 11.