Generating PDF from JavaScript the nice way

Jan Silbersiepe

15-11-2024

backend, javascript, pdf, playwright, react

Since the beginning of time, one of the most dreaded issues in my professional career has been coming up with a solution for creating PDF reports from dynamic, user-generated content that always look great. Anyone who has ever encountered this topic can probably confirm that it is not an easy task, especially if your PDFs go beyond an invoice template with a black-outlined table on a white background. No matter which library you choose, you always end up sacrificing something: the design (e.g., gradients, background images, advanced styling), the reliability (it might work in one browser but not in another), or the layout (unexpected page breaks, margins, and other artifacts).


Last year, our team finally had to face this issue, as the product team was pushing to fix our reports and make them “look like consulting slides.” Users no longer seemed satisfied with black-outlined tables on white backgrounds. Understandably so, because our reports at the time suffered from all of the issues mentioned above. After some research and trial and error, we came up with a solution that seems to fulfill all the criteria and differs from every approach I came across while researching the topic.

To get an idea of a very common way to create PDF documents in JavaScript, let’s look at jsPDF, one of the most popular PDF libraries on npm. Creating a document follows a simple incremental syntax:

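import { jsPDF } from 'jspdf'

const doc = new jsPDF() // defaults to an A4 portrait page measured in mm
doc.text('Hello world!', 10, 10) // place text at x = 10, y = 10
doc.save('a4.pdf')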

Each call adds an element to the doc object at explicit coordinates, and this is how the PDF is built up. This works well in really simple scenarios, but as soon as you want more styling and background images, it quickly reaches its limits. Another big drawback is that the team has to learn a new expression language just to build PDFs. This example shows jsPDF, but most PDF libraries work in a similar way. There has to be an easier way!

Following the thought that we would like to avoid learning a whole new expression language, I asked myself why we can’t just build an application with the tools we already know (meaning HTML + React, Angular, Vue, Svelte, or whatever) and then simply print it as a PDF. Apparently, many others have had the same thought, and there are plenty of libraries that try to accomplish this. html2pdf, for example, uses html2canvas, which redraws the HTML onto a canvas element; html2pdf then takes a snapshot of that canvas and puts it into the PDF. It sounds interesting, but it still has two drawbacks. First, the html2canvas engine does not support all of the styling options we need in our templates. Second, and even more important, it renders the PDF fully client-side, which makes the output unpredictable, as canvas rendering differs between browsers.

The general idea, though, sounded great to me: (1) take HTML, (2) translate it into something PDF-friendly, and (3) convert that to PDF. Our final solution follows these steps, but with some important changes that make the process not only more flexible but also much more reliable.

It seemed to me that one of the main issues with most of the existing approaches was their dependency on HTML. However, HTML has a close relative among the XML-based formats that is much more printer-friendly: SVG. Anyone who has worked with the open-source vector graphics software Inkscape knows that it uses SVG as its default format for saving vector graphics. And as Inkscape also makes it easy to convert its SVG files to PDF, I thought it might be easier to write the templates of my PDF reports in SVG instead of HTML. The big upside: SVG is an XML-based markup language that browsers render natively, so all of our favorite frontend frameworks can output SVG elements just as easily as HTML elements. In other words, we can write a React app that renders an SVG version of our dynamic content, which should be much friendlier to convert to PDF than HTML would be.
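To illustrate why SVG suits this job, here is a minimal, framework-free sketch of a single report "slide" rendered as SVG. In a React setup, a component would return equivalent JSX (<svg>, <rect>, <text>); plain template strings are used here only to keep the example self-contained. The Slide type, the renderSlideSvg function, and all layout numbers are hypothetical. Setting width="210mm" height="297mm" together with viewBox="0 0 210 297" makes one user unit equal one millimetre on an A4 page, and the gradient background is exactly the kind of styling that many PDF libraries struggle with.

```typescript
// Hypothetical report data; in a real setup this would come from the app's params.
interface Slide {
  title: string
  values: { label: string; value: number }[]
}

// Escape characters that are special in XML text and attribute values,
// so user-generated content cannot break the SVG markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
}

// Render one A4-sized slide: gradient background, a title, and one bar per value.
// With viewBox="0 0 210 297" and a 210mm x 297mm size, coordinates are in millimetres.
function renderSlideSvg(slide: Slide): string {
  // Guard against an empty list (Math.max() of nothing is -Infinity).
  const maxValue = Math.max(1, ...slide.values.map((v) => v.value))

  const bars = slide.values
    .map((v, i) => {
      const y = 60 + i * 20 // vertical position of this bar
      const barWidth = (v.value / maxValue) * 150 // longest bar is 150mm wide
      return (
        `<text x="20" y="${y - 3}" font-size="5">${escapeXml(v.label)}</text>` +
        `<rect x="20" y="${y}" width="${barWidth}" height="10" fill="#1f4e79" />`
      )
    })
    .join('')

  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="210mm" height="297mm" viewBox="0 0 210 297">` +
    `<defs><linearGradient id="bg" x1="0" y1="0" x2="0" y2="1">` +
    `<stop offset="0" stop-color="#ffffff" /><stop offset="1" stop-color="#dbe7f3" />` +
    `</linearGradient></defs>` +
    `<rect width="210" height="297" fill="url(#bg)" />` +
    `<text x="20" y="35" font-size="12" font-weight="bold">${escapeXml(slide.title)}</text>` +
    bars +
    `</svg>`
  )
}
```

Because the whole page is one fixed-size SVG, the layout no longer depends on how a print engine decides to flow HTML across pages.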

Great! So now we have our SVG, but how do we get to the PDF? While discussing this, the idea came up to use the Inkscape CLI, since Inkscape was what had led us to SVG in the first place. However, when we tried it, contrary to our expectations, it produced some weird artifacts like flipped bitmap images and misaligned text, while Chromium rendered the SVG exactly the way we wanted. If we could just make all users use Chromium-based browsers and tell them how to use the print-to-PDF function, we would be finished here. Unfortunately, this is not realistic, so we needed a way to automate that step in a controlled environment. And what type of software is great for automating tasks that a user would usually do manually in the browser? An E2E test framework! We had recently switched to Playwright, which also seemed like a perfect candidate here because it runs real browser executables to drive the test automation. Conveniently, Playwright also comes with a PDF export API (page.pdf(), which is supported only in headless Chromium) that controls the headless browser’s print-to-PDF function.

So in the end our pipeline looks like this:

Compile the React app into a single bundled JS file and a single CSS file
Inline the bundle into an HTML page and load it into Chromium with Playwright
Export the headlessly rendered page to PDF using Playwright’s PDF API

Since rendering always happens in the same Chromium build in a controlled environment, the result is a deterministic process that can use every SVG and CSS feature Chromium supports. Here is what the final script to generate the PDF looks like:

import { readFileSync, readdirSync } from 'fs'
import { chromium } from 'playwright'

// MyParams: the type of the dynamic report data, defined elsewhere in the project
async function createPDF(params: MyParams) {
  const script = readFileSync(`./build/static/js/main.<hash>.js`, 'utf-8') // reference to compiled React app
  const css = readFileSync(`./build/static/css/main.<hash>.css`, 'utf-8') // reference to compiled React CSS

  // serialize params for an inline <script>; escaping "<" keeps user content
  // such as "</script>" from terminating the script tag early
  const serializedParams = JSON.stringify(params).replace(/</g, '\\u003c')

  // inline script and styles into an HTML boilerplate
  const staticHtml = `
  <!DOCTYPE html>
  <html lang="en">
    <head>
      <meta charset="utf-8" />
      <meta name="viewport" content="width=device-width,initial-scale=1" />
      <style>${css}</style>
      <script>
        const params = ${serializedParams}
      </script>
    </head>
    <body>
      <div id="root"></div>
      <!-- the bundle runs after #root exists, so React can mount into it -->
      <script>${script}</script>
    </body>
  </html>`

  // launch Chromium (Playwright's page.pdf() only works in headless Chromium)
  const browser = await chromium.launch()
  try {
    const page = await browser.newPage()

    // set the content to be the React app
    await page.setContent(staticHtml)

    // save the page to a file using Playwright's `.pdf` function
    await page.pdf({
      path: 'renderedwithplaywright.pdf',
      margin: { top: '0px', left: '0px', right: '0px', bottom: '0px' },
      format: 'A4',
    })
  } finally {
    // close the browser even if rendering fails
    await browser.close()
  }
}
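The script refers to the bundles as main.<hash>.js and main.<hash>.css because the build adds a content hash to each file name, and it imports readdirSync without using it in this excerpt. One plausible way to resolve the real file names at runtime is a small helper like the hypothetical findHashedFile below. It is a sketch under stated assumptions: exactly one matching bundle per folder, and names of the form prefix.<hash>.extension. Matching on the final extension skips side files such as main.<hash>.js.map.

```typescript
import { readdirSync } from 'fs'
import { join } from 'path'

// Hypothetical helper: find the single hashed bundle (e.g. "main.<hash>.js") in a build folder.
// Assumes the file name starts with `${prefix}.` and ends with `.${extension}`.
function findHashedFile(dir: string, prefix: string, extension: string): string {
  const matches = readdirSync(dir).filter(
    (name) => name.startsWith(`${prefix}.`) && name.endsWith(`.${extension}`),
  )
  // Fail loudly instead of silently picking the wrong bundle.
  if (matches.length !== 1) {
    throw new Error(
      `expected exactly one ${prefix}.*.${extension} in ${dir}, found ${matches.length}`,
    )
  }
  return join(dir, matches[0])
}
```

With a helper like this, the two readFileSync calls could read findHashedFile('./build/static/js', 'main', 'js') and findHashedFile('./build/static/css', 'main', 'css') instead of hard-coding the hashes.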

This is admittedly a rather unconventional approach to PDF generation. For us, however, it has proved to be a reliable solution that has been serving its purpose in production for over a year now. I had originally planned to publish this post at the beginning of the year, when the solution had only been deployed for a few weeks.

Today, I can say that this implementation, especially the familiar React-based templating, has greatly increased the flexibility of what we can do in terms of layout and styling. In addition, changes to the templates are more convenient, as developers don’t need to refresh their knowledge of a specific PDF composition library every time a change is requested to one of the PDF templates.

Ultimately, I would say this is a great solution when a PDF needs to be more than a simple, minimally styled document. If that is all you need (e.g., an order confirmation, receipt, or invoice), setting this up is overkill. If, however, the goal is to create a report with lots of images and styled content, Playwright-based rendering is the way to go.