Thursday, June 6, 2013

PHP: HTML to PDF the better way

So, the LAMP dream application is finished, everything is pristine, you are feeling pretty good about getting paid, and the client says, "OK, one more thing, please?" "Sure," you say; what the heck could it be that we haven't done? And then the client says it: "It would be great to get this report emailed nightly in PDF format." "What!?!" The brain starts to convulse, and rightfully so, because HTML-to-PDF options in the server-side PHP world are very limited. There are a lot of posts out there about viable options, so let's go through some of them:

- DOMPDF: buggy, a pain to use, slow, and it is difficult to get things to line up. There are 191 issues on GitHub, and I can't remember the last time I upgraded it and things just worked.
- html2pdf: I really liked this project but it was very heavy and maintenance dropped off around 2008.
- wkhtmltopdf: needs an entire Qt environment and was last supported in 2011.
- framebuffers: This is my personal alternative, whereby a virtual framebuffer is launched inside a virtual X11 environment. A script can then launch any application and export to image/PDF/JPEG/whatever back to the environment. Framebuffers are incredibly powerful, but unfortunately setting all this up takes some time, and in a shared hosting environment it's unlikely to happen.
- PDFLib (http://www.pdflib.com/): this looks awesome, but I don't have that kind of money.

Before I dive into a better solution, I just want to say how easy this all is with .NET and Java. In the Java world I've had great success with Flying Saucer. On the .NET side, ABCPDF (http://www.websupergoo.com/) is a beautiful library that is accurate to a pixel.

So my ultimate solution was to jump to the Python library Pisa. I love using Python, and talking to Python from PHP is a breeze. In most cases, to get going you simply need to run:

easy_install pisa html5lib reportlab

In case that fails, you might need the python-dev and python-setuptools packages. Once everything is good and running, you should be able to type:

pisa --version

This will produce a lot of usage information. To try it out create an html file somewhere (/tmp/test.html) and run:

pisa - - < /tmp/test.html > test.pdf

The dashes (-) tell Pisa to use STDIN and STDOUT respectively. Once you have a PDF that looks reasonable, these are the commands to run from PHP:

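<?php
// $html must already contain the HTML markup to convert.
$html_file = tempnam("/tmp", "html");
$pdf_file = tempnam("/tmp", "pdf");
$handle = fopen($html_file, "w");
fwrite($handle, $html);
fclose($handle);
// Escape both paths and capture the exit status. Nothing is echoed here,
// because any stray output would end up in front of the PDF bytes.
exec('pisa - - < ' . escapeshellarg($html_file) . ' > ' . escapeshellarg($pdf_file), $output, $status);
if ($status !== 0) {
    unlink($html_file);
    unlink($pdf_file);
    die('PDF generation failed');
}
$disposition = 'inline'; // display in the browser
//$disposition = 'attachment'; // force a download instead
header('Content-Type: application/pdf');
header('Content-Disposition: ' . $disposition . '; filename="test.pdf"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($pdf_file));
header('Accept-Ranges: bytes');
readfile($pdf_file);
// Clean up the temporary files.
unlink($pdf_file);
unlink($html_file);
?>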

Voila, Pisa just generated a great-looking PDF and handed the binary back to PHP, which can either save it to a file or present it to the user.
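
Since the dashes make Pisa read from STDIN and write to STDOUT, the temporary files can in principle be skipped by talking to the process over pipes. The following is only a rough sketch of that idea, not something from the original setup: the function name html_to_pdf and its $command parameter are made up for illustration, and it assumes the converter reads all of its input before it starts writing output (otherwise a very large document could stall on a full pipe).

```php
<?php
/**
 * Pipe an HTML string through a converter command and return what it prints.
 * html_to_pdf() and $command are hypothetical names; the default command is
 * the same "pisa - -" call used earlier in this post.
 */
function html_to_pdf($html, $command = 'pisa - -')
{
    $descriptors = array(
        0 => array('pipe', 'r'),              // child's STDIN: we write the HTML here
        1 => array('pipe', 'w'),              // child's STDOUT: we read the PDF here
        2 => array('file', '/dev/null', 'w'), // discard anything printed on STDERR
    );

    $process = proc_open($command, $descriptors, $pipes);
    if ($process === false) {
        throw new RuntimeException('Could not start: ' . $command);
    }

    // Send the whole document, then close STDIN so the child sees end-of-file.
    fwrite($pipes[0], $html);
    fclose($pipes[0]);

    // Read everything the child writes until it closes STDOUT.
    $pdf = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    // proc_close() waits for the child and returns its exit status.
    $status = proc_close($process);
    if ($status !== 0) {
        throw new RuntimeException('Command exited with status ' . $status);
    }

    return $pdf;
}
```

With this in place, something like file_put_contents('/tmp/report.pdf', html_to_pdf($html)); would cover the save-to-a-file case, and echoing the result after the same headers as above would cover the browser case. The temp-file version is still the easier one to debug, because the intermediate HTML stays on disk until you delete it.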

Happy PDF!
B