Satya's blog - Serving PDFs with Rails using Inkscape

Sep 15 2007 22:12 Serving PDFs with Rails using Inkscape

This HOWTO explains how, given your database, a PDF form, Ruby on Rails, plus a few other things, you can produce filled-out PDF documents. This assumes Debian/Ubuntu.

You will need a PNG to PNM converter, an XSL transformer, Inkscape for producing and converting SVG files, and (optionally) a PNG quantiser.

Overview: The page layout template is produced as an SVG file in Inkscape, from an existing (blank) PDF. Inkscape is also used by Ruby on Rails (RoR) to convert the merged SVG into a PDF that is served to the user via the web.

First, install the packages that your average Rails server is unlikely to have already (this is everything besides Ruby on Rails itself):
apt-get install netpbm libxslt-ruby inkscape pngquant
(netpbm for pngtopnm)

Making the template

Convert the original PDF to PNM, for example with pdf2ps (from Ghostscript) followed by pstopnm (from netpbm). Any route from PDF to a bitmap will do.

Open the PNM file in Inkscape. This gives you the right page size.
Delete everything.
Create new layers: scan, trace, borders, boilerplate, dynamic.
Set fonts to something Ghostscript understands, like the URW family.

Import the PNM into the 'scan' layer. That is, keep the scan layer selected in the layers dialog, and then Import.

Select all, Path -> trace bitmap (long, memory intensive process starts)
Move the path to the 'trace' layer.
Delete the extra nodes, i.e. text etc. (long, memory-intensive process ends)

Put any extra "drawn" stuff you need into the 'borders' layer, such as empty checkboxes or borders that don't show up properly in 'trace'

In 'boilerplate', put all the text that won't change, i.e. what's always there.

In the 'dynamic' layer, put placeholders for the fields. I insert the database field name as text: where the last name would normally go, I put the literal string 'last_name'. This will later be replaced by XSL markup.

Save as SVG.

The SVG file is just XML. Fix the saved SVG file as an XSL stylesheet by adding these lines in a text editor:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/claim">

Replace claim with whatever your XML root node is, and close both tags (</xsl:template> and </xsl:stylesheet>) at the bottom of the file.

Use XSL tags in place of the placeholder text in the dynamic fields. The text 'last_name' becomes: <xsl:value-of select="last_name" /> or whatever the correct XPath is.
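
The placeholder swap can also be scripted instead of done by hand. This is only a rough sketch: svg_to_xsl is a made-up helper name, 'claim' is the example root node from above, and it assumes each placeholder is the entire text of an element (such as a tspan), which you should check against your own file.

```ruby
# Hypothetical helper: turns an Inkscape SVG string into an XSL stylesheet.
# Assumes each placeholder is the whole text content of an element,
# e.g. <tspan ...>last_name</tspan>.
def svg_to_xsl(svg, fields, root = 'claim')
  # Drop the SVG's XML declaration; the stylesheet gets its own.
  body = svg.sub(/\A\s*<\?xml[^>]*\?>\s*/, '')
  fields.each do |field|
    body = body.gsub(">#{field}<", %(><xsl:value-of select="#{field}"/><))
  end
  <<~XSL
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/#{root}">
    #{body.strip}
    </xsl:template>
    </xsl:stylesheet>
  XSL
end
```

You would call it as something like svg_to_xsl(File.read("template.svg"), %w(last_name first_name)) and write the result to your.xsl. Matching on ">name<" keeps it from touching attribute values or longer words that merely contain a field name.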

Transforming the template in RoR

Now you write the RoR methods to convert your data to PDF. Take the data from the database, convert it to XML, and "transform" the XML using the XSL stylesheet you created in the previous section. Then you will use Inkscape to convert the transformed XML into a (PNG, then a) PDF.

Get your XML and transform it in Ruby using the libxslt bindings. The following can probably be a method of your ActiveRecord model, say, to_pdf:

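require 'xml/libxslt'

xml_string = to_xml               # XML for this record, built by to_xml
xslt = XML::XSLT.new
xslt.xml = xml_string             # the data to transform
xslt.xsl = "your.xsl"             # the stylesheet made from the SVG template
xslt.save("/tmp/something.svg")   # write the merged SVG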

to_xml is another method that works something like this:

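# Rails normally loads Builder already; otherwise: require 'builder'
def to_xml
    buffer = ""
    # Builder appends the markup to the :target string as it goes
    xm = Builder::XmlMarkup.new(:target => buffer)
    xm.instruct!
    xm.report {
       xm.employee_name(employee_name.to_s)
       %w(address1 address2 name).each do |col|
           xm.tag!('employer_' + col, employer.send(col))
       end
       # and so on
    }
    return buffer
end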

Making the PDF

Back to your to_pdf method. Remember, stuff like #{id} in a string gets the current object's id:

fileprefix="/tmp/your_app_#{id}"
inkscape="/usr/bin/inkscape #{fileprefix}.svg --export-background=#ffffff --export-png=#{fileprefix}.png"
system(inkscape)
system("pngquant 8 #{fileprefix}.png")
system("pngtopnm #{fileprefix}-fs8.png | pnmtops | ps2pdf - > #{fileprefix}.pdf")
Basically just a series of system() calls.
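
Bare system() calls ignore failures, so a broken Inkscape run only shows up later as a missing PDF. Here is a hedged sketch of the same chain with the file names shell-escaped and the return values checked. pdf_commands and run_all are names I made up; the commands themselves are the ones above.

```ruby
require 'shellwords'

# Hypothetical helper: returns the shell commands for the
# SVG -> PNG -> PDF chain, with the file names shell-escaped.
def pdf_commands(fileprefix)
  svg = Shellwords.escape("#{fileprefix}.svg")
  png = Shellwords.escape("#{fileprefix}.png")
  fs8 = Shellwords.escape("#{fileprefix}-fs8.png")  # name pngquant writes to
  pdf = Shellwords.escape("#{fileprefix}.pdf")
  [
    "/usr/bin/inkscape #{svg} --export-background=#ffffff --export-png=#{png}",
    "pngquant 8 #{png}",
    "pngtopnm #{fs8} | pnmtops | ps2pdf - > #{pdf}"
  ]
end

# Runs each command in order and stops at the first failure.
# system() returns false on a non-zero exit and nil if the command
# could not be run at all; both raise here.
def run_all(commands)
  commands.each do |cmd|
    system(cmd) or raise "command failed: #{cmd}"
  end
end
```

In to_pdf this would be run_all(pdf_commands(fileprefix)). Note that for the last command the shell reports only the exit status of ps2pdf, the final stage of the pipe, so a failure in pngtopnm or pnmtops can still slip through.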

Then read the PDF in like so:
pdf_data = IO.read("#{fileprefix}.pdf")

You should save pdf_data into the database. It's raw PDF data, so use a blob or other large binary type. When you create this column in the migration, use something like: add_column :table, :pdf_data, :binary, :limit => 512.kilobyte

If your PDF table is separate from your data table (may make things easier on the db engine), you probably want something like this:

if pdf_id.nil?
    pdfo=Pdf.new
else
    pdfo=Pdf.find(pdf_id)
end
pdfo.data = IO.read("#{fileprefix}.pdf")
pdfo.save
update_attributes(:pdf_id => pdfo.id)
where your table belongs_to :pdf and the pdfs table has_one :your_data_table

I also use a 'dirty' column, so anytime the controller's pdf method is hit, it checks the dirty bit; if set, it calls to_pdf on the object. Then it calls send_data. Pseudo-code:

def add
    obj.dirty=true
    obj.save
end

def pdf
    obj=Model.find(params[:id])
    obj.to_pdf if obj.dirty   # to_pdf should clear the dirty bit when it's done
    send_data(obj.pdf.data, :type => 'application/pdf',
    :filename => "something_#{obj.id}.pdf")
end
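
The dirty-bit idea is easier to see outside Rails. Below is a small plain-Ruby sketch; the Report class, touch_data, and render_pdf are stand-ins I invented, with render_pdf taking the place of the real to_pdf work. The point it shows is that the flag has to be cleared after rendering, or every request re-renders.

```ruby
# Stand-in for the ActiveRecord model; Report and its fields are hypothetical.
class Report
  attr_reader :renders

  def initialize
    @dirty = true    # nothing rendered yet
    @pdf_data = nil
    @renders = 0
  end

  # Call whenever the underlying data changes.
  def touch_data
    @dirty = true
  end

  # Returns the PDF bytes, re-rendering only if the data changed.
  def pdf
    if @dirty
      @pdf_data = render_pdf
      @dirty = false  # clear the flag so the next request reuses the cached PDF
    end
    @pdf_data
  end

  private

  # Placeholder for the real work (XSLT, Inkscape, and so on).
  def render_pdf
    @renders += 1
    "%PDF-fake-#{@renders}"
  end
end
```

Calling pdf twice in a row renders once; calling touch_data in between forces a second render.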

Inkscape uses Gnome's VFS, which wants the home directory of the user it runs as to be writable. Now your RoR is probably running as a mongrel process, as www-data or something. So, create a user for running this app, call it inkscape-pdf-makr or something. Run mongrel as that user by putting USER=inkscape-pdf-makr in /etc/mongrel/sites-enabled/your-site.conf. Make sure the permissions on log/ and tmp/ are correct.

Tag: geeky rails howto

Comments:
  • Thu Sep 20 00:07:47 2007 Derrick Pallas wrote:
    There is actually a severe resource leak in ruby-xslt, so I don't recommend using it in a Rails application. The author (to whom I submitted a patch ages ago) doesn't clean up things like temporary xmlChar* results, etc.
  • Thu Sep 20 06:59:29 2007 Satya wrote:
    Just great. I wonder if there are any alternatives, and if so, what's wrong with them.