Splitters, Joiners, Converters and PDF Editing

Xournal [768 kb] is a little-known, lightweight Linux application that can read, edit and export (save) PDF documents. It inserts text of any size and typeface directly onto the page, whites out existing text and images, highlights, inserts shapes, draws, and does other things. The image above-right illustrates some possible edits done by me. It was my first time, so please excuse the sloppiness. In general, Xournal can be used for note-taking, sketching and keeping a journal with a stylus, in a variety of document forms. Unfortunately, it has neither bookmarks nor a search feature. A lightweight, portable PDF application that does offer bookmarking and searching, and that runs under Wine, is PDF-XChange; it also has other useful tools. I have not had a chance to use it, but I have heard good things about Master PDF Editor, which has a Linux version: sudo apt-get install master-pdf-editor3.

Linux has many simple command-line tools and specialized applications that handle PDF-related tasks remarkably efficiently and accurately. Three general tool collections that are easy to install on most distributions are coreutils, x11-utils and x11-apps.
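As a small illustration of what one coreutils tool can do, csplit cuts a text file at given line numbers. The sketch below is only a demonstration under assumptions: the file name demo.txt and the output prefix piece- are made up for the example, and it works in a temporary directory so nothing is left in the current one.

```shell
# Scratch directory for the demonstration.
workdir=$(mktemp -d)

# Create a 50-line sample file (demo.txt is a made-up name).
seq 1 50 > "$workdir/demo.txt"

# Split before line 22 and before line 37.
#   -s  suppresses the byte counts csplit normally prints
#   -k  keeps the pieces already written if an error occurs
#   -f  sets the output prefix (without it the pieces are named xx00, xx01, ...)
csplit -s -k -f "$workdir/piece-" "$workdir/demo.txt" 22 37

# Three pieces result: lines 1-21, lines 22-36 and lines 37-50.
wc -l "$workdir"/piece-*

# Line 22 is the first line of the second piece.
head -n 1 "$workdir/piece-01"
```

The line numbers given to csplit mark where each new piece begins, which is why line 22 lands in the second file rather than the first.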

The accompanying chart illustrates what some command-line tools can do. The main tool collection for PDF documents is poppler-utils (PDF Utilities) [456 kb]. Each command, as printed in the table, should be run from the directory containing the pertinent files. The resulting files will appear in the same directory (unless the command specifies a path). Finally, Ghostscript, which is installed on most Linux distributions, can be used to merge or extract pages from .pdf and/or .ps files, although the commands are long (see the last 2 entries in the accompanying table).

pdfseparate xx.pdf p-%d.pdf   [separates xx.pdf into separate pages: p-1.pdf, p-2.pdf, ...]
pdfseparate -f 5 xx.pdf p-%d.pdf   [separates from page 5 to end: p-5.pdf, p-6.pdf, ...]
pdfseparate -f 2 -l 3 xx.pdf p-%d.pdf   [separates from page 2 to page 3: p-2.pdf, p-3.pdf]
pdfseparate -f 2 -l 2 xx.pdf p-%d.pdf   [separates page 2: p-2.pdf]
pdfimages xx.pdf y   [extracts all images, saved as y-000.ppm, y-001.ppm, ...]
pdfunite xx.pdf yy.pdf zz.pdf   [unites xx.pdf and yy.pdf into zz.pdf]
pdftotext xx.pdf   [extracts text, saved as xx.txt]
pdftoppm xx.pdf y   [PDF to ppm converter, saved as y-1.ppm]
pdftohtml xx.pdf   [PDF to HTML converter]
pdftops xx.pdf   [PDF to PostScript (PS) converter]
pdfinfo xx.pdf   [document information for xx.pdf]
pdffonts xx.pdf   [font analyzer for xx.pdf]
html2text xx.html | tee ~/xx.tex   [converts xx.html, including some special html symbols, to xx.tex]
vilistextum -rcn xx.html xx.tex   [converts xx.html, including empty space but not symbols, to xx.tex]
convert xx.xwd yy.jpg   [converts image types (imagemagick)]
convert xx.tif -compress jpeg xx.pdf   [converts xx.tif to xx.pdf]
csplit -k xx.tex 22 37   [splits xx.tex at lines 22, 37 with line 22 in the second file, etc]
lxsplit -j xx.rar.001   [joins files of like type - creates xx.rar from xx.rar.001, xx.rar.002, etc]
gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -dFirstPage=3 -dLastPage=5 -sOutputFile=fileout.pdf filein.pdf   [extracts pages 3-5 of filein.pdf]
gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=fileout.pdf filein1.pdf filein2.pdf   [merges filein1 and filein2]
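
Because the two Ghostscript commands are long, one option is to wrap them in small shell functions. The following is a minimal sketch, not a standard tool: the function names gs_pdfwrite, pdf_extract and pdf_merge and the DRY_RUN variable are hypothetical, and only the gs options are taken from the two table entries. With DRY_RUN set, the command is printed instead of executed, which makes it easy to check before running.

```shell
# Hypothetical helpers; only the gs options come from the table entries.

# gs_pdfwrite OUT.pdf [gs options and input files...]
# Runs gs with the pdfwrite options, or prints the command if DRY_RUN is set.
gs_pdfwrite() {
    _gs_out=$1
    shift
    set -- gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
        "-sOutputFile=$_gs_out" "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# pdf_extract FIRST LAST IN.pdf OUT.pdf
# Extracts pages FIRST..LAST of IN.pdf into OUT.pdf.
pdf_extract() {
    gs_pdfwrite "$4" "-dFirstPage=$1" "-dLastPage=$2" "$3"
}

# pdf_merge OUT.pdf IN1.pdf IN2.pdf ...
# Merges the input files, in the order given, into OUT.pdf.
pdf_merge() {
    _merge_out=$1
    shift
    gs_pdfwrite "$_merge_out" "$@"
}
```

With these definitions loaded, pdf_extract 3 5 filein.pdf fileout.pdf corresponds to the first gs entry and pdf_merge fileout.pdf filein1.pdf filein2.pdf to the second. Prefixing either call with DRY_RUN=1 only prints the gs command that would be run.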