[Solved] Batch conversion of .docx to PDF

Posted: Wed Apr 22, 2009 5:23 pm
by Kmp
Hello, does anyone know of a way to batch convert many hundreds of .docx files to PDF? OpenOffice already has an Export as PDF function, but it is graphical and requires me to open every document.

Re: Batch conversion of .docx to PDF

Posted: Wed Apr 22, 2009 6:56 pm
by squenson
How do you plan to open .docx with Writer? AFAIK, this is not yet supported. Maybe you should use a virtual PDF printer instead (http://www.cutepdf.com).

Re: Batch conversion of .docx to PDF

Posted: Wed Apr 22, 2009 7:16 pm
by Villeroy
http://www.oooninja.com/2008/02/batch-c ... -with.html
oooninja wrote:PDF printer method

If you just want to generate PDF files, you don't need a Python script, a Basic macro, Java code, or any other kind of programming. Just install a PDF printer such as PDFCreator (Windows) or CUPS-PDF. Then, use the -pt command with the first argument as the printer name and the second argument as the source document.
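
For many files, the -pt call quoted above can be repeated once per document. The following is only a rough sketch of that idea in Python: the function names, the "soffice" executable being on the PATH, and the printer name "CUPS-PDF" in the usage note are assumptions, so adjust them to your own installation.

```python
import subprocess
from pathlib import Path


def build_print_commands(printer_name, documents, soffice="soffice"):
    """Return one 'soffice -pt <printer> <document>' argument list per document."""
    return [[soffice, "-pt", printer_name, str(doc)] for doc in documents]


def print_folder(printer_name, folder, pattern="*.docx", soffice="soffice"):
    """Send every file in 'folder' that matches 'pattern' to the named printer."""
    documents = sorted(Path(folder).glob(pattern))
    for argv in build_print_commands(printer_name, documents, soffice):
        # check=True raises CalledProcessError if soffice exits with an error.
        subprocess.run(argv, check=True)
    return len(documents)
```

Something like print_folder("CUPS-PDF", "/home/user/Desktop/folder") would then print each .docx in that folder; the printer name must match whatever name your PDF printer was installed under.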

Re: Batch conversion of .docx to PDF

Posted: Wed Apr 22, 2009 11:18 pm
by Cambirder
squenson wrote:How do you plan to open .docx with Writer? AFAIK, this is not yet supported.
Yes it is. Support was added in 3.0, although you can't save in that format.

Re: Batch conversion of .docx to PDF

Posted: Thu Apr 23, 2009 7:36 pm
by Kmp
I found a way to batch convert my .docx documents to .pdf. The program is called JODConverter, and you can get it from http://www.artofsolving.com/ . The site has excellent guides, but I am just going to write what I did.
First, I downloaded jodconverter-2.2.2.zip and extracted it to the desktop.

The program uses OpenOffice for the conversion, so you have to start OpenOffice as a service. Open the terminal and enter:

Code: Select all

soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard
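
Before running the converter, it can help to confirm that the service is actually accepting connections on the port given in the command above. This is a minimal sketch in Python; the function name, the 30 second timeout and the retry interval are my own assumptions, only the host and port come from the command.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8100, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A successful connection only shows that something is listening on the port; it does not prove that the listener is OpenOffice.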
This is what the command looked like when I converted.

Code: Select all

java -jar /home/user/Desktop/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar -f pdf /home/user/Desktop/folder/*.docx
To convert, you have to point to the jodconverter-cli-2.2.2.jar file. It is in the archive we downloaded, so for me it was:

Code: Select all

/home/user/Desktop/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar
Then, to specify the output file type you want, use

Code: Select all

-f filetype
This could be -f odt or -f pdf. Afterwards, you specify where the files you want to convert are:

Code: Select all

/home/user/Desktop/folder/*.docx
To select all files with the .docx extension, use *.docx.

That was it, and I can actually convert my docx files now. Thanks Villeroy for the link above, which led me to the JODConverter site.
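
The same command can also be put together from a script, which is handy when the shell does not expand *.docx for you. This is only a sketch in Python under a few assumptions: the function names are made up for illustration, "java" is on the PATH, and the OpenOffice service from the first command is already running.

```python
import subprocess
from pathlib import Path


def build_convert_command(jar_path, folder, output_format="pdf", pattern="*.docx"):
    """Build 'java -jar <jar> -f <format> <files...>' for all matching files.

    Returns None when no file in 'folder' matches 'pattern'.
    """
    files = sorted(str(path) for path in Path(folder).glob(pattern))
    if not files:
        return None
    return ["java", "-jar", str(jar_path), "-f", output_format, *files]


def convert_folder(jar_path, folder, output_format="pdf", pattern="*.docx"):
    """Run the converter once for the whole folder; return the number of input files."""
    argv = build_convert_command(jar_path, folder, output_format, pattern)
    if argv is None:
        return 0
    # check=True raises CalledProcessError if the converter reports a failure.
    subprocess.run(argv, check=True)
    return len(argv) - 5  # everything after the five fixed arguments is an input file
```

With my paths, this would be called as convert_folder("/home/user/Desktop/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar", "/home/user/Desktop/folder").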

Re: Batch conversion of .docx to PDF

Posted: Thu Apr 23, 2009 7:37 pm
by TheGurkha
Thanks for the detailed feedback.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 7:46 am
by lokeshmf
Hi kmp,

I am able to convert docx files to PDF using the commands below:

soffice -headless -accept="socket,host=localhost,port=8100;urp;" -nofirststartwizard

java -jar C:/jodconverter-2.2.2/lib/jodconverter-cli-2.2.2.jar -f pdf C:/*.doc

But I am facing alignment problems, and other issues as well: the index is missing, and some table formats and pictures do not appear in the PDF document.

Can you help me with this?

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 9:05 am
by RoryOF
Install a PDF printer on Windows, print from MS Office, and try that instead.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 10:18 am
by lokeshmf
Can't we do it using OpenOffice?

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 10:36 am
by RoryOF
You have just found out how limited OOo's support for docx files is.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 10:31 pm
by Kmp
As said before, the problem is, AFAIK, that OpenOffice is not able to convert .docx files properly, so there's nothing I can do. If you want, there is the other option that RoryOF suggested: download PDFCreator, which is a PDF printer, and print the files using MS Office or any other suite that can read the files properly. If you have many, you can right-click the files and print them in one go; just be sure to queue them with some time in between prints, because if they all print at once, the computer may become slow.
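
The "some time in between prints" part could be scripted along these lines. It is a sketch in Python, not something I have run on a large batch: the function name and the 10 second delay are assumptions, and the default print action relies on os.startfile, which exists only on Windows and hands the file to whatever application is registered for printing that file type, typically using the default printer.

```python
import os
import time


def print_with_delay(paths, delay_seconds=10.0, print_one=None):
    """Send each file to the printer, pausing between consecutive jobs."""
    if print_one is None:
        def print_one(path):
            # Windows only: invokes the "print" action of the associated application.
            os.startfile(path, "print")

    sent = []
    for index, path in enumerate(paths):
        if index > 0:
            # Pause before every job except the first, so jobs do not pile up.
            time.sleep(delay_seconds)
        print_one(path)
        sent.append(path)
    return sent
```

For this to produce PDF files unattended, the PDF printer would have to be the default printer and be set up to save without asking for a file name each time.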

Re: [Solved] Batch conversion of .docx to PDF

Posted: Thu May 13, 2010 10:46 pm
by TheGurkha
What about Zamzar? They ought to be able to do it.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Fri May 14, 2010 12:54 am
by Kmp
Just tried their service, and it works great. Converted a .docx file, with tables and images and it went through clean. Thank you for the link.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Fri May 14, 2010 12:53 pm
by lokeshmf
We need to convert docx to PDF using java and open office.

Moreover, we are not allowed to use any other converters or PDF printers.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Sat Nov 03, 2012 6:52 am
by DPell
Actually, the same site, artofsolving, has a Python-based alternative to the Java-based JODConverter -- pyodconverter -- [http://www.artofsolving.com/opensource/pyodconverter]. It works with OpenOffice and LibreOffice and looks like it may still be an active project [https://github.com/mirkonasato/pyodconverter]. I was happy to see a Python and LibreOffice based option, as these are my preferred tools / products. LibreOffice has better support for document standards and interoperability.

Re: [Solved] Batch conversion of .docx to PDF

Posted: Sat Nov 03, 2012 9:44 am
by DPell
Additional information for PyODConverter (from the ReadMe):

PyODConverter (for Python OpenDocument Converter) is a Python script that automates office document conversions from the command line using LibreOffice or OpenOffice.org.

The script was written as a simpler alternative to JODConverter for command line usage.
Usage

PyODConverter requires LibreOffice/OpenOffice.org to be running as a service and listening on a port (2002 by default); this can be achieved e.g. by starting it from the command line as

$ soffice "-accept=socket,port=2002;urp;"

The script expects exactly 2 parameters: an input and an output file name. The document formats are inferred from the file extensions.

Since it uses the Python/UNO bridge, the script requires the UNO modules to be already present in your Python installation. Most of the time this means you need to use the Python version installed with OpenOffice.org, e.g. on Windows

> "C:\Program Files\OpenOffice.org 3.1\program\python" DocumentConverter.py test.odt test.pdf

or on Linux

$ /opt/openoffice.org3.1/program/python DocumentConverter.py test.odt test.pdf

If you want to write your own scripts in Python, PyODConverter can also act as a Python module, exporting a DocumentConverter class with a very simple API.
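
Since the script takes exactly one input and one output file per call, a batch run means calling it once per document. Here is a rough sketch of such a driver in Python; the function names are hypothetical, and the interpreter and script paths are whatever applies to your installation (for example the ones shown in the ReadMe excerpt above).

```python
import subprocess
from pathlib import Path


def build_pyod_commands(python_exe, script, folder, pattern="*.docx", target_ext=".pdf"):
    """Return one '<python> <script> <input> <output>' argument list per matching file."""
    commands = []
    for source in sorted(Path(folder).glob(pattern)):
        # The output format is inferred from the extension, so only the suffix changes.
        target = source.with_suffix(target_ext)
        commands.append([str(python_exe), str(script), str(source), str(target)])
    return commands


def convert_all(python_exe, script, folder, pattern="*.docx", target_ext=".pdf"):
    """Convert every matching file, one script call each; return how many were run."""
    commands = build_pyod_commands(python_exe, script, folder, pattern, target_ext)
    for argv in commands:
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(argv, check=True)
    return len(commands)
```

The office service still has to be running and listening on the expected port before convert_all is called.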