Extract text from MS Word file using OOo API

shekhar.kotekar
Posts: 2
Joined: Mon Mar 28, 2011 12:45 pm

Post by shekhar.kotekar »

Hi,

I would like to know whether it is possible to use the OpenOffice API to extract text and other information from an MS Word document.

So far I have used the MS Office Interop API (it is too slow because it is COM-based) and Apache POI (it has poor documentation, no support, and no forums). Both have their disadvantages, so I was thinking about using the OpenOffice API.

Please enlighten me!
OpenOffice 3.1 on Windows Vista
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: Extract text from MS word file using Open Office API

Post by rudolfo »

shekhar.kotekar wrote: So far I have used the MS Office Interop API (it is too slow because it is COM-based)
OpenOffice first has to convert an MS Word document into its internal XML or DOM tree format before it can do anything with it. If COM automation is too slow for you, I doubt that OOo will be fast enough.
The key point about OpenOffice is that it is built around an open XML format. If the office suite itself is too slow for you, you can always write your own XML application to manipulate the documents. Of course, this way around performance issues only works for documents that are already in the ODF XML format.
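To illustrate the "code your own XML application" route, here is a rough Python sketch that reads the text out of an ODF text document without starting the office suite at all. It relies on an .odt file being a zip archive whose body lives in content.xml. The function names extract_odt_text and make_sample_odt are made up for this example, and the sample document is a stripped-down stand-in for a real file, so treat it as a starting point and not a finished tool.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the OpenDocument format.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_odt_text(source):
    """Return heading and paragraph text from an ODF text document, one per line.

    `source` is a path or a file-like object. Simplifications in this sketch:
    text:s, text:tab and text:line-break are not expanded into whitespace,
    and a paragraph nested inside another paragraph (for example in a text
    box) would be reported twice.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = {"{%s}h" % TEXT_NS, "{%s}p" % TEXT_NS}
    lines = []
    for element in root.iter():
        if element.tag in wanted:
            # itertext() also collects text inside child elements such as text:span.
            lines.append("".join(element.itertext()))
    return "\n".join(lines)


def make_sample_odt():
    """Hypothetical helper: build a minimal in-memory archive for the demo.

    It contains only content.xml, which is all the extractor reads; a real
    .odt file also carries a mimetype entry, a manifest, styles and so on.
    """
    content = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        '<text:h text:outline-level="1">Title</text:h>'
        "<text:p>Hello <text:span>world</text:span></text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    ) % (OFFICE_NS, TEXT_NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    return io.BytesIO(buffer.getvalue())


if __name__ == "__main__":
    print(extract_odt_text(make_sample_odt()))
```

For a real document you would pass the file path, e.g. extract_odt_text("report.odt"), where report.odt is just a placeholder name. A .doc file would still have to be converted to ODF first, which is the slow step mentioned above.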
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
shekhar.kotekar
Posts: 2
Joined: Mon Mar 28, 2011 12:45 pm

Re: Extract text from MS word file using Open Office API

Post by shekhar.kotekar »

@rudolfo,

Thanks a lot for your input.
OpenOffice 3.1 on Windows Vista