Generic Document properties values extraction

othmanelmoulat
Posts: 142
Joined: Sun Aug 03, 2008 4:39 am

Post by othmanelmoulat »

Hello,

I need to extract the property values of an OO Writer document generically so that I can export them to an XML file. I don't think that calling the Java methods of http://www.openoffice.org/api/docs/comm ... erties.htm one by one is a good approach. The obstacle is that each of these property values has a different type. For example, "Author" is a string, while DocumentStatistics is a sequence of com::sun::star::beans::NamedValue.

I have the code below, which extracts only the names of the document properties. I need equivalent code that gets each value as a String so that I can export it to my XML file. Can you propose a way to iterate over the document properties and get their values?

thanks

Code:

XTextDocument xtd = (XTextDocument) UnoRuntime.queryInterface(XTextDocument.class, m_xComponent);
XDocumentInfoSupplier xdis = (XDocumentInfoSupplier) UnoRuntime.queryInterface(XDocumentInfoSupplier.class, xtd);
XDocumentInfo xdi = xdis.getDocumentInfo();
XPropertySet xps = (XPropertySet) UnoRuntime.queryInterface(XPropertySet.class, xdi);
XPropertySetInfo xpsi = xps.getPropertySetInfo();
Property[] props = xpsi.getProperties();
// Print the name of every document property (names only, no values yet)
for (int i = 0; i < props.length; i++) {
    System.out.println(props[i].Name);
}

Post by othmanelmoulat »

Can we do it with

Code:

XDocumentProperties.storeToStorage() or storeToMedium()
and then specify an XML file for the XStorage? Would that work? Can you provide some Java code to export the Writer document properties to an XML file?
Is there an exportProperties() method equivalent to this importer: http://www.openoffice.org/api/docs/comm ... orter.html

thanks
OOo 3.4 and LibreOffice 3.4 on openSuse 11.4 + windows 7

Post by othmanelmoulat »

OK, I wrote this ugly method to convert a property value to a string. It is very ugly, but it is the best I could think of.

Code:

// Needs java.text.SimpleDateFormat, java.util.Calendar, java.util.GregorianCalendar,
// com.sun.star.uno.AnyConverter, com.sun.star.util.Date, com.sun.star.util.DateTime
// and com.sun.star.lang.IllegalArgumentException.
public String propertyValueToString(Object propValue) throws IllegalArgumentException {
    String res = null; // stays null for any type that is not handled below
    SimpleDateFormat sdf = new SimpleDateFormat("MMM dd,yyyy HH:mm");
    if (AnyConverter.isDouble(propValue)) {
        res = Double.toString(AnyConverter.toDouble(propValue));
    } else if (AnyConverter.isFloat(propValue)) {
        res = Float.toString(AnyConverter.toFloat(propValue));
    } else if (AnyConverter.isInt(propValue)) {
        res = Integer.toString(AnyConverter.toInt(propValue));
    } else if (AnyConverter.isLong(propValue)) {
        res = Long.toString(AnyConverter.toLong(propValue));
    } else if (AnyConverter.isShort(propValue)) {
        res = Short.toString(AnyConverter.toShort(propValue));
    } else if (AnyConverter.isBoolean(propValue)) {
        res = Boolean.toString(AnyConverter.toBoolean(propValue));
    } else if (AnyConverter.isString(propValue)) {
        res = AnyConverter.toString(propValue);
    } else if (propValue instanceof Date) {
        Date dt = (Date) propValue;
        // UNO months run 1-12, java.util.Calendar months run 0-11
        Calendar cal = new GregorianCalendar(dt.Year, dt.Month - 1, dt.Day);
        res = sdf.format(cal.getTime());
    } else if (propValue instanceof DateTime) {
        DateTime dt = (DateTime) propValue;
        // Include the time of day, since the format prints HH:mm
        Calendar cal = new GregorianCalendar(dt.Year, dt.Month - 1, dt.Day,
                dt.Hours, dt.Minutes, dt.Seconds);
        res = sdf.format(cal.getTime());
    }
    return res;
}
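For comparison, the same "pick a converter by type" idea can be written without the long if/else chain. The following is only a sketch in Python, not UNO code: it uses Python's built-in types and `datetime` as stand-ins for the UNO value types, and the ISO-style output format is an assumption rather than the format used in the method above.

```python
from datetime import date, datetime
from functools import singledispatch


@singledispatch
def property_value_to_string(value):
    """Fallback: types without a registered converter give None."""
    return None


@property_value_to_string.register(bool)
def _(value):
    # bool is registered separately so it is not printed as an int
    return "true" if value else "false"


@property_value_to_string.register(int)
@property_value_to_string.register(float)
@property_value_to_string.register(str)
def _(value):
    return str(value)


@property_value_to_string.register(date)
def _(value):
    # Stand-in for a date-only value (assumed ISO format)
    return value.isoformat()


@property_value_to_string.register(datetime)
def _(value):
    # Stand-in for a date-and-time value, truncated to whole seconds
    return value.isoformat(timespec="seconds")
```

Supporting one more type then means registering one more small function instead of extending the chain.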
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Post by rudolfo »

What about extracting the meta.xml file from the .odt file directly? I've done this sometimes with Python and the zipfile module. I am pretty sure there is something similar in Java. And if not you can use the "com.sun.star.packages.Package" UNO-service.

The drawback might be that this method works on files (on the disk) and not on data blocks in memory. And you will probably have to run the meta.xml through an XSLT process to convert it to something that fits your needs.
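In Python at least, the on-disk restriction may be softer than it sounds, because `zipfile.ZipFile` accepts any file-like object. The sketch below assumes the complete document package is already available as bytes; how to obtain those bytes from a running office instance is not covered here. The helper name `read_meta_xml` and the tiny stand-in package are made up for illustration.

```python
import io
import zipfile


def read_meta_xml(odt_bytes):
    """Return the raw bytes of meta.xml from an ODF package held in memory."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        return package.read("meta.xml")


# Build a tiny stand-in package so the sketch runs without a real .odt file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("meta.xml", "<office:document-meta/>")

meta = read_meta_xml(buffer.getvalue())
```

With a real document, `odt_bytes` would be the full content of the .odt file.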
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.

Post by othmanelmoulat »

Hmm. I have written a Java OO utility to introspect the oxt binary (see my blog post: https://othmanelmoulatblog.wordpress.co ... with-java/),
but this Java class only parses and extracts data from the binary oxt, not from the Writer document currently open in the OpenOffice application. So I need another class that extracts data from the .odt, not from the oxt. I also know that the ODF Toolkit allows this kind of work. If you can post a Java (or Python) snippet that extracts data from the current .odt, that would be very welcome.

thanks

Post by othmanelmoulat »

There is a good solution provided by the XMLTools.java class located at: http://c-cpp.r3dcode.com/files/LibreOff ... Tools.java

The method

Code:

public static void exportDocument(XMultiServiceFactory xMSF, XComponent xDoc, String docType, String exportType, String fileURL)
exports the xDoc component to XML when you specify

Code:

docType = Writer and exportType = Meta
in the method arguments. The fileURL is the full URL of the location where you want the XML file to be stored.
This is a good method, but it is not perfect for me, because my XML needs some additional properties appended to the document properties. So I have to parse the exported file again, append elements to the XML, and then export it again, which is not very clean.
However, if you only want to export your document properties as XML, the XMLTools class above is a very good solution.
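The "parse again and append" step could look roughly like the sketch below, written in Python rather than Java. It assumes the exported file has an `office:document-meta` root with an `office:meta` child, as meta.xml in an .odt does, and it stores the extra values as `meta:user-defined` elements. The function name `append_user_defined` and the property name `ReviewState` are made up for the example.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("meta", META_NS)


def append_user_defined(meta_xml, extra):
    """Append each name/value pair in `extra` as a meta:user-defined element."""
    root = ET.fromstring(meta_xml)
    meta = root.find("{%s}meta" % OFFICE_NS)
    if meta is None:
        raise ValueError("no office:meta element found")
    for name, value in extra.items():
        element = ET.SubElement(meta, "{%s}user-defined" % META_NS)
        element.set("{%s}name" % META_NS, name)
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for an exported meta document.
sample = (
    '<office:document-meta xmlns:office="%s" xmlns:meta="%s">'
    "<office:meta><meta:initial-creator>someone</meta:initial-creator></office:meta>"
    "</office:document-meta>" % (OFFICE_NS, META_NS)
)
result = append_user_defined(sample, {"ReviewState": "draft"})
```

This still means a second pass over the exported file, so it does not remove the extra step described above; it only keeps that step short.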

Post by rudolfo »

The Python code that I use for extracting the metadata from the .odt file is not really a big deal:

Code:

import zipfile
from xml.dom.minidom import parseString

odtPath = 'document.odt'  # placeholder: path to the .odt file

# An .odt file is a zip archive; meta.xml holds the document properties
with zipfile.ZipFile(odtPath, 'r') as odt:
    metaData = odt.read('meta.xml')
# metaData is the XML as a string of bytes

# If we want to inspect it a bit closer, we use Python's XML modules
xmlMeta = parseString(metaData)
metaRoot = xmlMeta.documentElement.getElementsByTagName('office:meta')[0]
# Loop through the child nodes of "office:meta"
# and place either the child element and its first text node or the node and its
# attributes into two dictionaries
counters = {}
metas = {}
for aNode in metaRoot.childNodes:
    if aNode.localName == 'document-statistic':
        for i in range(aNode.attributes.length):
            counters[aNode.attributes.item(i).localName] = aNode.attributes.item(i).value
    elif aNode.hasChildNodes():
        metas[aNode.localName] = aNode.firstChild.nodeValue
That's plain Python with a few modules from its standard library and no OpenOffice UNO interfaces. Hence it might look different in Java, although I am pretty sure that the DOM parsing code is much the same: getChildNodes and getElementsByTagName should be available in any language, because they are DOM-specific rather than programming-language-specific.

Post by othmanelmoulat »

It is probably the same logic I used for the oxt Java class mentioned above. Not a clean solution, but worth a test.