[Solved] Metadata in saved output file

JFalken
Posts: 12
Joined: Wed Feb 08, 2017 10:16 pm

[Solved] Metadata in saved output file

Post by JFalken »

Hello, I'm curious about the presence/absence of metadata in files saved by OpenOffice, specifically for the 97-2003 Microsoft Word format. If I open an existing (Word-created) .doc file in OO (v4.1.5 currently), save it out with the 97-2003 filter, and view the document properties from the Windows file system, it shows empty fields for Pages, Word Count, Character Count, etc.

The attached screenshots show the metadata comparisons.

The issue persists with the Java API. I know the properties exist and OO is aware of them: I can call XPropertySet.getPropertyValue("WordCount") and see the actual word count. But that information never makes it into the saved file (storeToURL or storeAsURL).

Any ideas for this? I'm really interested in an UNO API solution to saving this metadata along with the file, but the GUI shows it's not just an API issue.
Attachments
Output File: SavedProperties.png (4.6 KiB)
Input File: OrigProperties.png (5.39 KiB)
Last edited by JFalken on Fri Mar 22, 2019 2:21 pm, edited 1 time in total.
OpenOffice 4.1.3 on Windows 7 64-bit
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Metadata in saved output file

Post by Villeroy »

This forum is about Apache OpenOffice. It is not related to Microsoft Office.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
JFalken
Posts: 12
Joined: Wed Feb 08, 2017 10:16 pm

Re: Metadata in saved output file

Post by JFalken »

Villeroy wrote:This forum is about Apache OpenOffice. It is not related to Microsoft Office.
Sure, maybe I should be more specific in my question.

Why is Apache OpenOffice not saving metadata about document statistics in the file stored on disk?
OpenOffice 4.1.3 on Windows 7 64-bit
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Metadata in saved output file

Post by RoryOF »

It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JFalken
Posts: 12
Joined: Wed Feb 08, 2017 10:16 pm

Re: Metadata in saved output file

Post by JFalken »

RoryOF wrote:It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).
This is a good point. OpenOffice clearly has the metadata in memory... Is there a way to force it (GUI or API) to emit that data when saving as a Word 97-2003 format?
OpenOffice 4.1.3 on Windows 7 64-bit
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Metadata in saved output file

Post by RoryOF »

There may be, but you might have to rewrite the Word 97 filter; alternatively, you could write an extension that writes the metadata into the .doc file, if you know where such data ought to be located.

https://wiki.openoffice.org/wiki/Metadata_API
may have some information that would put you on such a path.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Metadata in saved output file

Post by RoryOF »

I have saved a document in Word 97/2000 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Metadata in saved output file

Post by Villeroy »

DocProperties_MSWord.png
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Metadata in saved output file

Post by RoryOF »

I got the equivalent of Villeroy's screen for my document. Also document statistics from the Statistics tab.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JFalken
Posts: 12
Joined: Wed Feb 08, 2017 10:16 pm

Re: Metadata in saved output file

Post by JFalken »

RoryOF wrote:I have saved a document in Word 97/2000 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.
Yes, I think OO is generating the statistics on the fly in your GUI rather than reading them from the file. I'd imagine that if you ran something like exiftool or Tika on the file, it wouldn't report the statistics fields (word count, page count, character count, etc.) because OO isn't storing them in the file.
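A check like that could be scripted roughly as follows. This is only a sketch under assumptions: it expects exiftool to be installed and on the PATH, and it assumes exiftool reports the .doc statistics under the tag names Pages, Words and Characters. The helper names exiftool_tags and missing_statistics are made up for illustration.

```python
import json
import subprocess
import sys

# Assumed exiftool tag names for the .doc statistics fields.
STAT_TAGS = ("Pages", "Words", "Characters")


def exiftool_tags(path):
    """Run exiftool on one file and return its tags as a dict.

    Requires exiftool on the PATH; `-json` makes it print a JSON array
    with one object per input file.
    """
    result = subprocess.run(
        ["exiftool", "-json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)[0]


def missing_statistics(tags):
    """Return the statistics tag names that are absent or empty in `tags`."""
    return [name for name in STAT_TAGS if tags.get(name) in (None, "")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_doc_stats.py saved_by_openoffice.doc
    missing = missing_statistics(exiftool_tags(sys.argv[1]))
    print("Missing statistics tags:", missing if missing else "none")
```

Running it against both the Word-created input file and the OO-saved output file would show whether the fields are really absent from the file, independent of what any GUI displays.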
OpenOffice 4.1.3 on Windows 7 64-bit
RoryOF
Moderator
Posts: 34586
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Metadata in saved output file

Post by RoryOF »

Open an .odt file with an archiver and examine it. The metadata is in meta.xml.
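The same inspection can be done without an archiver. The sketch below uses only the Python standard library to open the .odt package as a zip and read the meta:document-statistic element from meta.xml. The function names are hypothetical, and the sample package is a stripped-down stand-in built in memory, not a complete .odt file.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# ODF "meta" XML namespace used for the statistics attributes.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_odf_statistics(odt_file):
    """Return the meta:document-statistic attributes as a dict of ints.

    `odt_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    stat = root.find(".//{%s}document-statistic" % META_NS)
    if stat is None:
        return {}
    prefix = "{%s}" % META_NS
    return {
        name[len(prefix):]: int(value)
        for name, value in stat.attrib.items()
        if name.startswith(prefix)
    }


def build_sample_package():
    """Build a minimal in-memory zip holding only a meta.xml (illustrative data)."""
    meta_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<office:document-meta'
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        ' xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">'
        "<office:meta>"
        '<meta:document-statistic meta:page-count="2" meta:word-count="345"'
        ' meta:character-count="2100"/>'
        "</office:meta>"
        "</office:document-meta>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("meta.xml", meta_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_odf_statistics(build_sample_package()))
```

For a real document, pass the path of the .odt file to read_odf_statistics instead of the sample buffer.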
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Metadata in saved output file

Post by Villeroy »

This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Metadata in saved output file

Post by John_Ha »

See [Tutorial] Differences between Writer and MS Word files for an explanation.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
JFalken
Posts: 12
Joined: Wed Feb 08, 2017 10:16 pm

Re: Metadata in saved output file

Post by JFalken »

Villeroy wrote:This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?
John_Ha wrote:See [Tutorial] Differences between Writer and MS Word files for an explanation.
Yes, I've read it in the past; it's very well done. It doesn't mention the lack of support for metadata, but I guess that may be covered by "...you may lose content..."

I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter. I'll mark this as solved. Thanks to all who helped.
OpenOffice 4.1.3 on Windows 7 64-bit
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Metadata in saved output file

Post by Villeroy »

JFalken wrote:
Villeroy wrote:This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?
The one and only editor would be the proprietary program able to load and edit each and every aspect of its own proprietary file format. OpenOffice lets you import data and formatting from foreign file formats fairly properly. You may even save some of these file formats more or less accurately; however, this does not make AOO a full-blown editor for anything but Open Document Format.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Metadata in saved output file

Post by John_Ha »

JFalken wrote:I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter.
That is probably correct. Metadata may even be optional under the so-called ".doc standard".

I believe that page numbering is handled differently in AOO and MS Word. An .odt file has no page numbers in it. Page numbers are created dynamically when the page margin is reached and the overflow is given the next page number.

See Wikipedia for why Microsoft makes it impossible for others to get complete compatibility.
Because the DOC file format was a closed specification for many years, inconsistent handling of the format persists and may cause some loss of formatting information when handling the same file with multiple word processing programs.

Some specifications for Microsoft Office 97 binary file formats were published in 1997 under a restrictive license, but these specifications were removed from online download in 1999.

Specifications of later versions of Microsoft Office binary file formats were not publicly available.

The DOC format specification was available from Microsoft on request since 2006 under restrictive RAND-Z terms until February 2008.

Sun Microsystems and OpenOffice.org reverse engineered the file format.

On February 15, 2008, Microsoft released a .DOC format specification under the Microsoft Open Specification Promise. However, this specification does not describe all of the features used by DOC format and reverse engineered work remains necessary.

Since 2008 the specification has been updated several times; the last change was made in January 2017.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.