[Solved] Metadata in saved output file

PostPosted: Thu Mar 21, 2019 4:28 pm
by JFalken
Hello, I'm curious about the presence/absence of metadata in files saved by OpenOffice, specifically for the 97-2003 Microsoft Word format. If I open an existing (Word-created) .doc file in OO (v4.1.5 currently), save it out with the 97-2003 filter, and view the document properties from the Windows file system, it shows empty fields for Pages, Word Count, Character Count, etc.

The attached screenshots show the metadata comparisons.

The issue persists with the Java API. I know the properties exist and OO is aware of them: I can call XPropertySet.getPropertyValue("WordCount") and see the actual word count. But that information never makes it into the saved file (storeToURL or storeAsURL).

Any ideas for this? I'm really interested in an UNO API solution to saving this metadata along with the file, but the GUI shows it's not just an API issue.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 5:10 pm
by Villeroy
This forum is about Apache OpenOffice. It is not related to Microsoft Office.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 5:26 pm
by JFalken
Villeroy wrote:This forum is about Apache OpenOffice. It is not related to Microsoft Office.


Sure, maybe I should be more specific in my question.

Why is Apache OpenOffice not saving metadata about document statistics in the file stored on disk?

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 6:09 pm
by RoryOF
It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 6:22 pm
by JFalken
RoryOF wrote:It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).


This is a good point. OpenOffice clearly has the metadata in memory... Is there a way to force it (GUI or API) to emit that data when saving as a Word 97-2003 format?

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 6:27 pm
by RoryOF
There may be, but you might have to rewrite the Word 97 filter; alternatively, you could write an extension that writes the metadata into the .doc file, if you know where such data ought to be located.

https://wiki.openoffice.org/wiki/Metadata_API
may have some information that would put you on such a path.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 6:30 pm
by RoryOF
I have saved a document in Word 97/2000 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 6:55 pm
by Villeroy
DocProperties_MSWord.png

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 7:06 pm
by RoryOF
I got the equivalent of Villeroy's screen for my document. Also document statistics from the Statistics tab.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 7:08 pm
by JFalken
RoryOF wrote:I have saved a document in Word 97/2000 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.


Yes, I'm thinking OO is generating the statistics on the fly in your GUI, rather than reading them from the file directly. I'd imagine that if you used something like exiftool or Tika on the file, it wouldn't report the statistics fields (word count, page count, character count, etc.) because OO isn't storing them in the file.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 7:34 pm
by RoryOF
Open an .odt file with an archiver and examine it. The metadata is in meta.xml
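
The same check can be scripted rather than done by hand with an archiver. The following is a minimal sketch, not an OpenOffice API: it assumes the .odt is an ordinary ZIP package with a top-level meta.xml, and that the statistics sit in a meta:document-statistic element in the OpenDocument meta namespace. The class and method names (OdtStatistics, parseStatistics, readFromOdt) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the document statistics stored in an .odt file.
public class OdtStatistics {
    // OpenDocument "meta" namespace (assumed to be the one used in meta.xml).
    static final String META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";

    // Parses a meta.xml stream and returns the attributes of the first
    // meta:document-statistic element, keyed by local name (e.g. "word-count").
    static Map<String, String> parseStatistics(InputStream metaXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(metaXml);

        Map<String, String> stats = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagNameNS(META_NS, "document-statistic");
        if (nodes.getLength() == 0) {
            return stats; // no statistics element present
        }
        NamedNodeMap attrs = nodes.item(0).getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            // Keep only attributes in the meta namespace (skips xmlns declarations).
            if (META_NS.equals(attr.getNamespaceURI())) {
                stats.put(attr.getLocalName(), attr.getNodeValue());
            }
        }
        return stats;
    }

    // Opens the .odt as a ZIP archive and reads the statistics from meta.xml.
    static Map<String, String> readFromOdt(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                throw new IOException("meta.xml not found in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return parseStatistics(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OdtStatistics <file.odt>");
            return;
        }
        readFromOdt(args[0]).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

This only covers the ODF side of the comparison. A .doc file is not a ZIP package, so the same approach would not work on the Word 97-2003 output.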

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 7:34 pm
by Villeroy
This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.

Re: Metadata in saved output file

PostPosted: Thu Mar 21, 2019 7:39 pm
by John_Ha
See [Tutorial] Differences between Writer and MS Word files for an explanation.

Re: Metadata in saved output file

PostPosted: Fri Mar 22, 2019 2:21 pm
by JFalken
Villeroy wrote:This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.


That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?

John_Ha wrote:See [Tutorial] Differences between Writer and MS Word files for an explanation.


Yes, I've read it in the past, it's very well done. No mention of the lack of support for metadata, but I guess it may be covered by "...you may lose content..."

I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter. I'll mark this as solved. Thanks to all who helped.

Re: Metadata in saved output file

PostPosted: Fri Mar 22, 2019 2:36 pm
by Villeroy
JFalken wrote:
Villeroy wrote:This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.


That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?

The one and only editor would be the proprietary program able to load and edit each and every aspect of its own proprietary file format. OpenOffice lets you import data and formatting from foreign file formats fairly well. You may even save some of these file formats more or less accurately; however, this does not make AOO a full-blown editor for anything but Open Document Format.

Re: Metadata in saved output file

PostPosted: Fri Mar 22, 2019 2:45 pm
by John_Ha
JFalken wrote:I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter.

That is probably correct. Metadata may even be optional under the so-called ".doc standard".

I believe that page numbering is handled differently in AOO and MS Word. An .odt file has no page numbers in it. Page numbers are created dynamically when the page margin is reached and the overflow is given the next page number.

See Wikipedia for why Microsoft makes it impossible for others to get complete compatibility.

Because the DOC file format was a closed specification for many years, inconsistent handling of the format persists and may cause some loss of formatting information when handling the same file with multiple word processing programs.

Some specifications for Microsoft Office 97 binary file formats were published in 1997 under a restrictive license, but these specifications were removed from online download in 1999.

Specifications of later versions of Microsoft Office binary file formats were not publicly available.

The DOC format specification was available from Microsoft on request since 2006 under restrictive RAND-Z terms until February 2008.

Sun Microsystems and OpenOffice.org reverse engineered the file format.

On February 15, 2008, Microsoft released a .DOC format specification under the Microsoft Open Specification Promise. However, this specification does not describe all of the features used by DOC format and reverse engineered work remains necessary.

Since 2008 the specification has been updated several times; the last change was made in January 2017.