[Solved] Metadata in saved output file
Hello, I'm curious about the presence/absence of metadata in files saved by OpenOffice, specifically for the 97-2003 Microsoft Word format. If I open an existing (Word-created) .doc file in OO (v4.1.5 currently), save it out with the 97-2003 filter, and view the document properties from the Windows file system, it shows empty fields for Pages, Word Count, Character Count, etc.
The attached screenshots show the metadata comparisons.
The issue persists with the Java API. I know the properties exist and OO is aware of them: I can call XPropertySet.getPropertyValue("WordCount") and see the actual word count. But that information never makes it into the saved file (storeToURL or storeAsURL).
Any ideas for this? I'm really interested in an UNO API solution to saving this metadata along with the file, but the GUI shows it's not just an API issue.
Attachments:
- Output file: SavedProperties.png (4.6 KiB)
- Input file: OrigProperties.png (5.39 KiB)
Last edited by JFalken on Fri Mar 22, 2019 2:21 pm, edited 1 time in total.
OpenOffice 4.1.3 on Windows 7 64-bit
Re: Metadata in saved output file
This forum is about Apache OpenOffice. It is not related to Microsoft Office.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Metadata in saved output file
Villeroy wrote: This forum is about Apache OpenOffice. It is not related to Microsoft Office.
Sure, maybe I should be more specific in my question.
Why is Apache OpenOffice not saving metadata about document statistics in the file stored on disk?
OpenOffice 4.1.3 on Windows 7 64-bit
Re: Metadata in saved output file
It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Metadata in saved output file
RoryOF wrote: It does when the files are saved in OpenOffice's native formats (.odt, .ods, etc.).
This is a good point. OpenOffice clearly has the metadata in memory... Is there a way to force it (GUI or API) to emit that data when saving in the Word 97-2003 format?
OpenOffice 4.1.3 on Windows 7 64-bit
Re: Metadata in saved output file
There may be, but you might have to rewrite the Word97 filter; alternatively, you could write an extension that writes the metadata into the .doc file, if you know where such data ought to be located.
https://wiki.openoffice.org/wiki/Metadata_API
may have some information that would put you on such a path.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Metadata in saved output file
I have saved a document in Word 97/2003 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Metadata in saved output file
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Metadata in saved output file
I got the equivalent of Villeroy's screen for my document. Also document statistics from the Statistics tab.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Metadata in saved output file
RoryOF wrote: I have saved a document in Word 97/2003 format, then closed OO and reopened that document. All the metadata was present, visible under File > Properties.
Yes, I'm thinking OO is generating the statistics on the fly in your GUI, rather than reading them from the file directly. I'd imagine if you used something like exiftool or tika on the file, it wouldn't report the statistics fields (word count, page count, char count, etc.) because OO isn't storing them in the file.
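One way to test this guess without exiftool or tika is to look at the SummaryInformation property set stream inside the .doc container, which is where Word normally records the page, word, and character counts. The sketch below is a minimal Python parser for that stream, assuming the layout described in Microsoft's MS-OLEPS specification and assuming the raw stream bytes have already been extracted from the file. The function name read_summary_statistics is hypothetical, and only 32-bit integer (VT_I4) values are handled.

```python
import struct

# Property identifiers defined for the SummaryInformation property set.
PIDSI_NAMES = {0x0E: "PageCount", 0x0F: "WordCount", 0x10: "CharCount"}
VT_I4 = 0x0003  # 32-bit signed integer value type


def read_summary_statistics(stream: bytes) -> dict:
    """Return the statistics properties found in a SummaryInformation stream.

    `stream` is assumed to hold the raw bytes of the property set stream.
    Properties that are absent are simply missing from the result.
    """
    (byte_order,) = struct.unpack_from("<H", stream, 0)
    if byte_order != 0xFFFE:
        raise ValueError("not a property set stream")

    # Stream header: byte order (2), version (2), system id (4),
    # CLSID (16), number of property sets (4) = 28 bytes, followed by
    # the FMTID (16) and the offset (4) of the first property set.
    (set_offset,) = struct.unpack_from("<I", stream, 44)

    # Property set header: total size (4) and number of properties (4).
    _size, count = struct.unpack_from("<II", stream, set_offset)

    stats = {}
    for i in range(count):
        # Each directory entry is a property id and an offset that is
        # relative to the start of the property set.
        prop_id, rel_offset = struct.unpack_from(
            "<II", stream, set_offset + 8 + 8 * i
        )
        if prop_id not in PIDSI_NAMES:
            continue
        value_pos = set_offset + rel_offset
        # Typed value: type (2), padding (2), then the value itself.
        (vtype,) = struct.unpack_from("<H", stream, value_pos)
        if vtype == VT_I4:
            (value,) = struct.unpack_from("<i", stream, value_pos + 4)
            stats[PIDSI_NAMES[prop_id]] = value
    return stats
```

The stream bytes would have to come from a compound-file reader; the third-party olefile package is one option, though that is an assumption and not something tried in this thread. An empty result for a file saved by OO, compared with a populated one for the Word original, would support the idea that the filter does not write these fields.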
OpenOffice 4.1.3 on Windows 7 64-bit
Re: Metadata in saved output file
Open an .odt file with an archiver and examine it. The metadata is in meta.xml
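The same check can be scripted instead of done by hand. This is a minimal Python sketch, assuming a standard ODF package in which meta.xml carries a meta:document-statistic element; the function name read_odt_statistics is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used for the meta: prefix in OpenDocument files.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_odt_statistics(odt) -> dict:
    """Return the meta:document-statistic attributes stored in an .odt.

    `odt` may be a file path or a binary file-like object. An empty
    dict is returned when the element is not present.
    """
    # An .odt file is a ZIP archive; meta.xml sits at its top level.
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("meta.xml"))

    stat = root.find(f".//{{{META_NS}}}document-statistic")
    if stat is None:
        return {}

    prefix = f"{{{META_NS}}}"
    # Keep only meta: attributes such as page-count and word-count,
    # stripping the namespace from each name.
    return {
        name[len(prefix):]: int(value)
        for name, value in stat.attrib.items()
        if name.startswith(prefix)
    }
```

Calling read_odt_statistics("example.odt") on a saved document (the file name is a placeholder) should return a dict with keys such as page-count, word-count, and character-count, which shows that the native format stores the statistics in the file itself.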
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: Metadata in saved output file
This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Metadata in saved output file
See [Tutorial] Differences between Writer and MS Word files for an explanation.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: Metadata in saved output file
Villeroy wrote: This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?
John_Ha wrote: See [Tutorial] Differences between Writer and MS Word files for an explanation.
Yes, I've read it in the past, it's very well done. No mention of the lack of support for metadata, but I guess it may be covered by "...you may lose content..."
I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter. I'll mark this as solved. Thanks to all who helped.
OpenOffice 4.1.3 on Windows 7 64-bit
Re: Metadata in saved output file
Villeroy wrote: This is not an editor for MS Word files. Always use MS Word for MS Word docs. MS Word is the one and only application for this type of file.
JFalken wrote: That's a nice sentiment, but you can open MS Word files in OpenOffice, modify them in OpenOffice, and save them in OpenOffice. What are your other requirements for being an editor of MS Word files, exactly?
The one and only editor would be the proprietary program able to load and edit each and every aspect of its own proprietary file format. OpenOffice lets you import data and formatting from foreign file formats fairly properly. You may even save some of these file formats more or less accurately; however, this does not make AOO a full-blown editor for anything other than Open Document Format.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Metadata in saved output file
JFalken wrote: I think I have what I need. OO cannot output a complete 97-2003 MS Word .doc with accompanying metadata and it likely has no plans to augment support for that filter.
That is probably correct. Metadata may even be optional under the so-called ".doc standard".
I believe that page numbering is handled differently in AOO and MS Word. An .odt file has no page numbers in it. Page numbers are created dynamically: when the page margin is reached, the overflow is given the next page number.
See Wikipedia for why Microsoft makes it impossible for others to get complete compatibility.
Because the DOC file format was a closed specification for many years, inconsistent handling of the format persists and may cause some loss of formatting information when handling the same file with multiple word processing programs.
Some specifications for Microsoft Office 97 binary file formats were published in 1997 under a restrictive license, but these specifications were removed from online download in 1999.
Specifications of later versions of Microsoft Office binary file formats were not publicly available.
The DOC format specification was available from Microsoft on request since 2006 under restrictive RAND-Z terms until February 2008.
Sun Microsystems and OpenOffice.org reverse engineered the file format.
On February 15, 2008, Microsoft released a .DOC format specification under the Microsoft Open Specification Promise. However, this specification does not describe all of the features used by DOC format and reverse engineered work remains necessary.
Since 2008 the specification has been updated several times; the last change was made in January 2017.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.