I'm trying to extract the contents of a .docx file with Java. The newer .docx files are based on Office Open XML, which is a zipped, XML-based file format. So when I simply rename my document from document.docx to document.zip, I can open it and see the contents.
Some of those extracted files need to be processed. But that would be another question.
Is it possible to access those contents directly, without first renaming the file to a .zip file?
[Solved] Extracting Docx-file with Java
Last edited by MrProgrammer on Thu Jul 23, 2020 5:14 am, edited 1 time in total.
Reason: Tagged ✓ [Solved]
Re: Extracting Docx-file with Java
Have you tried unzipping the docx file by passing the filename as an argument to your decompression software?
It works for me on Linux:
bash-4.3$ ls
AGM2013.docx ebooks
bash-4.3$ unzip AGM2013.docx
Archive: AGM2013.docx
inflating: [Content_Types].xml
inflating: _rels/.rels
inflating: word/_rels/document.xml.rels
inflating: word/document.xml
inflating: word/theme/theme1.xml
inflating: word/settings.xml
inflating: word/fontTable.xml
inflating: word/webSettings.xml
inflating: docProps/app.xml
inflating: docProps/core.xml
inflating: word/styles.xml
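The same idea should carry over to Java, since the standard java.util.zip classes look at the file's contents rather than its extension. Here is a minimal sketch, assuming Java 9 or later and UTF-8 encoded XML parts. The class name DocxReader and its two methods are made up for illustration, and document.docx is just the example name from the question:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: opens a .docx directly as a ZIP archive, no renaming needed.
public class DocxReader {

    /** Returns the names of all entries stored in the archive. */
    public static List<String> listEntries(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Reads one entry as text (assumes UTF-8); returns null if the entry does not exist. */
    public static String readEntry(Path docx, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxReader document.docx");
            return;
        }
        Path docx = new File(args[0]).toPath();
        // Print every entry name, similar to what unzip lists.
        for (String name : listEntries(docx)) {
            System.out.println(name);
        }
        String xml = readEntry(docx, "word/document.xml");
        System.out.println(xml == null
                ? "no word/document.xml found"
                : "word/document.xml has " + xml.length() + " characters");
    }
}
```

Run it as `java DocxReader.java document.docx` (single-file launch needs Java 11 or later) and it should print the same entry names that unzip shows. For the actual processing you would probably hand the InputStream to an XML parser instead of reading it into a String.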
Cheers
David
OS - Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.2.2.2; SlackBuild for 24.2.2 by Eric Hameleers