[Solved] Extracting Docx-file with Java

balasahu
Posts: 2
Joined: Thu Jun 07, 2018 7:21 pm
Location: India

[Solved] Extracting Docx-file with Java

Post by balasahu »

I'm trying to extract the contents of a .docx file with Java. The newer .docx files are based on Office Open XML, which is a zipped, XML-based file format. So when I just rename my document from document.docx to document.zip, I can see the contents:

[Image: contents of document.zip]

Some of those extracted files need to be processed. But that would be another question.

Is it possible to access those contents directly, without first renaming the document to a .zip file?
Last edited by MrProgrammer on Thu Jul 23, 2020 5:14 am, edited 1 time in total.
Reason: Tagged ✓ [Solved]
robleyd
Moderator
Posts: 5082
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: Extracting Docx-file with Java

Post by robleyd »

Have you tried unzipping the docx file by passing the filename as an argument to your decompression software?

It works for me on Linux:


bash-4.3$ ls
AGM2013.docx  ebooks
bash-4.3$ unzip AGM2013.docx
Archive:  AGM2013.docx
  inflating: [Content_Types].xml     
  inflating: _rels/.rels             
  inflating: word/_rels/document.xml.rels  
  inflating: word/document.xml       
  inflating: word/theme/theme1.xml   
  inflating: word/settings.xml       
  inflating: word/fontTable.xml      
  inflating: word/webSettings.xml    
  inflating: docProps/app.xml        
  inflating: docProps/core.xml       
  inflating: word/styles.xml
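
Since the original question was about Java, the same idea can probably be sketched there as well: the standard java.util.zip.ZipFile class looks at the file's contents, not its extension, so the .docx can be opened as it is. This is only a minimal sketch, not a tested solution for every document. The class name DocxReader and the default file name document.docx are made up for illustration, it needs Java 9 or later for readAllBytes, and it assumes the XML parts are UTF-8 encoded, which is typical but not guaranteed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class; the name is chosen for this example only.
final class DocxReader {

    /** Returns the names of all parts stored in the .docx package. */
    static List<String> listEntries(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        // ZipFile does not care about the file extension, so no renaming is needed.
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    /** Reads one part, e.g. "word/document.xml", as text (assumes UTF-8). */
    static String readEntry(Path docx, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "document.docx" is a placeholder; pass the real path as the first argument.
        Path docx = Paths.get(args.length > 0 ? args[0] : "document.docx");
        listEntries(docx).forEach(System.out::println);
        System.out.println(readEntry(docx, "word/document.xml"));
    }
}
```

Run against a file like the one above, listEntries should return the same part names that unzip prints, and readEntry gives the raw XML of a single part for further processing.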

Cheers
David
OS - Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.2.2.2; SlackBuild for 24.2.2 by Eric Hameleers