How to Convert Doc or Docx File to HTML in java?

abcdef
Posts: 1
Joined: Wed Feb 01, 2012 4:52 pm

Post by abcdef »

How can I convert a .doc or .docx file to HTML in Java? Please show me some code.
OpenOffice 3.1 on Windows Vista
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by rudolfo »

.doc and .docx are Microsoft formats. Traditionally, Microsoft doesn't play well with Java. Why do you need this to be done in Java? Your chances are better if you approach this in .NET. And yes, the first place to ask about a vendor's proprietary formats is that vendor, which would be Microsoft again.
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by Villeroy »

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by rudolfo »

If the quality of the documentation is an indication of the quality of the implementation, it might be better to avoid it. The dummy text "Description of the method" is everywhere in the Javadoc, and for the class DOC2HTML itself the explanation is: "uses the api to present the contents excel 97 spreadsheet as an html file". Assuming that DOC says something about the file extension, the class should be handling Word documents, not spreadsheets.
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by rudolfo »

It seems like those who are responsible for the code are cheating a bit. Look at these three lines of code taken from the convertDOCToHTML() method:

Code:

 PrintWriter outw = new PrintWriter(new FileWriter(file2));
 NativeExec.execute("antiword " + file1.getAbsolutePath(), outw);
 outw.close();
antiword might not be too bad at converting MS Word files, but you surely don't need several hundred lines of Java code to build a wrapper around it.
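
To illustrate how small such a wrapper can be, here is a minimal sketch using only the standard ProcessBuilder class. The class name AntiwordWrapper and the methods buildCommand() and convert() are hypothetical names made up for this example, and it assumes that an antiword executable is installed and on the PATH. As far as I know, antiword writes plain text to standard output by default, so what lands in the output file is text rather than real HTML.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class, not part of any library mentioned in this thread.
class AntiwordWrapper {

    // Builds the command line. Assumption: "antiword" is installed and on the PATH.
    static List<String> buildCommand(File wordFile) {
        return Arrays.asList("antiword", wordFile.getAbsolutePath());
    }

    // Runs antiword on wordFile and redirects its standard output into outFile.
    // Returns the exit code of the external process.
    static int convert(File wordFile, File outFile)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(wordFile));
        pb.redirectOutput(outFile);                          // stdout goes to the file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);   // errors show up on our stderr
        Process process = pb.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java AntiwordWrapper <input.doc> <output file>");
            System.exit(2);
        }
        System.exit(convert(new File(args[0]), new File(args[1])));
    }
}
```

Passing the arguments as a list instead of one concatenated string, as in the quoted lines, avoids trouble with file paths that contain spaces.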
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by Villeroy »

Sorry. I just posted the first Google hit for "doc2html +java".
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Post by rudolfo »

No problem for me. The real problem is that we as volunteers too often have to spend our time on a Google lookup that could just as well be done by the OP. And surely we have better things to do than judging the quality of whatever pops up on Google. Unfortunately, we are not living in a perfect world.