How to Convert Doc or Docx File to HTML in java?


Postby abcdef » Wed Feb 01, 2012 4:57 pm

How can I convert a .doc or .docx file to HTML in Java? Please show me some code.
OpenOffice 3.1 on Windows Vista
abcdef
 
Posts: 1
Joined: Wed Feb 01, 2012 4:52 pm

Re: How to Convert Doc or Docx File to HTML in java?

Postby rudolfo » Wed Feb 01, 2012 11:47 pm

.doc and .docx are Microsoft formats, and traditionally Microsoft doesn't play well with Java. Why does this need to be done in Java? Your chances are better if you approach it in .NET. And yes, the first place to ask about a proprietary format is its owner, so that would be Microsoft again.
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
rudolfo
Volunteer
 
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Postby Villeroy » Wed Feb 01, 2012 11:58 pm

http://webcat.sourceforge.net/javadocs/pt/tumba/parser/doc/DOC2HTML.html

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
Villeroy
Volunteer
 
Posts: 28544
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: How to Convert Doc or Docx File to HTML in java?

Postby rudolfo » Thu Feb 02, 2012 8:19 pm

Villeroy wrote: http://webcat.sourceforge.net/javadocs/pt/tumba/parser/doc/DOC2HTML.html

If the quality of the documentation is any indication of the quality of the implementation, it might be better to avoid it. The dummy text "Description of the method" is everywhere in the Javadoc, and for the class DOC2HTML itself the explanation is: "uses the api to present the contents excel 97 spreadsheet as an html file". Assuming that DOC refers to the file extension, the class should handle Word documents, not spreadsheets.

Re: How to Convert Doc or Docx File to HTML in java?

Postby rudolfo » Fri Feb 03, 2012 8:42 pm

It seems that those who are responsible for the code are cheating a bit. Look at these three lines of code taken from the convertDOCToHTML() method:

Code:
PrintWriter outw = new PrintWriter(new FileWriter(file2));
NativeExec.execute("antiword " + file1.getAbsolutePath(), outw);
outw.close();


antiword might not be too bad at converting MS Word files, but you surely don't need several hundred lines of Java code to build a wrapper around it.
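To illustrate how small such a wrapper can be, here is a minimal sketch using only the standard ProcessBuilder class. The class name AntiwordWrapper and its methods are hypothetical and not taken from the library above, and the sketch assumes that an antiword binary is installed and on the PATH. As far as I know, antiword writes plain text by default and reads only the binary .doc format, so this would not give you real HTML and would not handle .docx.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the library quoted above.
// Assumes an "antiword" executable is available on the PATH.
final class AntiwordWrapper {

    // Builds the argument list. Passing the path as its own argument
    // avoids breakage on paths that contain spaces.
    static List<String> buildCommand(File input) {
        return Arrays.asList("antiword", input.getAbsolutePath());
    }

    // Runs antiword and writes its standard output to the output file.
    // Returns the exit code of the external process.
    static int convert(File input, File output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(input));
        pb.redirectOutput(output);                          // stdout goes to the output file
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);  // antiword's errors go to our stderr
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: AntiwordWrapper <input.doc> <output.txt>");
            return;
        }
        int exit = convert(new File(args[0]), new File(args[1]));
        System.out.println("antiword exited with code " + exit);
    }
}
```

Passing the arguments as a list rather than one concatenated string, as in the quoted snippet, means a path with spaces is not split into several arguments.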

Re: How to Convert Doc or Docx File to HTML in java?

Postby Villeroy » Fri Feb 03, 2012 8:51 pm

Sorry. I just posted the first Google hit for "doc2html +java".

Re: How to Convert Doc or Docx File to HTML in java?

Postby rudolfo » Fri Feb 03, 2012 10:06 pm

No problem on my side. The real problem is that we as volunteers too often have to spend our time on a Google lookup that the OP could just as well have done. And surely we have better things to do than judging the quality of whatever pops up on Google. Unfortunately, we are not living in a perfect world.

