
[Solved] RTF to HTML conversion loses centering attribute

Posted: Fri Jan 13, 2012 11:45 pm
by Woody20
I'm trying to use OO to convert RTF input (from MS Word 2000) to HTML.
If I open the RTF file, everything looks exactly correct on the screen.
If I save it as HTML, then re-open the HTML file in OO, almost everything looks the same (exception: table).
However, if I open the HTML file in Firefox, the text is laid out incorrectly. Specifically, the paragraphs that were centered or right-justified in the RTF, and still are when the HTML is viewed in OO, are now all left-justified.

This is strange, because the text of the HTML file is

Code:

<P CLASS="western" ALIGN=CENTER STYLE="text-indent: 0in; margin-bottom: 0in">
<FONT COLOR="#000000"><FONT FACE="Verdana, sans-serif"><FONT SIZE=4 STYLE="font-size: 16pt"><B>Some text that should be centered</B></FONT></FONT></FONT></P>
and the class "western" is

Code:

P.western { font-size: 10pt; so-language: en-US }
Anybody know why the centering is not working as expected? I will deal with the table problems another day.
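
To check what the export actually wrote, the HTML can be inspected with a short script. This is only a minimal sketch using Python's standard `html.parser`; the names `AlignmentReport` and `report_alignment` are made up for illustration. It lists every element that carries an `ALIGN` attribute and shows whether its inline style also sets `text-align`. For the paragraph quoted above, the centering is present only as the `ALIGN` attribute and not as CSS.

```python
from html.parser import HTMLParser


class AlignmentReport(HTMLParser):
    """Collect (tag, ALIGN value, inline style) for elements with an ALIGN attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values are kept as written.
        attr_map = dict(attrs)
        if "align" in attr_map:
            self.found.append((tag, attr_map["align"], attr_map.get("style") or ""))


def report_alignment(html_text):
    """Return the list of aligned elements found in html_text."""
    parser = AlignmentReport()
    parser.feed(html_text)
    parser.close()
    return parser.found


# Paragraph from the exported file, shortened to the relevant parts.
sample = (
    '<P CLASS="western" ALIGN=CENTER STYLE="text-indent: 0in; margin-bottom: 0in">'
    "<B>Some text that should be centered</B></P>"
)

for tag, align, style in report_alignment(sample):
    has_css = "text-align" in style.lower()
    print(f"<{tag}> ALIGN={align} text-align in style: {has_css}")
```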

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 12:25 am
by Villeroy
Running MS Windows, you have an application called "WordPad" which supports RTF natively.

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 2:55 am
by rudolfo
When exporting to HTML, OpenOffice offers several flavors: HTML 3.2, Netscape Navigator, Internet Explorer and Writer. Sometimes it helps to try the different flavors when exporting. This dialog sucks, though, because the only option that is clearly defined is HTML 3.2, the old W3C standard. The other three can mean anything: IE4 is completely different from IE8, and the dialog doesn't say which version of IE it has in mind. But that's just another instance of the overall rule: don't use OOo for RTF, and don't use it for HTML.

Still, don't give up on OOo. The XHTML export is really good, at least if you view the result in Firefox. I am not sure whether this holds for a document that was imported from RTF, but for native .odt documents the result is amazingly close to the original.

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 4:09 am
by Woody20
How do I choose which HTML or XHTML standard I want? When I did "save as", there was only a single HTML choice.

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 2:15 pm
by Villeroy
Try menu:File>Export
Exportable file formats are write-only. OOo can convert to these file formats, but it cannot open them.

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 8:14 pm
by Woody20
File/Export gives me two choices: xhtml and PDF. If I choose xhtml I get a message that the selected Java runtime environment is defective.

Not sure what you mean by "write-only". If the file format is supported by OO, it should be readable by OO. However, this is not important for the current question; I am just curious.

Re: RTF to HTML conversion loses centering attribute

Posted: Sat Jan 14, 2012 9:17 pm
by Villeroy
I did not know that this tool requires Java.
Have a look at Tools>Options>Java and see if you can choose an auto-detected installation of Java or if you can point the office to an existing installation of Java.
A file format becomes write-only when it is possible to map your own attributes to the other file format's attributes, while the other way round would be very difficult, incomplete or impossible.
In the case of PDF it is obvious that the exported file has nothing to do with an office document.

Re: RTF to HTML conversion loses centering attribute

Posted: Sun Jan 15, 2012 1:15 pm
by rudolfo
As background information, it is worth knowing that a great part of the export functionality in OOo is achieved through XSLT (eXtensible Stylesheet Language Transformations). There are quite a lot of transformation engines available: Microsoft has one, the relevant scripting languages (Perl, PHP, Python) include them as modules, and one of the oldest and most stable processors is Xalan from the Apache project. OOo uses the Java implementation of Xalan, which is why a working Java Runtime is required to export to XHTML.
In the usual export dialog, PDF and XHTML seem to be on the same level, but PDF runs internally while XHTML requires the JRE. If you want to know in advance which export options need Java, there is a way to figure this out: in Tools -> XML Filter Settings you will see the currently installed and available filters. In my case:
DocBook file         OpenOffice.org Writer (.sxw)    import/export filter
MS Word 2003         OpenOffice.org Writer (.odt)    import/export filter
UOF text             OpenOffice.org Writer (.odt)    import/export filter
XHTML Writer File    OpenOffice.org Writer (.odt)    export filter

Everything you find in this Filter Settings dialog requires the Java Runtime. If it is an export and import filter, you will find it under "Save As"; if it is only an export filter, File -> Export is the place to look for it. All 4 listed formats are XML based, which makes it easy to simply run another XSL transformation in the opposite direction to get the original .odt file back. How "original" the result is depends on how well the attributes of the two formats can be mapped onto each other. For XHTML, someone apparently thought the result would be too limited to classify the filter as an import as well. Or it was just too complicated or time-consuming to write an appropriate import XSL file.

I don't know about the other formats above, but DocBook is used as a meta format in the open source community to generate HTML, TeX/PDF and, of course, plain text formats. It is a logical markup language only. You can't view it directly (except as XML source code). If you want to visualize it, you need one of the transformation engines just mentioned.

Re: RTF to HTML conversion loses centering attribute

Posted: Sun Jan 15, 2012 1:22 pm
by rudolfo
I think you still need the answer to where the different HTML flavors are set. It is in Tools -> Options -> Load/Save - HTML compatibility.
But note that this setting has nothing to do with the XML filters and XHTML mentioned above. It is only relevant for the internal (and somewhat ancient) conversion to HTML. I am pretty sure you will have more luck with XHTML.

Re: [Solved] RTF to HTML conversion loses centering attribute

Posted: Sun Jan 15, 2012 10:19 pm
by Woody20
I downloaded Java runtime v6 from Sun and used the Tools/Options/OpenOffice.org/Java window to add the directory jre6 created by the Java installer. Now I can open an RTF file with OO and export it as xhtml. The resulting file looks correct in Firefox, including the table.

Re: RTF to HTML conversion loses centering attribute

Posted: Mon Aug 21, 2017 6:15 pm
by marcpolizzi
rudolfo wrote: I think you still need the answer to where the different HTML flavors are set. It is in Tools -> Options -> Load/Save - HTML compatibility.
It is only relevant for the internal (and somewhat ancient) conversion to HTML.
:D YES, that is very good information. I had "Netscape", I switched to "HTML 3.2" and all is right :)
The pictures are properly centered in all browsers.
Thanks
Marc

Re: [Solved] RTF to HTML conversion loses centering attribute

Posted: Tue Aug 22, 2017 5:30 pm
by marcpolizzi
Hi,

The problem when I set HTML 3.2 in AOO Writer to get correct centering is that the page breaks are missing.
(and for making an ebook that is a problem)

So I went back to the Netscape, IE or Writer HTML export and
in the HTML code I search & replace:
replace all ALIGN=CENTER> by style="text-align:center;">
and
replace all ALIGN=CENTER STYLE=" by style="text-align:center;
so all centering is OK :D
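
The two replacements above could also be scripted instead of done by hand in an editor. The following is only a sketch in Python; the function name `center_with_css` is made up, and it assumes the export writes the attribute exactly as `ALIGN=CENTER` (uppercase, unquoted), as in the code quoted earlier in this thread. Like the manual search & replace, it is a plain literal substitution, not a real HTML rewrite.

```python
def center_with_css(html_text):
    """Replace the ALIGN=CENTER attribute by an inline CSS text-align rule."""
    # Case 1: ALIGN=CENTER is followed by a STYLE attribute, so the rule is
    # merged into the start of the existing style value.
    html_text = html_text.replace('ALIGN=CENTER STYLE="', 'style="text-align:center;')
    # Case 2: ALIGN=CENTER is the last attribute of the tag.
    html_text = html_text.replace('ALIGN=CENTER>', 'style="text-align:center;">')
    return html_text


before = (
    '<P CLASS="western" ALIGN=CENTER STYLE="text-indent: 0in; margin-bottom: 0in">A</P>\n'
    '<P CLASS="western" ALIGN=CENTER>B</P>'
)
print(center_with_css(before))
```

To process an exported file, read it into a string, pass it through the function and write the result back. Keep a copy of the original, since a literal replace would also change any occurrence of these strings inside the document text itself.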