XHTML Macro and Superscript Bug

FredJones
Posts: 9
Joined: Sun Mar 09, 2008 8:36 pm

XHTML Macro and Superscript Bug

Post by FredJones »

I am using Windows 2K with the latest release of OO (2.3.1) and I am using this macro to convert RTF files to XHTML:

"C:\Program Files\OpenOffice.org 2.3\program\soffice" -invisible macro:///Standard.Module1.SaveAsXHTML("H:\RTF Tests\Test_0045.rtf")

This works fine, except that a few things do NOT convert correctly. All other converters also fail on most of those, so I am inclined to think that the RTF I am using may have some MS Word-only code, as the RTF files were generated with MS Word. Superscript, however, is the one feature that the other converters do handle.

If I open an RTF with superscript and use regular Save As and then choose HTML in OO, then the resultant HTML indeed has the superscript. Using Export and XHTML or the above macro, however, the superscript is lost.

Any ideas how I can fix this? I have a large set of files to convert, but we don't want to lose superscripts.
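
For the batch side of this, here is a minimal Python sketch of how the command line above could be generated for each file in a large set. The soffice path and macro name are copied from the command above; the helper names `build_command` and `build_commands` are made up for illustration, and the quoting is assumed to match what worked for the single-file case.

```python
from pathlib import PureWindowsPath

# Path and macro name copied from the single-file command line above.
SOFFICE = r"C:\Program Files\OpenOffice.org 2.3\program\soffice"
MACRO = "macro:///Standard.Module1.SaveAsXHTML"


def build_command(rtf_path: str) -> str:
    """Return the soffice command line that runs the macro on one RTF file."""
    return f'"{SOFFICE}" -invisible {MACRO}("{rtf_path}")'


def build_commands(rtf_paths):
    """Build one command per .rtf path, skipping any other file type."""
    return [
        build_command(p)
        for p in rtf_paths
        if PureWindowsPath(p).suffix.lower() == ".rtf"
    ]
```

On Windows, each resulting string could then be handed to `subprocess.run` one at a time, so that only one soffice instance runs at once.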

Thank you!
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: XHTML Macro and Superscript Bug

Post by TerryE »

There have been a few related issues fixed for 3.0. See http://qa.openoffice.org/issues/show_bug.cgi?id=83771 as an example. I haven't established a workaround in the interim, sorry.
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
FredJones
Posts: 9
Joined: Sun Mar 09, 2008 8:36 pm

Re: XHTML Macro and Superscript Bug

Post by FredJones »

TerryE wrote: There have been a few related issues fixed for 3.0. See http://qa.openoffice.org/issues/show_bug.cgi?id=83771 as an example. I haven't established a workaround in the interim, sorry.
OK. Thank you.
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: XHTML Macro and Superscript Bug

Post by TerryE »

I did actually try. The problem seems to be that, under certain circumstances, the XHTML output filter loses some attributes in its CSS. One of these is the superscript styling. A fix for this has been tested and is working its way through QA for planned introduction in 3.0.

I thought about doing a global Find and Replace: search for .* (as a regexp pattern, with the appropriate Superscript font attributes) and replace with & (with no superscript, but with some unique font that you don't normally use, such as Univers). You could then tweak the output XHTML CSS to replace the class for Univers with one for no font override + superscript. This is all a bit of a hack and would take some scripting and quite a lot of playing around to get working. However, 3.0 is due out later in the year anyway.
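
To give an idea of the CSS tweak step, here is a rough Python sketch, untested against real OOo output. It assumes the exported stylesheet is flat (no nested blocks) and that the marker font shows up by name inside the rule body; the function name `swap_marker_font` and the exact superscript declarations are my own choices, not anything the filter produces.

```python
import re

# Matches one flat CSS rule: a selector followed by a { ... } declaration block.
RULE = re.compile(r"(?P<selector>[^{}]+)\{(?P<body>[^{}]*)\}")

# Assumed replacement styling: no font override, just superscript.
SUPERSCRIPT = " vertical-align: super; font-size: smaller; "


def swap_marker_font(css: str, marker_font: str = "Univers") -> str:
    """Replace the body of every rule that names marker_font with superscript styling."""

    def fix(match):
        if marker_font.lower() in match.group("body").lower():
            return match.group("selector") + "{" + SUPERSCRIPT + "}"
        return match.group(0)  # rule does not use the marker font, keep it as-is

    return RULE.sub(fix, css)
```

Rules that never mention the marker font pass through unchanged, which is why the font has to be one that the documents do not otherwise use.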
FredJones
Posts: 9
Joined: Sun Mar 09, 2008 8:36 pm

Re: XHTML Macro and Superscript Bug

Post by FredJones »

TerryE wrote: I thought about doing a global Find and Replace: search for .* (as a regexp pattern, with the appropriate Superscript font attributes) and replace with & (with no superscript, but with some unique font that you don't normally use, such as Univers). You could then tweak the output XHTML CSS to replace the class for Univers with one for no font override + superscript. This is all a bit of a hack and would take some scripting and quite a lot of playing around to get working.
I actually already used a similar solution to handle small caps. OOo displays them fine but doesn't output them in the XHTML. There, I was able to identify a pattern and actually just use a well-defined CSS rule to make all of our page titles small caps. This works b/c they are all similar.

For superscripts, I have identified a regexp that would catch them in the XHTML, and I could then use a script to edit the code (CSS alone won't work here b/c they're not selectable via CSS). The one issue is that at least one occurrence it caught was actually a subscript, so we are still looking into this.
TerryE wrote: However, 3.0 is due out later in the year anyway.
Problem is that we need to get 40K RTF files online next month. OK, perhaps we will consider re-processing the RTF files with OOo 3.0 when it comes out. :)

Thank you very much for your assistance.
FredJones
Posts: 9
Joined: Sun Mar 09, 2008 8:36 pm

Re: XHTML Macro and Superscript Bug

Post by FredJones »

Attached is a sample file showing at least 3 issues: superscript, subscript and small caps. Truth is that most if not all of the other converters I tried also fail on small caps. OOo however natively DOES recognize it and parses it, so one would expect that also the XHTML would retain it.

Regarding a question asked of me offline, I don't know OOoBasic and my experience in VB is quite limited, so I don't think I will be able to look into the code much myself here.

Thank you.
Attachments
testfile.rtf
TerryE
Volunteer
Posts: 1402
Joined: Sat Oct 06, 2007 10:13 pm
Location: UK

Re: XHTML Macro and Superscript Bug

Post by TerryE »

Fred, thanks for that example. This clearly describes the bug, and it seems to be far wider than just subscripts. If you look at the XHTML, you will see that the content correctly establishes four text adornment classes:
  • T1 - Not sure why this is here, but this one isn't a bug
  • T2 - Small Caps
  • T3 - Superscript
  • T4 - Subscript
but that the export does not emit the styles for them, leaving only empty rules:
    *.T1 { }
    *.T2 { }
    *.T3 { }
    *.T4 { }
Sorry, I can't think of a sensible work-around. At least it looks as if this has been fixed for 3.0 :-(
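
For what it's worth, the crude post-processing idea would look something like the Python sketch below: fill the empty rules back in after export. Treat it as an illustration only. The class-to-style mapping is a guess that holds for this one sample file (class numbering may well differ from file to file, which is exactly why this is not a sensible general fix), and the names `STYLES` and `fill_empty_rules` are made up.

```python
import re

# Hypothetical mapping for the attached sample only; other files may number
# their classes differently, so this table cannot be reused blindly.
STYLES = {
    "T2": "font-variant: small-caps;",
    "T3": "vertical-align: super; font-size: smaller;",
    "T4": "vertical-align: sub; font-size: smaller;",
}

# Matches an empty rule such as "*.T3 { }".
EMPTY_RULE = re.compile(r"\*\.(?P<cls>\w+)\s*\{\s*\}")


def fill_empty_rules(xhtml: str, styles: dict = STYLES) -> str:
    """Give each empty '*.Tn { }' rule the declarations mapped to its class, if any."""

    def fix(match):
        decl = styles.get(match.group("cls"))
        if decl is None:
            return match.group(0)  # no mapping (e.g. T1), leave the rule empty
        return f"*.{match.group('cls')} {{ {decl} }}"

    return EMPTY_RULE.sub(fix, xhtml)
```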
FredJones
Posts: 9
Joined: Sun Mar 09, 2008 8:36 pm

Re: XHTML Macro and Superscript Bug

Post by FredJones »

OK. Thank you very much. 3.0 seems not to have any RCs ready, so we will do the best we can in the meantime.

Thanks.