Hi there!
We are in the process of converting thousands of documents from Microsoft Word 97 to OO 2.3.0. We are also using the resulting converted odt's as input to a conversion to a wiki (TWiki) using an XSLT conversion process.
We have noticed that the Word->OO conversion creates all the styles in the document as text:p, including the headings. Normally this is OK as the Style name can be used by the XSLT process to convert to headings. But if the Heading appears at the top of a page because of a page break, a custom sub-style is created with the heading style as a parent, including the page-break. Unfortunately the XSLT process cannot recognise these as headings and they just come out as plain (body) text.
The funny thing is that all the other headings on that page have custom styles defined for them as well, but these styles do not add any formatting to the Heading styles they are based on. The styles seem to 'come right' on the next page or so.
An ODT document created from scratch in Writer correctly uses text:h styles for all the headings (including ones with associated page-breaks), allowing the XSLT process to convert them OK.
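To make the inheritance issue concrete, here is a minimal Python sketch (not part of the XSLT process in question) that finds text:p paragraphs whose style chain leads back to a heading style. It assumes the heading styles keep the encoded names Heading_20_1, Heading_20_2, etc., and that the custom sub-styles are declared in content.xml itself; the function names and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def q(ns, name):
    """Build the namespace-qualified name that ElementTree expects."""
    return "{%s}%s" % (ns, name)


def parent_map(root):
    """Map each style name to its parent style name, where one is declared."""
    parents = {}
    for style in root.iter(q(STYLE, "style")):
        name = style.get(q(STYLE, "name"))
        parent = style.get(q(STYLE, "parent-style-name"))
        if name and parent:
            parents[name] = parent
    return parents


def heading_level(style_name, parents):
    """Follow parent links until a Heading_20_N style is found; return N or None."""
    seen = set()
    while style_name and style_name not in seen:
        seen.add(style_name)  # guards against a cyclic parent chain
        if style_name.startswith("Heading_20_"):
            suffix = style_name[len("Heading_20_"):]
            if suffix.isdigit():
                return int(suffix)
        style_name = parents.get(style_name)
    return None


def find_disguised_headings(content_xml):
    """Return (level, text) for every text:p whose style derives from a heading."""
    root = ET.fromstring(content_xml)
    parents = parent_map(root)
    found = []
    for p in root.iter(q(TEXT, "p")):
        level = heading_level(p.get(q(TEXT, "style-name")), parents)
        if level is not None:
            found.append((level, "".join(p.itertext())))
    return found


# Hypothetical sample: P1 is a custom sub-style of Heading 1 (e.g. with a page break).
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph"
                 style:parent-style-name="Heading_20_1"/>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="P1">Introduction</text:p>
    <text:p text:style-name="Heading_20_2">Scope</text:p>
    <text:p text:style-name="Standard">Body text</text:p>
  </office:text></office:body>
</office:document-content>"""

print(find_disguised_headings(SAMPLE))
```

The same parent lookup could in principle be done inside the XSLT by matching on style:parent-style-name, but that depends on how the stylesheet is written.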
Any idea why the Word->ODT conversion doesn't recognise/save the headings as text:h?
Thanks in advance for any help.
Cheers, Duncan.
Converted Word Doc does not use text:h for headings
Re: Converted Word Doc does not use text:h for headings
I created a small text doc in Word with four lines — Head1, Head2, Head3, Normal with the styles set accordingly. I then opened this in Writer and saved as ODT. The content.xml contained the following:
<text:h text:style-name="P1" text:outline-level="1">Head1</text:h>
<text:h text:style-name="Heading_20_2" text:outline-level="2">Head2</text:h>
<text:h text:style-name="Heading_20_3" text:outline-level="3">Head3</text:h>
<text:p text:style-name="Standard">normal</text:p>
Have you looked at the original documents? I know from past experience that authors often don't use Styles in their documents but instead just use character formatting to achieve the same visual effect. If the DOC file contains a heading styled off a Normal or Default base then you wouldn't expect the DOC importer to recognise this as a text:h token. It just isn't that psychic. If this isn't the case, why not attach a fragment of your document as a test case, and I'll have a look.
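For checking a converted file the same way, here is a rough Python sketch that lists the text:h and text:p elements in an ODT's content.xml. It relies only on the fact that an .odt is a zip archive containing content.xml; the function name summarize_odt and the in-memory stand-in file are illustrative assumptions, not anything from the conversion process above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def summarize_odt(odt):
    """Return (tag, style-name, text) for each text:h and text:p in content.xml.

    `odt` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    wanted = ("{%s}h" % TEXT_NS, "{%s}p" % TEXT_NS)
    rows = []
    for elem in root.iter():  # document order
        if elem.tag in wanted:
            local = elem.tag.split("}", 1)[1]
            style = elem.get("{%s}style-name" % TEXT_NS)
            rows.append(("text:" + local, style, "".join(elem.itertext())))
    return rows


# Usage with an in-memory stand-in for an .odt file. It holds only content.xml,
# so it is not a complete ODF package, just enough to exercise the function.
content = (
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:h text:style-name="Heading_20_2" text:outline-level="2">Head2</text:h>'
    '<text:p text:style-name="Standard">normal</text:p>'
    '</office:text></office:body></office:document-content>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", content)
buffer.seek(0)

rows = summarize_odt(buffer)
for row in rows:
    print(row)
```

Running summarize_odt on a real converted file (passing its path) should show at a glance which headings came through as text:p.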
Ubuntu 11.04-x64 + LibreOffice 3 and MS free except the boss's Notebook which runs XP + OOo 3.3.
Re: Converted Word Doc does not use text:h for headings
Thanks for the reply.
I did some more testing here and it appears that the problem affects the headings in documents created from a custom template that redefines the formatting of the 'Heading 1', 'Heading 2', etc. styles.
The actual paragraphs are defined as being the 'Heading 1' style, but that style has been redefined in the template. Indeed, the 'View | Outline' option in Word displays these headings as different levels of the document 'hierarchy'.
I have attached an example of a Word97 document created from this custom template and the resulting odt file which illustrates the <text:p> headings.
Attachments:
- testSpec.odt: Resulting odt file (15.73 KiB)
- testSpec.doc: Original Word97 document created from the custom template (38.5 KiB)
Re: Converted Word Doc does not use text:h for headings
The problem seems to stem from the Heading styles as defined in the template. In the DOC file, the Heading 4-9 styles all have outline numbering specified, but the Heading 1-3 styles do not. It seems that in order for Writer to import headings correctly and assign them to outline levels in Outline Numbering, either all imported headings must have outline numbering specified or all imported headings must have no outline numbering specified.
After removing the numbering from Heading 4-9 styles in the DOC file, Headings 1-3 appeared in Outline Numbering when the DOC file was opened with Writer. I don't know if there is any other fix for this.
Re: Converted Word Doc does not use text:h for headings
OK, I think that you owe me a virtual beer.
The issue is something to do with the fact that the "Heading 1" to "Heading 3" styles have been subtly tweaked in the Word template used to generate these forms. As a result, if you open Tools->Outline Numbering you will see that the first three outline levels are not associated with these styles. Consequently you get text:p tags instead of text:h tags, and the Wiki exporter fails to pick up the heading levels. The workaround is to map these back before doing the export. This involves either some tedious clicking and selecting in the dialog or a bit of nasty UNO code in Basic, which is a bit advanced, so I have included it below:
Code:
Sub ResetOutline
    ' Re-associate outline levels 1-3 with the "Heading 1" to "Heading 3" styles.
    Dim oDoc, oNum, i
    oDoc = StarDesktop.CurrentComponent
    If Not oDoc.supportsService("com.sun.star.text.TextDocument") Then
        MsgBox "This macro can only be run for a Writer Document"
        Exit Sub
    End If
    oNum = oDoc.ChapterNumberingRules

    ' Properties applied to each outline level; only HeadingStyleName varies.
    Dim pv(10) As New com.sun.star.beans.PropertyValue
    pv(0).Name = "Adjust" : pv(0).Value = 3
    pv(1).Name = "ParentNumbering" : pv(1).Value = 10
    pv(2).Name = "Prefix" : pv(2).Value = ""
    pv(3).Name = "Suffix" : pv(3).Value = ""
    pv(4).Name = "CharStyleName" : pv(4).Value = ""
    pv(5).Name = "StartWith" : pv(5).Value = 1
    pv(6).Name = "LeftMargin" : pv(6).Value = 0
    pv(7).Name = "SymbolTextDistance" : pv(7).Value = 0
    pv(8).Name = "FirstLineOffset" : pv(8).Value = 0
    pv(9).Name = "NumberingType" : pv(9).Value = 5 ' 5 = NUMBER_NONE
    For i = 1 To 3
        pv(10).Name = "HeadingStyleName" : pv(10).Value = "Heading " & i
        oNum.replaceByIndex(i - 1, pv()) ' outline levels are indexed from 0
    Next i
End Sub
This assumes that the subroutine is stored in a global library but executed in the context of the document. I assume that you have some processing macro which enumerates a folder, loading each DOC in turn and then exporting it to Wiki. You just need to execute this macro between the load and the export (you may want to tweak it to pass the document as a parameter). I tried this with your testSpec.doc and it now produces a correctly Wiki-formatted testSpec.txt.
Hope that this helps.