I have recently run into a problem: documents created with newer versions of MS Office contain lots of redundant tags, and these tags break up the text at virtually every single word.
Although (or because) OmegaT displays these tags, translating such texts while preserving the formatting is very time-consuming. Furthermore, these redundant tags run counter to the idea of translation memories: they clutter every segment and make it harder to reuse earlier translations, so CAT tools become difficult to use.
In OmegaT, such a segment looks like this:
Nabídky<f0> </f0>z menu<f1> </f1>pro<f2> </f2>překlad<f3> </f3>(nejlépe<f4> </f4>ve<f5> </f5>všech<f6> </f6>jazycích)
In the document's XML, the same heading looks like this:
<text:h text:style-name="Heading_20_1" text:outline-level="1">
Nabídky
<text:span text:style-name="T1"> </text:span>
z menu
<text:span text:style-name="T1"> </text:span>
pro
<text:span text:style-name="T1"> </text:span>
překlad
<text:span text:style-name="T1"> </text:span>
(nejlépe
<text:span text:style-name="T1"> </text:span>
ve
<text:span text:style-name="T1"> </text:span>
všech
<text:span text:style-name="T1"> </text:span>
jazycích)
</text:h>
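To check how badly a given document is affected before starting to translate it, one could count the whitespace-only spans in its XML. The following is only a minimal Python sketch, not an OmegaT or OpenOffice feature. It assumes an .odt file (a ZIP package whose text lives in content.xml), and the function names `count_whitespace_spans` and `count_in_odt` are made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import Tuple, Union

# ODF text namespace, i.e. the "text:" prefix used in content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def count_whitespace_spans(content_xml: Union[str, bytes]) -> Tuple[int, int]:
    """Return (whitespace-only spans, all spans) in an ODF XML document."""
    root = ET.fromstring(content_xml)
    spans = list(root.iter(f"{{{TEXT_NS}}}span"))
    # "Redundant" here means: no child elements and nothing but whitespace,
    # e.g. <text:span text:style-name="T1"> </text:span>
    redundant = [s for s in spans if len(s) == 0 and not (s.text or "").strip()]
    return len(redundant), len(spans)


def count_in_odt(path: str) -> Tuple[int, int]:
    """Count spans in the content.xml of an .odt package (a ZIP file)."""
    with zipfile.ZipFile(path) as odt:
        return count_whitespace_spans(odt.read("content.xml"))


if __name__ == "__main__" and len(sys.argv) > 1:
    redundant, total = count_in_odt(sys.argv[1])
    print(f"{redundant} of {total} text:span elements contain only whitespace")
```

A high ratio of whitespace-only spans to all spans would suggest the document suffers from the problem described above.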
So my question is whether someone could write a macro that removes these redundant tags automatically, so that I can run it over a whole document. Although I can imagine the logic such a macro would need, I do not have the programming skills to implement it myself.
If someone is willing and able to work on this with me, or at least to provide some initial macro code, I would be more than happy.
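The logic does not necessarily have to live in an OpenOffice macro. As a rough sketch, the standalone Python script below (not a Basic/UNO macro) opens an .odt file, replaces every `text:span` that contains only whitespace with its plain text, and writes the result to a new file. The names `unwrap_whitespace_spans`, `clean_content_xml` and `clean_odt` are hypothetical, and the script assumes that dropping the character style of those spaces is harmless.

```python
import io
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import Tuple

# ODF text namespace, i.e. the "text:" prefix used in content.xml
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"


def unwrap_whitespace_spans(root: ET.Element) -> int:
    """Replace each whitespace-only text:span with its bare text.

    The character style of those spaces is dropped (assumed harmless).
    Returns the number of spans removed.
    """
    # ElementTree has no parent pointers, so map each element to its parent
    parents = {child: parent for parent in root.iter() for child in parent}
    targets = [s for s in root.iter(SPAN)
               if len(s) == 0 and not (s.text or "").strip()]
    for span in targets:
        parent = parents[span]
        moved = (span.text or "") + (span.tail or "")
        index = list(parent).index(span)
        if index == 0:
            # No previous sibling: append to the parent's leading text
            parent.text = (parent.text or "") + moved
        else:
            # Otherwise append to the tail of the preceding sibling
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + moved
        parent.remove(span)
    return len(targets)


def clean_content_xml(xml_bytes: bytes) -> Tuple[bytes, int]:
    """Clean one content.xml; return the new bytes and the spans removed."""
    # Reuse the document's own prefixes (office:, text:, ...) on output
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        if prefix:
            ET.register_namespace(prefix, uri)
    root = ET.fromstring(xml_bytes)
    removed = unwrap_whitespace_spans(root)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True), removed


def clean_odt(src: str, dst: str) -> int:
    """Copy the .odt package src to dst, cleaning content.xml on the way."""
    removed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                data, removed = clean_content_xml(data)
            # Reusing the original ZipInfo keeps the entry order and each
            # entry's compression method (ODF wants "mimetype" first, stored)
            zout.writestr(item, data)
    return removed


if __name__ == "__main__" and len(sys.argv) == 3:
    print(f"removed {clean_odt(sys.argv[1], sys.argv[2])} redundant spans")
```

It would be run as `python clean_odt.py input.odt output.odt` (script name hypothetical), always writing to a new file. One known limitation: ElementTree only re-declares namespaces that appear in element or attribute names, so documents that use namespace prefixes inside attribute values (for example in formulas) may need a more careful XML library.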
Update:
I solved my particular problem the easy way. From reading several forums, I learned that LibreOffice 3.4 has a bug similar to the odd MS Office behaviour: it puts tags around every single word. So I simply opened the .doc file in LibreOffice 3.3 and saved it as .odt. All the redundant tags are gone, and I can now work on the file easily. By the way, AbiWord did not do the job: it could not import the graphics and did not preserve the page layout.
I am leaving this topic marked as unsolved, though, because it would still be nice to have an OpenOffice version of that CodeZapper script.