[Tutorial] Differences between Writer and MS Word files

Post by John_Ha » Mon Jun 06, 2016 4:55 pm

AOO Writer and MS Word (and other word processing programs) are all similar, but none are identical. If you open a Writer .odt file in Word, or a Word .doc, .rtf or .docx file in Writer, you will sometimes notice differences.

Both programs do many of the same things, including text, styles, tables, images, bold, italics, headings, page numbers, headers, footers etc, as shown in the light blue area below. However, Writer does some things which Word does not do (the red areas); and Word does some things which Writer does not do (the green and dark blue areas). Each program can store its own data in its own file format, but cannot store this extra data in the other program's format, because there is nowhere for it to go and/or nothing in the other program to read it.

Writer and Word are based on different schools of typography which can be slightly confusing. Word considers the page header/footer areas to be part of "print matter" while Writer considers them to be "marginalia". You may need to change the top and/or bottom margin widths by the height of 'one line + header/footer spacing' if you have page headers/footers and you are trying to replicate a .doc layout in a .odt file. [Thanks to keme]

files.png
Different capabilities in Writer and its .odt files; compared with MS Word and its .doc, .docx (and .rtf) files

Similarly, when you open a .doc, .docx or .rtf file, what you see may not be exactly what the person wrote - formatting in particular is often changed. .rtf files are particularly limited in what they can store. .txt files can store only the text characters - they cannot store any formatting or font information.

0. When using Writer always save all documents as .odt files

When using any application, always save files in that application's own format, because only that format is certain to store everything. When using Writer, always save all documents as .odt files.

That way you know that all your document and formatting will be saved. If someone irrationally asks you to send them a .doc file, question the request, and offer to send them a .odt file instead as all versions of Microsoft Office later than 2007 claim to be able both to read and to write .odt files. If MS Word corrupts the .odt file, get the recipient to complain to Microsoft. If the requester insists on a .doc file, then create a .doc file from the .odt file, and delete the .doc after sending it.

Always work in, and save all Writer documents, as .odt files.

Don't forget that Google Docs can open and export .odt files, and Microsoft is now feeling a lot of pressure from the .odt format.

If you save your work in any format other than .odt (eg .doc, .rtf etc) you are almost certain to lose something. In general, it is the more complex things which get mangled, such as Edit > Changes, bullet shapes, colours etc.

Be very careful with .rtf files

Note how a .rtf file cannot store some of Writer's capability. To make matters worse, Writer has, for example, chosen not to write notes to an .rtf file even though the .rtf file does allow notes to be saved in it. This is an example of an application (Writer) having the capability (notes), but choosing not to provide it in a given format (.rtf). Writer will, of course, save notes in a .odt file so, save as a .odt file and then create a copy as a .rtf file. If anything gets lost in the .rtf file you can go back to the .odt file where it will be saved.
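
The "keep the .odt as the master and create a copy in the other format" workflow can also be scripted. The sketch below is only an illustration: it assumes a LibreOffice-style soffice program on the PATH which accepts the --headless and --convert-to command line options (Apache OpenOffice may not accept them), and the function names build_copy_command and make_copy are made up for this example.

```python
import subprocess
from pathlib import Path


def build_copy_command(odt_path, target_ext="rtf", soffice="soffice"):
    """Build a headless conversion command; the .odt master is never modified."""
    src = Path(odt_path)
    if src.suffix.lower() != ".odt":
        # Refuse anything else so the master copy always stays in .odt format.
        raise ValueError("keep the master document as a .odt file")
    return [
        soffice, "--headless",
        "--convert-to", target_ext,      # e.g. "rtf" or "doc"
        "--outdir", str(src.parent),     # write the copy next to the master
        str(src),
    ]


def make_copy(odt_path, target_ext="rtf"):
    # Assumption: a LibreOffice-style soffice binary is available on the PATH.
    subprocess.run(build_copy_command(odt_path, target_ext), check=True)
```

Calling make_copy("report.odt") would leave report.odt untouched and write report.rtf beside it; if anything is lost in the copy, the .odt still has it.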

OpenOffice Migration Guide

See the OpenOffice Migration Guide for more information.

1. Textboxes in .docx files do not display and nor does their content

Later versions of MS Word which write .docx files often use Textboxes. Textboxes are not part of the OOXML International Standard - they are a Microsoft add-on which is proprietary. See OOXML/Markup Compatibility and Extensibility which says

Although the OOXML spec defines a specific set of allowed elements, Microsoft sometimes extend this with additional proprietary elements that are specific to new versions of Office. For example, if you insert a shape into a document in Word 2013, it will be defined in terms of a "word processing shape" element structure, which is not part of the OOXML spec. For the purposes of compatibility with older versions of Word however, they include a second version of the shape which uses an element structure that is defined in the spec, albeit using the legacy VML drawing format.

Writer (4.1.3) only recognises the OOXML Standard parts of the file - anything which does not comply with the OOXML Standard is ignored so Textboxes are ignored. It appears that LibreOffice does recognise Textboxes.
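
If you want to check whether a .docx file contains this kind of markup before opening it in Writer, a small script can look inside the file. This is a rough sketch, not a complete checker: it assumes the main text is in word/document.xml and simply counts the AlternateContent wrappers, the "word processing shape" textboxes, and the textbox content elements. The function name count_textbox_markup is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside word/document.xml
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"
WPS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_textbox_markup(docx_file):
    """Count textbox-related elements in the main part of a .docx file."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return {
        # wrapper holding the new-style shape plus a legacy fallback
        "alternate_content": len(list(root.iter(f"{{{MC}}}AlternateContent"))),
        # new-style "word processing shape" textboxes
        "wps_textbox": len(list(root.iter(f"{{{WPS}}}txbx"))),
        # textbox content, whichever kind of shape carries it
        "textbox_content": len(list(root.iter(f"{{{W}}}txbxContent"))),
    }
```

A non-zero wps_textbox count suggests the document has textboxes which may not display in Writer.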

2. Bullets, list items and numbered items in .doc files often display incorrectly

Bullets, list items and numbered items in MS Word .doc files often display incorrectly when the file is opened with Writer and the corruption persists when the file is saved as a .odt file. Typical corruptions are the bullet appearing with a digit inside (10 is common), or the list number [eg a) or b) ] being struck through or highlighted in colour.

The bullet appearing with a digit inside (10 is common) is often a font substitution problem - see Corrupted Bullets

The incorrect list items are usually caused by MS Word specific Character Styles, typically with names like WW8Num1z0, WW8Numz2 ..., etc, which are applied to Bullets, Lists and Numbering. Deleting these MS Word Character Styles (or editing them to be consistent with what is available in Writer) fixes the problem. What actually happens is that the MS Word Character Style, which is listed in the Styles and Formatting window under Character Styles, is applied to the Bullets, Lists and Numbered Items by the Format > Bullets and Numbering ... dialogue, where it appears under the Options tab as the selected Character Style. Set it to Numbering Symbols, which is the default setting for AOO bullets. If you set it to None, the bullets pick up the font etc characteristics from the text and not from the List Styles. See Oddities Involving Bullets/Outlines & Font Styles/Highlights

Either delete these unwanted Character Styles as follows

1 press F11 to open the Styles and Formatting window
2 click Character Styles - second icon
3 right click the character styles with names beginning WW8 > delete

This fixes it throughout the entire document.

Or, fix just one occurrence by resetting it to use the Writer defaults

1 place the cursor in a bulleted line and go Format > Bullets and Numbering
2 choose the Options tab
3 Character Style will be something like WW8Num1z0. Set it to Numbering Symbols (or None as appropriate)
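
To see which of these imported Character Styles a saved .odt file contains before deleting them by hand, you can list them with a short script. This is a minimal sketch which assumes the styles are stored as common styles in the file's styles.xml with the family "text" (how ODF marks character styles); the function name find_ww8_character_styles is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def find_ww8_character_styles(odt_file, prefix="WW8"):
    """List character styles in styles.xml whose names start with the prefix."""
    with zipfile.ZipFile(odt_file) as z:
        root = ET.fromstring(z.read("styles.xml"))
    names = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        name = style.get(f"{{{STYLE_NS}}}name", "")
        family = style.get(f"{{{STYLE_NS}}}family")
        # family "text" marks a character style; paragraph styles are skipped
        if family == "text" and name.startswith(prefix):
            names.append(name)
    return sorted(names)
```

The script only reports the names; the deletion itself is still done in the Styles and Formatting window as described above.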

3. Documents lay out differently - lines, paragraphs and pages spill in different places

This is not an MS Word / OpenOffice problem - it is more a "Microsoft Windows lock-in" problem.

It is in Microsoft's commercial interest to keep on changing fonts and/or adding new fonts to Windows and to encourage Windows users to use these fonts. When documents with these new fonts are sent to users using other operating systems, or even older versions of Windows, which do not have the fonts installed, the documents will almost always change layout - lines, paragraphs and pages spill in different places. The only ways to ensure the layout does not change are to do what PDFs do, namely embed the fonts in the file itself; or to install the fonts on the new PC.

Remember that the font showing in the Writer font drop-down selection box is the font the document is asking for. This may NOT be the font being used to create the display because, if the font being asked for is not installed on the PC, Windows (or other operating system) will silently substitute a different font which is available.

The TestFonts add-on is invaluable for finding missing fonts which the document is asking for, but which are not installed on the PC.

You can see which fonts are installed on the PC by Start > Control Panel > Fonts or by opening the folder C:\Windows\Fonts.
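
As a rough alternative to checking by eye, the fonts a .odt file asks for can be read from the file itself and compared with a list of installed font names. This is only a sketch: it assumes the requested fonts are declared as font-face entries in content.xml and styles.xml, the installed names must be supplied by you, and the function names requested_fonts and missing_fonts are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"


def requested_fonts(odt_file):
    """Return the set of font family names the document declares."""
    fonts = set()
    with zipfile.ZipFile(odt_file) as z:
        for part in ("content.xml", "styles.xml"):
            if part not in z.namelist():
                continue
            root = ET.fromstring(z.read(part))
            for face in root.iter(f"{{{STYLE_NS}}}font-face"):
                # Prefer the family name; fall back to the declaration's own name.
                family = (face.get(f"{{{SVG_NS}}}font-family")
                          or face.get(f"{{{STYLE_NS}}}name"))
                if family:
                    # Family names with spaces are often wrapped in quotes.
                    fonts.add(family.strip().strip("'\""))
    return fonts


def missing_fonts(odt_file, installed):
    """Fonts the document asks for that are not in the installed list."""
    installed_lower = {name.lower() for name in installed}
    return sorted(f for f in requested_fonts(odt_file)
                  if f.lower() not in installed_lower)
```

Any font reported as missing is a candidate for silent substitution, and so for lines, paragraphs and pages spilling in different places.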

4. Saving as .doc files is not recommended but ...

... if you are forced to save as a .doc file, be sure to select Word 97 / 2000 / XP as it is the most recent format. Word 95 and Word 6.0 .doc formats are very old and obsolete and less comprehensive than Word 97 / 2000 / XP .doc format.

most modern.png
Use Word 97 / 2000 / XP - Word 95 and Word 6.0 are very old and obsolete

If you attempt to save a document as any format other than .odt, Writer warns you that you may lose data, as in the pop-up window below. Unfortunately, many users switch off this warning. If you do not get this warning message, you can switch it back on with Tools > Options > Load/Save > General ...

Save as doc.png
Writer warning when you save as a .doc (or other format) file

5. Microsoft Word Viewer

If you regularly receive .doc or .docx files, you will find it very useful to download the free Microsoft Word Viewer from How to obtain the latest Microsoft Word Viewer. You can then open the .doc or .docx file, and check to see if any content is missing and, if necessary, copy the content into Writer.

6. MS Word can read and write .odt files

All versions of MS Word later than Word 2007 claim to be able both to read and write .odt files and Microsoft lists its partial support of .odt files in Differences between the OpenDocument Text (.odt) format and the Word (.docx) format. So, if someone sends you a .doc or .docx file you cannot read, ask them to send you a .odt file instead. If MS Word does not create a proper .odt file, ask the sender to complain vigorously to Microsoft. Similarly, if you send someone who uses MS Word a .odt file, and MS Word does not present it correctly, ask the person who received it to complain vigorously to Microsoft.

Note that AOO has some Microsoft compatibility options available under Tools > Options > Load/Save > VBA Properties..., and Tools > Options > Load/Save > Microsoft Office ..., which may need changing.

7. Academic study of Interoperability Issues

For an academic study of the problems see the University of Illinois' paper Lost in Translation: Interoperability Issues for Open Standards written in 2008.

I did not think that the paper covered very well the fact that the key benefit of an Open Standard is that ...

... it provides all the information necessary so that anyone can extract all the information from the data file without needing to have the application. This is because the file structure is not a commercial secret.
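
As a small illustration of extracting information without the application, the text of a .odt file can be read with nothing more than a zip reader and an XML parser. This is a minimal sketch, not a full converter: it only collects headings and paragraphs from content.xml, ignores special elements such as tabs and repeated spaces, and may repeat text held in nested frames. The function name extract_odt_text is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_odt_text(odt_file):
    """Return the headings and paragraphs of a .odt file as a list of strings."""
    # A .odt file is a zip archive; the document body lives in content.xml.
    with zipfile.ZipFile(odt_file) as z:
        root = ET.fromstring(z.read("content.xml"))
    wanted = {f"{{{TEXT_NS}}}h", f"{{{TEXT_NS}}}p"}
    # itertext() joins the text of nested spans inside each heading/paragraph.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in wanted]
```

No part of this needs Writer to be installed, which is the point being made about an Open Standard.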

Similarly, I felt the paper only briefly mentioned that applications must support all the "items" coded in the file - see the diagram on this page. Interoperability only exists across those functions which are implemented in both programs and which are also implemented in the file format being used to store the document, ie the light blue items for Writer, MS Word, .odt and .doc files.

Further information on the history of the .doc format can be found in the wiki article Doc (computing) which includes:

Specification

Because the DOC file format was a closed specification for many years, inconsistent handling of the format persists and may cause some loss of formatting information when handling the same file with multiple word processing programs. Some specifications for Microsoft Office 97 binary file formats were published in 1997 under a restrictive license, but these specifications were removed from online download in 1999. Specifications of later versions of Microsoft Office binary file formats were not publicly available.

The DOC format specification was available from Microsoft on request since 2006 under restrictive RAND-Z terms until February 2008. Sun Microsystems and OpenOffice.org reverse engineered the file format. On February 15, 2008, Microsoft released a .DOC format specification under the Microsoft Open Specification Promise. However, this specification does not describe all of the features used by DOC format and reverse engineered work remains necessary.

Since 2008 the specification has been updated several times; the last change was made in September 2015.


8. Microsoft’s OOXML "pseudo-standard" format (.docx etc)

See Why you should never use Microsoft’s OOXML pseudo-standard format where Italo Vignoli of The Document Foundation, the organization responsible for developing LibreOffice, talks about "the dirty tricks Microsoft uses to break interoperability and keep users locked into their platform". It includes
... each version of MS Office since 2007 has a different and non standard implementation of OOXML, which is defined as “transitional” because it contains elements which are supposed to be deprecated at standard level, but are still there for compatibility reasons. Although LibreOffice manages to read and write OOXML in a fairly appropriate way, it will be impossible to achieve a perfect interoperability because of these different non standard versions.

In addition to format incompatibilities, Microsoft – with OOXML – has introduced elements which may lead the user into producing a non interoperable document, such as the C-Fonts (for instance, Calibri and Cambria).

See MS Office 2007 OOXML file format (docx, xslx, pptx, ppsx) for a discussion of OOXML and why many consider OOXML to be a deliberate attempt by Microsoft to make it almost impossible for other vendors to read or write fully compliant OOXML files. The "standard" is 6,000 pages long and it is estimated a full import or export filter would take 50 to 500 person-years to write.

And after you have done all that work, all it takes is for Microsoft to make another not-part-of-the-standard change or addition to the so-called "standard" ... and your filter no longer works.

9. By default .docx files do not comply with the OOXML standard

See Complex singularity versus openness for a discussion of the impossible position in which vendors find themselves because Microsoft default .docx files do not comply with the OOXML standard. What hope is there if Microsoft doesn't even bother to use the standard it professes to use?

A default installation of MS Word uses the "transitional" OOXML "standard" which does not comply. It is possible for users to configure MS Word to use the Strict OOXML Standard, which is fully compliant, but very, very, very few do, and even fewer have even heard of it! You might even conclude that it is in Microsoft's commercial interest - it's all about money - for users to use the "transitional" "standard" because it makes exchange between MS Word and other vendors more complex, and users might be forced into buying MS Word.

10. AOO Help has a section About Converting Microsoft Office Documents ...

... which discusses some of the differences.
About Converting Microsoft Office Documents

OpenOffice can automatically open Microsoft Office 97/2000/XP .doc document files. However, some layout features and formatting attributes in more complex Microsoft Office documents are handled differently in OpenOffice or are unsupported. As a result, converted files require some degree of manual reformatting. The amount of reformatting that can be expected is proportional to the complexity of the structure and formatting of the source document. OpenOffice cannot run Visual Basic Scripts, but can load them for you to analyse.

The most recent versions of OpenOffice can load, but not save, the Microsoft Office Open XML document formats with the extensions .docx, .xlsx, and .pptx. The same versions can also run some Microsoft Excel Visual Basic scripts, if you enable this feature at Tools - Options - Load/Save - VBA Properties.

The following lists provide a general overview of Microsoft Office features that may cause conversion challenges. These will not affect your ability to use or work with the content of the document once the MS file has been saved as a .odt etc file.

Microsoft Word
1. AutoShapes
2. Revision marks
3. OLE objects
4. Certain controls and Microsoft Office form fields
5. Indexes
6. Tables, frames and multi-column formatting
7. Hyperlinks and bookmarks
8. Microsoft WordArt graphics
9. Animated characters/text

Microsoft PowerPoint
1. AutoShapes
2. Tab, line and paragraph spacing
3. Master background graphics
4. Grouped objects
5. Certain multimedia effects

Microsoft Excel
1. AutoShapes
2. OLE objects
3. Certain controls and Microsoft Office form fields
4. Pivot tables
5. New chart types
6. Conditional formatting
7. Some functions/formulae (see below)

One example of differences between Calc and Microsoft Excel is the handling of boolean values. Enter TRUE to cells A1 and A2.
In Calc, the formula =A1+A2 returns the value 2, and the formula =SUM(A1;A2) returns 2.
In Excel, the formula =A1+A2 returns 2, but the formula =SUM(A1,A2) returns 0.
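
The boolean example can be modelled in a few lines. This is only a toy model of the behaviour described above, not how either program is implemented: Python's True stands in for a cell containing TRUE, and the function names are made up for this example.

```python
def add_operator(a, b):
    """The + operator: TRUE is treated as 1 in both Calc and Excel."""
    return int(a) + int(b)


def sum_calc(cells):
    """Calc-style SUM: boolean cells count as numbers (TRUE = 1)."""
    return sum(int(v) if isinstance(v, bool) else v for v in cells)


def sum_excel(cells):
    """Excel-style SUM over cell references: boolean cells are skipped."""
    return sum(v for v in cells if not isinstance(v, bool))


a1, a2 = True, True              # TRUE entered in A1 and A2
print(add_operator(a1, a2))      # =A1+A2 in either program
print(sum_calc([a1, a2]))        # =SUM(A1;A2) in Calc
print(sum_excel([a1, a2]))       # =SUM(A1,A2) in Excel
```

The three results match the values quoted above, which is why a spreadsheet that sums TRUE/FALSE cells can give different totals in the two programs.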

For a detailed overview about converting documents to and from Microsoft Office format, see the OpenOffice Migration Guide.

Opening Microsoft Office Documents That Are Protected With a Password

OpenOffice can open the following Microsoft Office document types that are protected by a password.

Note: If you cannot open an encrypted file, ask someone with MS Word to open it for you, and save it without the password.

Microsoft Office format                                 Supported encryption method

Word 6.0, Word 95                                       Weak XOR encryption

Word 97, Word 2000, Word XP, Word 2003                  Office 97/2000 compatible encryption

Word XP, Word 2003                                      Weak XOR encryption from older Word versions

Excel 2.1, Excel 3.0, Excel 4.0, Excel 5.0, Excel 95    Weak XOR encryption

Excel 97, Excel 2000, Excel XP, Excel 2003              Office 97/2000 compatible encryption

Excel XP, Excel 2003                                    Weak XOR encryption from older Excel versions


Starting from OpenOffice.org 3.2 or StarOffice 9.2, Microsoft Office files that are encrypted by AES128 can be opened. Other encryption methods are not supported.


Disclaimer: Everything in this post is opinion. Please let me know of any errors so they can be corrected.
Last edited by John_Ha on Thu Mar 23, 2017 6:24 pm, edited 1 time in total.
AOO 4.1.4, Windows 7 Home 64 bit

See the Writer Manual, the Writer FAQ, the Writer Tutorials and the up to date Writer guide for information. Click the Help button on a pop-up window for extensive help on that function.