Conspiracy Theory Word Compatibility?

Absinthe
Posts: 14
Joined: Wed Jan 21, 2009 5:44 pm

Post by Absinthe »

I just purchased a document from a lawyer. It is a 4-page contract that I use for my business, and it has my name embedded in it and so forth.

They distribute it in MS Word format as a doc file and as a PDF.

This is a pretty simple document as far as word processors go. It has a fairly consistent font throughout and just some weight changes here and there.

However, if I open it in Writer, right off the bat there are two places near the beginning where a large '1' appears in a sort of reverse highlighting. I could simply delete these, since I don't see what purpose they serve. I am guessing they are an artifact of some sort of mail merge operation. They do not show up in Word, though. Secondly, around the second or third page there is a section where the font just randomly changes from what it was throughout into some Courier-type lightweight font. After that section it changes back without issue.

The question is: do you think M$ is intentionally putting tweaks into their documents to force this to happen, or are these just the result of a sloppy process by the original author that simply doesn't manifest in MS Word because it ignores such stuff as so much flotsam?

If possible, I would like to clean up the Word document in 365 (which I have access to) so that it opens readily in Writer. Ultimately, I will have to get the whole thing into Writer, even if I have to simply put them side by side and fix as I go. It just seems that one could write a document cleanly (enough) that it would be compatible between them... or am I delusional?
OOo 3.2.0
OOO320m12 (Build9483)
MS Windows Vista
Villeroy
Volunteer
Posts: 31269
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Conspiracy Theory Word Compatibility?

Post by Villeroy »

No, I don't believe that your Word document is a pretty simple one even if it looks simple.
No, the developers don't read this forum. However, only a developer could tell you something about your document, and only if you posted it.
No, the problem cannot be solved without uploading the document to the bug tracker.
There is no software which is able to open every WinWord doc flawlessly. Sometimes another version of WinWord fails to do so. Try http://abiword.org or http://zamzar or SoftMaker Office. They all open Word docs with limitations and flaws.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Hagar Delest
Moderator
Posts: 32627
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Conspiracy Theory Word Compatibility?

Post by Hagar Delest »

Absinthe wrote:It just seems that one could write a document cleanly (enough) that it would be compatible between them... or am I delusional?
Yes, you are...

The .doc format is a closed proprietary format, and its import/export filters were reverse engineered. Even if its specification was published by MS, I doubt it was completely documented.
As for their new format (.docx), see: MS Office 2007 OOXML file format (docx, xlsx, pptx, ppsx).

I doubt MS will ever be compatible with anything other than MS Office. They will never allow a full implementation of their format (nobody can make a clone of MS Office anyway), nor support another standard (like ODF). If they supported ODF, why would users buy MS Office when they could get alternative suites for free?

That's vendor lock-in policy...
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
obswob
Posts: 23
Joined: Mon Mar 11, 2013 1:22 am

Re: Conspiracy Theory Word Compatibility?

Post by obswob »

Here's a pretty believable explanation from a former Microsoft employee -

http://www.joelonsoftware.com/items/2008/02/19.html


A few extracts:
"They were designed to be fast on very old computers. For the early versions of Excel for Windows, 1 MB of RAM was a reasonable amount of memory"

"The file format is contorted, where necessary, to make common operations fast."

"There was always an assumption that you could use importers and exporters to exchange documents. In fact Word does have a format designed for easy interchange, called RTF, which has been there almost since the beginning. It’s still 100% supported."

"whenever a programmer on the Word team had to make a decision about how to change the file format, the only thing they cared about was (a) what was fast and (b) what took the fewest lines of code in the Word code base."

"A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They’re still in the file format for backwards compatibility,"
OpenOffice 3.4.1 on Windows 8
Hagar Delest
Moderator
Posts: 32627
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Conspiracy Theory Word Compatibility?

Post by Hagar Delest »

Thanks, this is a really interesting article.

Here are the excerpts I've found relevant:
They were not designed with interoperability in mind. The assumption, and a fairly reasonable one at the time, was that the Word file format only had to be read and written by Word. That means that whenever a programmer on the Word team had to make a decision about how to change the file format, the only thing they cared about was (a) what was fast and (b) what took the fewest lines of code in the Word code base. The idea of things like SGML and HTML—interchangeable, standardized file formats—didn’t really take hold until the Internet made it practical to interchange documents in the first place; this was a decade later than the Office binary formats were first invented. There was always an assumption that you could use importers and exporters to exchange documents.
They have to reflect all the complexity of the applications. Every checkbox, every formatting option, and every feature in Microsoft Office has to be represented in file formats somewhere. That checkbox in Word’s paragraph menu called “Keep With Next” that causes a paragraph to be moved to the next page if necessary so that it’s on the same page as the paragraph after it? That has to be in the file format. And that means if you want to implement a perfect Word clone than can correctly read Word documents, you have to implement that feature. If you’re creating a competitive word processor that has to load Word documents, it may only take you a minute to write the code to load that bit from the file format, but it might take you weeks to change your page layout algorithm to accommodate it. If you don’t, customers will open their Word files in your clone and all the pages will be messed up.

They have to reflect the history of the applications. A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They’re still in the file format for backwards compatibility, and because it doesn’t cost anything for Microsoft to leave the code around. But if you really want to do a thorough and complete job of parsing and writing these file formats, you have to redo all that work that some intern did at Microsoft 15 years ago. The bottom line is that there are thousands of developer years of work that went into the current versions of Word and Excel, and if you really want to clone those applications completely, you’re going to have to do thousands of years of work. A file format is just a concise summary of all the features an application supports.
It just confirms what we usually say in this forum: full .doc compatibility requires an MS Office clone. So I don't think this is something we will ever see.
OOXML (.docx) is the same: such a file format is tied to the application's features, which is why it can't be an interoperable file format, and ISO's approval of it is a real shame.
But ODF support is another story. Even if fewer application features are available in such documents, it at least gives users common ground for exchanging files. So not supporting ODF completely is the conspiracy (IMHO).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
Barbossa
Posts: 8
Joined: Tue Apr 23, 2013 8:36 am

Re: Conspiracy Theory Word Compatibility?

Post by Barbossa »

OpenOffice might be a nice free office suite, but to be honest, the import and export filters for Microsoft Office are not the best; loss of formatting and other problems like yours are very common.
The best available filters are those from SoftMaker Office. SoftMaker offers a free office suite, too. Maybe you should download their FreeOffice and try once again to open your file. It should work.

freeoffice.com
Ooo 3.41
Win7
keme
Volunteer
Posts: 3699
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Conspiracy Theory Word Compatibility?

Post by keme »

SoftMaker FreeOffice seems to do a nice job with MS Office document layout. However, there are issues that users may want to know about before installing. Moderators have requested that I post my findings, so I made a separate thread. There, users with more experience in using SoftMaker products can "fill in the blanks" and otherwise comment, as I have only done very basic tests using the free suite.
Apache OO 4.1.12 and LibreOffice 7.5, mostly on Ms Windows 10
John_Ha
Volunteer
Posts: 9583
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: Conspiracy Theory Word Compatibility?

Post by John_Ha »

Absinthe wrote:If possible I would like to clean up the Word document in 365 (which I have access to) such that it is readily opened by Writer.
As you have the PDF, you know what the final document should look like. So:

1 Edit the document in Writer to remove the "strange" things. This should be by far the easiest. View > Show hidden characters will help.

2 Open the document in the Microsoft Word Viewer, and copy and paste it into Writer.

3 Open the document in some other application which reads doc files - like the MS Works word processor or MS WordPad - these are "simple" applications and are likely to strip out the "strange" things.

4 You could try one of the on-line converter services to convert the PDF to an odt file (or even to a doc or docx file) and see if that works.

5 You could copy the content out of the PDF and paste it into an odt file, but that will need a lot of reformatting because each line in a PDF file ends with an "end of paragraph", so each line appears as a separate paragraph in Writer.
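
If you go the copy-from-PDF route (option 5), the line-per-paragraph problem can be partly cleaned up before pasting into Writer. Below is a minimal Python sketch, not a tested tool: the function name rejoin_lines and the short_ratio threshold are made up for illustration, and the heuristic (a line ends a paragraph if a blank line follows it, or if it ends with punctuation and is clearly shorter than the longest line) is only an assumption about how the pasted text looks. Check the result against the PDF.

```python
def rejoin_lines(text, short_ratio=0.75):
    """Merge hard-wrapped lines (as pasted from a PDF) into paragraphs.

    Heuristic only (an assumption, not how Writer or any PDF tool works):
    a line closes a paragraph if a blank line follows it, or if it ends
    with sentence punctuation and is clearly shorter than the longest line.
    """
    lines = [line.strip() for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)

    paragraphs = []
    current = []
    for line in lines:
        if not line:
            # A blank line always closes the paragraph collected so far.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        ends_sentence = line.endswith((".", "!", "?", ":"))
        if ends_sentence and len(line) < short_ratio * width:
            # Short line ending in punctuation: treat as end of paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    pasted = (
        "This agreement is made between the\n"
        "parties named below.\n"
        "The client agrees to pay all fees on\n"
        "time.\n"
    )
    # Print one rebuilt paragraph per line, ready to paste into Writer.
    for paragraph in rejoin_lines(pasted):
        print(paragraph)
```

A full-width line that happens to end a paragraph will be merged with the next one, and headings without punctuation will be joined to the following text, so some manual fixing is still needed.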

See [Tutorial] Differences between Writer and MS Word files for why you should always work in and save files as .odt.
Last edited by John_Ha on Wed Nov 09, 2016 1:10 am, edited 1 time in total.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
keme
Volunteer
Posts: 3699
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Conspiracy Theory Word Compatibility?

Post by keme »

@Absinthe: Instead of trying to convert existing documents, or build one in Writer from scratch, you could ask the source to provide an ODF document. If they create documents professionally, they should have a fairly recent version of MS Office. They have already created the document, so they shouldn't charge much (I'd say nothing at all) for sending you another copy, saved in Open Document Format. It is still not guaranteed to come out perfect when opened in Writer, but it may be worth a try.
Apache OO 4.1.12 and LibreOffice 7.5, mostly on Ms Windows 10