Hidden formatting when loading docx & Fine Reader files

orgopenoffice1
Posts: 18
Joined: Sat Aug 20, 2011 11:34 am

Hidden formatting when loading docx & Fine Reader files

Post by orgopenoffice1 »

When we use FineReader to Optical Character Recognise a document, it comes with a lot of hidden formatting. The same thing happens if we open a .docx file. This hidden formatting creates huge problems when we try to edit the file. Does anyone know what the hidden formats are, and how can we remove them?
They break the document up into sections and roll over onto new pages. They are not listed under tables. The sections are shown in the document by light horizontal lines across the page, as shown in the attachment.
Attachment: HiddenFormatingin-odsFile.jpg (30.2 KiB)
Portable OpenOffice 3.2 xp
robleyd
Moderator
Posts: 5509
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: Hidden formatting when loading docx & Fine Reader files

Post by robleyd »

A picture of your document doesn't tell much; can you upload a sample document?

In the interim, perhaps using View -> Navigator (or F5) and View -> Non-Printing characters might give a clue.
Slackware 15 (current) 64 bit
Apache OpenOffice 4.1.16
LibreOffice 26.2.3.2; SlackBuild for 26.2.3 by Eric Hameleers
---------------
I hate this damn computer, I wish that I could sell it.
It won't do what I want it to, Only what I tell it.
erbsenzahl
Volunteer
Posts: 266
Joined: Tue Apr 18, 2017 8:23 am
Location: Germany

Re: Hidden formatting when loading docx & Fine Reader files

Post by erbsenzahl »

orgopenoffice1 wrote: Thu May 07, 2026 10:49 pm Does anyone know what the hidden formats are, and how can we remove them?
They break the document up into sections
Remove the sections (menu Format | Sections...); their content will still be available.
Or: cut the text and paste it back as unformatted text.
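
For anyone who would rather script this than click through the dialog, here is a minimal Python sketch of the same idea applied to a document's content.xml: each text:section element is replaced by its own children, so the text stays and only the wrapper goes. This is an illustration, not a description of what Writer does internally. The function name unwrap_sections and the sample XML are made up for the example, and linked or protected sections are not handled.

```python
import xml.etree.ElementTree as ET

# ODF namespace for text elements (from the OpenDocument specification).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SECTION_TAG = f"{{{TEXT_NS}}}section"


def unwrap_sections(root):
    """Replace each text:section below root with its own children, in place.

    Hypothetical helper for illustration. Returns the number of sections removed.
    """
    removed = 0
    while True:
        # Rebuild the child -> parent map after every change to the tree.
        parents = {child: parent for parent in root.iter() for child in parent}
        section = next(
            (el for el in root.iter(SECTION_TAG) if el in parents), None
        )
        if section is None:
            return removed
        parent = parents[section]
        index = list(parent).index(section)
        parent.remove(section)
        # Put the section's children where the section used to be, in order.
        for offset, child in enumerate(list(section)):
            parent.insert(index + offset, child)
        removed += 1


# Made-up sample: one section nested inside another, plus a loose paragraph.
SAMPLE = (
    f'<body xmlns:text="{TEXT_NS}">'
    '<text:section text:name="Section1">'
    "<text:p>First</text:p>"
    '<text:section text:name="Section2"><text:p>Second</text:p></text:section>'
    "</text:section>"
    "<text:p>Third</text:p>"
    "</body>"
)

sample_root = ET.fromstring(SAMPLE)
sections_removed = unwrap_sections(sample_root)
remaining_text = [p.text for p in sample_root.iter(f"{{{TEXT_NS}}}p")]
print(sections_removed, remaining_text)
```

The paragraphs keep their order and their own styles; only the section wrappers are gone. Frames and text boxes are separate elements and are not touched by this.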
robleyd wrote: Fri May 08, 2026 12:58 am A picture of your document doesn't tell much; can you upload a sample document?
+1
orgopenoffice1 wrote: Thu May 07, 2026 10:49 pm FineReader to Optical Character Recognise a document
FineReader is an outstanding but not perfect app. :ucrazy:
LibreOffice current versions 24.x/25.x and OpenOffice 4.1.15
on LinuxMint 21 - 22 Mate, W10-64 pro
orgopenoffice1
Posts: 18
Joined: Sat Aug 20, 2011 11:34 am

Re: Hidden formatting when loading docx & Fine Reader files

Post by orgopenoffice1 »

Attachment: odtProblem.odt (32.35 KiB)
The non-printing characters appear to be mainly carriage returns. They don't seem to be related to the frames or whatever is forming the layout.
Attached is an example, odtProblem.odt, where adding text to a box makes it blow out in unpredictable ways.
Portable OpenOffice 3.2 xp
orgopenoffice1
Posts: 18
Joined: Sat Aug 20, 2011 11:34 am

Re: Hidden formatting when loading docx & Fine Reader files

Post by orgopenoffice1 »

Great. Erasing the sections (menu Format | Sections...) removed some of the problem areas, but there are still boxes remaining.
Portable OpenOffice 3.2 xp
Hagar Delest
Moderator
Posts: 33633
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: Hidden formatting when loading docx & Fine Reader files

Post by Hagar Delest »

Well, that's the "problem" with OCR: it cannot recreate the layout perfectly. Sometimes, it's even quicker to just copy and paste the text and redo the formatting.
When you say that the same thing happens with a .docx, I guess that the docx is the output of the OCR, showing that the result is no better. It is not a file format issue.

Please add [Solved] at the beginning of the title in your first post (top of the topic) with the Edit button if your issue has been fixed. Even if it's not really solved, there is no real fix for this kind of issue I think.
LibreOffice 25.2 on Linux Mint Debian Edition (LMDE 7 Gigi) and 25.2 portable on Windows 11.
orgopenoffice1
Posts: 18
Joined: Sat Aug 20, 2011 11:34 am

Re: Hidden formatting when loading docx & Fine Reader files

Post by orgopenoffice1 »

The problem is formatting. The boxes and divisions in the file are due to formatting.
Code in OpenOffice is dividing the file up; what we need to know is how we can control these divisions.
Editing with no formatting is not a solution.
Word processors have formatting for a necessary purpose, but we should have control over it.
Portable OpenOffice 3.2 xp
MrProgrammer
Moderator
Posts: 5432
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Hidden formatting when loading docx & Fine Reader files

Post by MrProgrammer »

orgopenoffice1 wrote: Thu May 07, 2026 10:49 pm When we use FineReader to Optical Character Recognise a document, it comes with a lot of hidden formatting.
Code in OpenOffice is dividing the file up; what we need to know is how we can control these divisions.
I would say that FineReader is generating what you have called hidden formatting in the DOCX file and OpenOffice is just showing you how FineReader formatted it. The problem lies with FineReader. Since FineReader generates the proprietary Microsnot DOCX format you could check to see if Word does any better. If it does, you should use the software which works best for you. Some of the difficulty may be that DOCX is an incompletely documented specification, and FineReader's developers may not be generating the correct formatting in the file.

There are hundreds and hundreds of discussions — I'm not going to search for you — about difficulties involved in converting documents intended only for display into an editable word processor document. Many of these documents are PDF, though some are just pieces of paper processed by OCR. The word processors vary. In this forum you'd find discussions about Writer, but I suspect others who use Microsnot Word, Apple Pages, Google Docs, Collabora Online, etc. have similar problems. The OCR software varies too. A common theme in the topics in this forum is that the fastest procedure and best results are obtained by discarding all formatting from the PDF or OCR document, then formatting a new Writer document (file type ODT) using styles. The job will take much longer if you don't know how to use styles. I realize that this is not the answer you would like, however you can check for yourself what others have said.
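
As a rough illustration of the "discard all formatting" step, here is a minimal Python sketch that pulls the bare paragraph text out of an ODT so it can be pasted into a fresh document and styled there. It relies only on the facts that an ODT is a ZIP archive and that the document body lives in content.xml. The names plain_paragraphs and own_text are hypothetical, and the sketch drops tabs, line breaks and repeated spaces, which ODF stores as separate elements.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace for text elements (from the OpenDocument specification).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
PARAGRAPH_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def own_text(element):
    """Text of an element, skipping paragraphs nested inside it (e.g. in frames)."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in PARAGRAPH_TAGS:
            parts.append(own_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def plain_paragraphs(odt_file):
    """Return the text of every non-empty paragraph and heading, in file order.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    paragraphs = []
    for element in root.iter():
        if element.tag in PARAGRAPH_TAGS:
            text = own_text(element).strip()
            if text:
                paragraphs.append(text)
    return paragraphs
```

Calling plain_paragraphs("odtProblem.odt") would return a list of strings, one per paragraph. Text inside frames comes out in file order, which for OCR output may not match the reading order on the page, so the result still needs a human pass.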

It's possible that in the coming decade techniques from artificial intelligence research will allow PDF converters or OCR software to produce better results. I doubt if I have anything further to contribute to this topic.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
orgopenoffice1
Posts: 18
Joined: Sat Aug 20, 2011 11:34 am

Re: Hidden formatting when loading docx & Fine Reader files

Post by orgopenoffice1 »

With regard to the difficulties, as the great man would have said, "The difficulties are able to look after themselves, what we need is solutions".
We will examine the internal XML files.
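
A possible starting point, as a minimal Python sketch: an ODT is a ZIP archive, and the document body is in content.xml, so the layout containers can be counted without opening Writer. The element names (text:section, draw:frame, draw:text-box, table:table) and namespaces come from the OpenDocument format; the function name layout_inventory and the choice of which elements to count are assumptions for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# ODF namespaces (from the OpenDocument specification).
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}

# Elements that carve a page into separate regions, with a readable label each.
LAYOUT_TAGS = {
    f"{{{NS['text']}}}section": "section",
    f"{{{NS['draw']}}}frame": "frame",
    f"{{{NS['draw']}}}text-box": "text box",
    f"{{{NS['table']}}}table": "table",
}


def layout_inventory(odt_file):
    """Count sections, frames, text boxes and tables in an ODT's content.xml.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return Counter(
        LAYOUT_TAGS[element.tag]
        for element in root.iter()
        if element.tag in LAYOUT_TAGS
    )
```

For example, layout_inventory("odtProblem.odt") returns a Counter keyed by "section", "frame", "text box" and "table". If frames and text boxes far outnumber sections, that would fit what we saw: removing the sections helped, but the boxes stayed.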
Portable OpenOffice 3.2 xp