Hidden formatting when loading docx & Fine Reader files
-
orgopenoffice1
- Posts: 18
- Joined: Sat Aug 20, 2011 11:34 am
Hidden formatting when loading docx & Fine Reader files
When we use FineReader to OCR (optical character recognition) a document, the result comes with a lot of hidden formatting. The same thing happens if we open a .docx file. This hidden formatting creates huge problems when we try to edit the file. Does anyone know what the hidden formats are, and how we can remove them?
They break the document up into sections and roll over onto new pages. They are not listed under tables. The sections are shown in the document by light horizontal lines across the page, as shown in the attachment.
Portable OpenOffice 3.2 xp
Re: Hidden formatting when loading docx & Fine Reader files
A picture of your document doesn't tell much; can you upload a sample document?
In the interim, perhaps using View -> Navigator (or F5) and View -> Non-Printing characters might give a clue.
Slackware 15 (current) 64 bit
Apache OpenOffice 4.1.16
LibreOffice 26.2.3.2; SlackBuild for 26.2.3 by Eric Hameleers
---------------
I hate this damn computer, I wish that I could sell it.
It won't do what I want it to, Only what I tell it.
-
erbsenzahl
- Volunteer
- Posts: 266
- Joined: Tue Apr 18, 2017 8:23 am
- Location: Germany
Re: Hidden formatting when loading docx & Fine Reader files
orgopenoffice1 wrote: ↑Thu May 07, 2026 10:49 pm
"Does anyone know what the hidden formats are, and how can we remove them.
They break the document up into sections"
Erase the sections (menu Format | Sections...); their content will still be available.
Or: Cut/paste as unformatted.
+1
orgopenoffice1 wrote: ↑Thu May 07, 2026 10:49 pm
"FineReader to Optical Character Recognise a document"
FineReader is an outstanding but not perfect app.
LibreOffice current versions 24.x/25.x and OpenOffice 4.1.15
on LinuxMint 21 - 22 Mate, W10-64 pro
-
orgopenoffice1
- Posts: 18
- Joined: Sat Aug 20, 2011 11:34 am
Re: Hidden formatting when loading docx & Fine Reader files
The non-printing characters appear to be mainly carriage returns. They don't seem to be related to the frames or whatever is forming the layout.
Attached is an example, odtProblem.odt, where adding text to a box makes it blow out in unpredictable ways.
Portable OpenOffice 3.2 xp
-
orgopenoffice1
- Posts: 18
- Joined: Sat Aug 20, 2011 11:34 am
Re: Hidden formatting when loading docx & Fine Reader files
Great. Erasing the sections (menu Format | Sections...) removed some of the problem areas, but there are still boxes remaining.
Portable OpenOffice 3.2 xp
- Hagar Delest
- Moderator
- Posts: 33633
- Joined: Sun Oct 07, 2007 9:07 pm
- Location: France
Re: Hidden formatting when loading docx & Fine Reader files
Well, that's the "problem" with OCR: it cannot recreate the layout perfectly. Sometimes it's even quicker to just copy and paste the text and redo the formatting.
When you say that the same thing happens with a .docx, I guess that the docx is the output of the OCR, showing that the result is no better. It is not a file format issue.
Please add [Solved] at the beginning of the title in your first post (top of the topic) with the ✎ button if your issue has been fixed. Even if it's not really solved, there is no real fix for this kind of issue I think.
LibreOffice 25.2 on Linux Mint Debian Edition (LMDE 7 Gigi) and 25.2 portable on Windows 11.
-
orgopenoffice1
- Posts: 18
- Joined: Sat Aug 20, 2011 11:34 am
Re: Hidden formatting when loading docx & Fine Reader files
The problem is formatting. The boxes and divisions in the file are due to formatting.
Code in OpenOffice is dividing the file up; what we need to know is how we can control these divisions.
Doing editing with no formatting is not a solution.
Word processors have formatting for a necessary purpose, but we should have control over it.
Portable OpenOffice 3.2 xp
- MrProgrammer
- Moderator
- Posts: 5431
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: Hidden formatting when loading docx & Fine Reader files
orgopenoffice1 wrote: ↑Thu May 07, 2026 10:49 pm
"When we use FineReader to Optical Character Recognise a document, it comes with a lot of hidden formatting.
Code in OpenOffice is dividing the file up, what we need to know is how we can control these divisions."
I would say that FineReader is generating what you have called hidden formatting in the DOCX file and OpenOffice is just showing you how FineReader formatted it. The problem lies with FineReader. Since FineReader generates the proprietary Microsnot DOCX format you could check to see if Word does any better. If it does, you should use the software which works best for you. Some of the difficulty may be that DOCX is an incompletely documented specification, and FineReader's developers may not be generating the correct formatting in the file.
There are hundreds and hundreds of discussions — I'm not going to search for you — about difficulties involved in converting documents intended only for display into an editable word processor document. Many of these documents are PDF, though some are just pieces of paper processed by OCR. The word processors vary. In this forum you'd find discussions about Writer, but I suspect others who use Microsnot Word, Apple Pages, Google Docs, Collabora Online, etc. have similar problems. The OCR software varies too. A common theme in the topics in this forum is that the fastest procedure and best results are obtained by discarding all formatting from the PDF or OCR document, then formatting a new Writer document (file type ODT) using styles. The job will take much longer if you don't know how to use styles. I realize that this is not the answer you would like, however you can check for yourself what others have said.
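For anyone who prefers to discard the formatting with a script rather than with cut/paste as unformatted, here is a rough Python sketch. It is not something proposed in this topic, only an illustration. It assumes the file is an ODT (a zip archive whose text lives in content.xml), and the function names extract_paragraphs and read_odt_paragraphs are made up for this example. It treats only draw:frame as a nested container, so text inside other drawing shapes, notes, or comments may be repeated or misplaced.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"

P_TAG = "{%s}p" % TEXT_NS
H_TAG = "{%s}h" % TEXT_NS
SPACE_TAG = "{%s}s" % TEXT_NS
COUNT_ATTR = "{%s}c" % TEXT_NS
TAB_TAG = "{%s}tab" % TEXT_NS
BREAK_TAG = "{%s}line-break" % TEXT_NS
FRAME_TAG = "{%s}frame" % DRAW_NS


def paragraph_text(node):
    """Return the text of one paragraph, ignoring character formatting."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == SPACE_TAG:
            # <text:s text:c="3"/> stands for a run of spaces.
            parts.append(" " * int(child.get(COUNT_ATTR, "1")))
        elif child.tag == TAB_TAG:
            parts.append("\t")
        elif child.tag == BREAK_TAG:
            parts.append("\n")
        elif child.tag != FRAME_TAG:
            # Spans, links and similar wrappers: keep their text.
            parts.append(paragraph_text(child))
        # Frames are skipped here; their paragraphs are visited on their own.
        parts.append(child.tail or "")
    return "".join(parts)


def extract_paragraphs(content_xml):
    """List the plain text of every paragraph and heading in content.xml."""
    root = ET.fromstring(content_xml)
    return [
        paragraph_text(element)
        for element in root.iter()
        if element.tag in (P_TAG, H_TAG)
    ]


def read_odt_paragraphs(odt_file):
    """Open an ODT (path or binary file object) and return its paragraphs."""
    with zipfile.ZipFile(odt_file) as archive:
        return extract_paragraphs(archive.read("content.xml"))
```

The paragraphs come back in file order, which for OCR output with many frames is not necessarily reading order, so the result still needs checking before it is restyled in a new Writer document.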
It's possible that in the coming decade techniques from artificial intelligence research will allow PDF converters or OCR software to produce better results. I doubt if I have anything further to contribute to this topic.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
-
orgopenoffice1
- Posts: 18
- Joined: Sat Aug 20, 2011 11:34 am
Re: Hidden formatting when loading docx & Fine Reader files
With regard to the difficulties, as the great man would have said, "The difficulties are able to look after themselves, what we need is solutions".
We will examine the internal xml files.
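As a starting point for that kind of inspection, here is a minimal Python sketch. It relies on the fact that an ODT file is a zip archive whose body is stored in content.xml. The function name count_layout_elements and the choice of which elements to count are assumptions made for this illustration, not something confirmed in this topic.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs defined by the OpenDocument format.
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
}

# Elements that commonly carve a page into separate boxes (an assumption
# about what matters here; extend the mapping as needed).
LAYOUT_TAGS = {
    "{%s}section" % NS["text"]: "section",
    "{%s}frame" % NS["draw"]: "frame",
    "{%s}text-box" % NS["draw"]: "text box",
    "{%s}table" % NS["table"]: "table",
}


def count_layout_elements(odt_file):
    """Count sections, frames, text boxes and tables in an ODT's content.xml.

    odt_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    counts = Counter()
    for element in root.iter():
        label = LAYOUT_TAGS.get(element.tag)
        if label:
            counts[label] += 1
    return counts
```

Calling count_layout_elements("odtProblem.odt") on the attached example would show how many of each element remain. If the frame or text box count is still high after the sections have been erased, those are likely the remaining boxes.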
Portable OpenOffice 3.2 xp