[Solved] RTF has characters that don't show up properly
I downloaded an RTF file from a group I'm a member of, but a number of the text characters show up as squares where context suggests they should be quotation marks, apostrophes, and other characters I'm less sure of. Does anyone know a way either to make OpenOffice display the file correctly or to convert it with a free tool?
Last edited by MrProgrammer on Tue Feb 16, 2021 12:03 am, edited 1 time in total.
Reason: Tagged ✓ [Solved]; probable bad RTF document; circumvention suggested
OpenOffice 4.1.7 Windows 10
- Hagar Delest
- Moderator
- Posts: 32665
- Joined: Sun Oct 07, 2007 9:07 pm
- Location: France
Re: RTF has a number of characters that don't show up proper
Hi and welcome to the forum!
Seems to be a font issue. What is the font displayed? and is it installed on your machine? If not, try to install that font.
RTF is a poor format, poorly supported by AOO. You should switch to ODF (.odt for text).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
Re: RTF has a number of characters that don't show up proper
You could try opening it in Wordpad and seeing if it looks the same... and, if it's fine, saving it from there in text or Unicode format and then opening it in OO.
Edit: A quick search finds this which may or may not be relevant:
https://bugs.documentfoundation.org/sho ... i?id=67594
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: RTF has a number of characters that don't show up proper
Remember that the font showing in the Writer font drop-down selection box is the font the document is asking for.
If the font being asked for is not installed on the PC, Windows (or other operating system) will silently substitute a different font which is available, and use that substitute font to display the text.
The TestFonts add-on is invaluable for finding missing fonts which the document is asking for, but which are not installed on the PC.
You can see which fonts are installed on a Windows 10 PC by Start > Settings > Personalisation > Fonts > Available fonts or by clicking C:\Windows\Fonts. Other OS should be similar.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: RTF has a number of characters that don't show up proper
Hagar Delest wrote:Hi and welcome to the forum!
Seems to be a font issue. What is the font displayed? and is it installed on your machine? If not, try to install that font.
RTF is a poor format, poorly supported by AOO. You should switch to ODF (.odt for text).
If the file were one of my creation I would have used an OpenOffice or Word document format, but as I said, the file was created by another person.
That said, I just selected it all and changed the font to Times New Roman and that didn't fix anything, so I'm back to my assumption that it's an encoding issue rather than a font issue.
OpenOffice 4.1.7 Windows 10
Re: RTF has a number of characters that don't show up proper
JeJe wrote:You could try opening it in Wordpad and seeing if it looks the same... and saving it from there in text or Unicode format if its fine and then opening it in OO.
Edit: A quick search finds this which may or may not be relevant:
https://bugs.documentfoundation.org/sho ... i?id=67594
Wordpad is even worse: there, the characters that are supposed to be quote marks are just blank spaces.
I will go check out the link you suggested either way. Thank you for trying.
OpenOffice 4.1.7 Windows 10
Re: RTF has a number of characters that don't show up proper
If you want more help please upload a small file showing the problem so that it can be analysed.
Press POSTREPLY and click the Upload attachment tab below where you type (128 kB max); or use a file share site, Dropbox or Google Drive for a larger file.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: RTF has a number of characters that don't show up proper
Seconded. The beauty of an RTF file is that it's plain text, so you can open the file in Notepad as a text file and examine it that way.
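If scrolling through a long file in Notepad is tedious, the same inspection can be scripted. The following is only a rough sketch in Python (not something anyone in this thread used): it tallies every \uN? escape found in the RTF source so you can see which values occur. The function name and the sample string are made up for illustration.

```python
import re
from collections import Counter

# Matches an RTF Unicode escape such as \u147? ; group 1 is the decimal value.
U_ESCAPE = re.compile(r"\\u(-?\d+)\?")

def count_unicode_escapes(rtf_text: str) -> Counter:
    """Count how often each \\uN? escape value occurs in RTF source text."""
    return Counter(int(m.group(1)) for m in U_ESCAPE.finditer(rtf_text))

# Hypothetical sample resembling the kind of file discussed here, not the real document.
sample = r"{\rtf1\ansi\ansicpg1252 \u147?Hello,\u148? she said. It\u146?s fine.}"

for value, n in sorted(count_unicode_escapes(sample).items()):
    print(f"\\u{value}? occurs {n} time(s)")
```

For a real file you would read the text first, for example with open(path, encoding="latin-1").read(), and pass it to count_unicode_escapes. Values between 128 and 159 are the ones worth a closer look.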
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: RTF has a number of characters that don't show up proper
Might it be that the file is using a Windows codepage, and unless one gets that right the output is garbled? Just a thought.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: RTF has a number of characters that don't show up proper
https://we.tl/t-qC3gQqdIyb
here's the original file
OpenOffice 4.1.7 Windows 10
Re: RTF has a number of characters that don't show up proper
Opened for me with no problem; Try the attached .odt version
- Attachments
-
- Chilord - Poker Knight - BtVS - 06_02_2021.odt
- (189.45 KiB) Downloaded 121 times
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: RTF has a number of characters that don't show up proper
The file opened with no problem for me in LibreOffice but it appears that LO thinks that it needs the Calibri font, which I do not have on my machine.
The TestFonts app says that LO has substituted Liberation Serif .
I have the feeling that Liberation Serif is not a standard Windows font.
LibreOffice 7.3.7.2; Ubuntu 22.04
- Hagar Delest
- Moderator
- Posts: 32665
- Joined: Sun Oct 07, 2007 9:07 pm
- Location: France
Re: RTF has a number of characters that don't show up proper
jrkrideau wrote:The file opened with no problem for me in LibreOffice but it appears that LO thinks that it needs the Calibri font which I do not have on my machine.
+1.
Calibri is the standard font in MS Office (a rather nice one IMHO).
I can't see why it would not display fine on a Windows machine, except perhaps if the font is provided with MS Office rather than Windows itself.
Substitution with several standard fonts like Liberation or DejaVu is ok on my Xubuntu machine.
Weird that it doesn't work on your machine. If you select the text and change the font to other fonts, you still get the wrong characters?
Note: saving an excerpt with a single page with those characters would have been enough, no need to upload the whole thing (especially when there is no issue on our end, difficult to spot what could be wrong).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
Re: RTF has a number of characters that don't show up proper
Doesn't work for me... in Word, OO, LO or Wordpad all giving various replacements for \u147? which is meant to be “
If you open the file in Notepad and do a find replace all for \u147? replacing with “ that fixes the file for that character. So maybe it will work for the other problem characters too.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: RTF has a number of characters that don't show up proper
Here's a different fix suggestion. I put “ into a Wordpad file and saved it, and it was represented by \ldblquote
and replacing \u147? with \ldblquote works. So you could do that with all the problem characters.
Edit: Or go back to the person who sent you the file, say it doesn't work with all these word processors, and ask whether they have a different version...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
- MrProgrammer
- Moderator
- Posts: 4908
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: RTF has a number of characters that don't show up proper
RoryOF wrote:Might it be that the file is using a Windows codepage, and unless one gets that right the output is garbled?
Yes. The first line of the RTF file says so:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang2057{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
The attached file from the OP uses the old Windows-1252 code page. The article shows in the code page table that the curly quotes are 0147 and 0148. So one needs operating system (or application) support for that code page and a font which can display those characters.
Wikipedia wrote:The first version of the codepage 1252 used in Microsoft Windows 1.0 did not have positions D7 and F7 defined. All the characters in the ranges 80–9F were undefined too.
The second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 had been defined.
The third version, used since Microsoft Windows 3.1, had all the present-day positions defined, except euro sign and Z with caron character pair.
The final version listed above debuted in Microsoft Windows 98 and was ported to older versions of Windows with the euro symbol update.
goku90504 wrote:Wordpad is even worse in wordpad those supposed to be quote marks are just blank spaces
Since Rich Text Format (RTF) is a Microsnot-designed format and Wordpad is a Microsnot-designed program, this suggests to me that the content of the document is bad. Note that decimal 147 and 148 are hexadecimal 93 and 94, which are not present in versions 1 and 2 of the code tables. Does "rtf1" mean that version 1 is to be used? I am not going to spend the time to research that.
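The decimal-to-hexadecimal correspondence, and what those positions hold in the current Windows-1252 table, can be checked with a few lines of Python. This is just an illustrative sketch that relies on Python's built-in "cp1252" codec; the helper name is invented here. It also shows what the value 147 means when it is read as a Unicode value rather than as a code page position.

```python
import unicodedata

def cp1252_to_unicode(code: int) -> int:
    """Return the Unicode code point for a single Windows-1252 byte value."""
    return ord(bytes([code]).decode("cp1252"))

for code in (133, 145, 146, 147, 148):
    cp = cp1252_to_unicode(code)
    # unicodedata.name gives an ASCII description, so this prints safely on any console
    print(f"cp1252 {code} (0x{code:02X}) -> U+{cp:04X} (decimal {cp}) {unicodedata.name(chr(cp))}")

# Read as a Unicode value instead, 147 is U+0093, which belongs to the control characters
print("U+0093 general category:", unicodedata.category(chr(147)))
```

If an application takes 147 as a Unicode value, it gets a control character with no visible glyph, which would be consistent with the squares and blank spaces reported in this thread.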
If this solved your problem please go to your first post use the Edit button and add [Solved] to the start of the subject field. Select the green checkmark icon at the same time.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
Re: RTF has a number of characters that don't show up proper
RoryOF wrote:Opened for me with no problem; Try the attached .odt version
It opens faster, but I'm still getting the squares, and I've installed the Liberation fonts and updated to the newest version of OpenOffice.
OpenOffice 4.1.7 Windows 10
Re: RTF has a number of characters that don't show up proper
The OpenOffice I used has Calibri installed; when it opened the rtf file I saw no boxes, so I posted the .odt, assuming that double quotes had deliberately been omitted around speech by the formatter. Now that I have more time, I have re-examined the file and note that double quotes have been replaced with very thin spaces. I have been able to replace these with appropriate curly quotes. Single apostrophes seem to be straight; I left these alone. I observed at least one location where an apostrophe is missing. I did not correct this.
This Double curly apostrophe file is attached
- Attachments
-
- Chilord - Poker Knight - BtVS - 06_02_2021.odt
- (190.23 KiB) Downloaded 122 times
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: RTF has a number of characters that don't show up proper
IIRC, not all characters are included in any given font, so this may be the cause.
If I am right then if a character is missing from the font asked for, and is missing from the substitute font, then that character is displayed as a box.
LO 6.4.4.2, Windows 10 Home 64 bit
See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.
Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
Re: RTF has a number of characters that don't show up proper
I looked again, later, and found that about 1/3 of the way into the file "I'm" started to render as "Im". I have left correction of these as an exercise for the student [Read: I'm going to have my lunch!]
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: RTF has a number of characters that don't show up proper
The characters look to be all special ones such as ldblquote
'http://latex2rtf.sourceforge.net/rtfspe ... pecialchar
or the ellipsis which Wordpad uses "\'85" for
The attached document has a button which fires the below macro which:
prompts you for a file
makes a copy of the file with the extension .txt in the same folder
opens that new file
replace the \u terms for ldblquote etc
saves and closes that file with the changes
renames the file the same as the original file but with "New" added before .rtf at the end
All done - it opens that in a new window
CAUTION IT DOES CREATE AND RENAME FILES AS DESCRIBED ABOVE
USE AT OWN RISK
Edit: if more search replace terms are needed you can add them to the array in the macro as described there
Code: Select all
REM ***** BASIC *****
Sub Main
    Dim ret As String, newname As String, rtfname As String
    Dim oDoc As Object, oReplace As Object
    Dim reps As Variant, i As Integer
    ' GetAFileName() is not defined in this listing; it is assumed to come from the attached document
    ret = GetAFileName()
    ' Work on a copy with the extension .txt so that it loads as plain text
    newname = ret
    Mid(newname, Len(newname)-2, 3) = "txt"
    FileCopy ret, newname
    oDoc = StarDesktop.loadComponentFromURL(ConvertToURL(newname), "_blank", 0, Array())
    oReplace = oDoc.createReplaceDescriptor()
    oReplace.SearchCaseSensitive = True
    'http://latex2rtf.sourceforge.net/rtfspec_7.html#rtfspec_specialchar
    'ADD EXTRA TERMS TO THE FOLLOWING ARRAY IN THE FORM SEARCH WORD, REPLACE WORD EG "\u145?","\lquote "
    reps = Array("\u145?","\lquote ","\u146?","\rquote ","\u147?","\ldblquote ","\u148?","\rdblquote ","\u149?","\bullet ","\u150?","\endash ","\u151?","\emdash ","\u133?","\'85")
    ' The array holds search/replace pairs, so step through it two items at a time
    For i = 0 To UBound(reps) Step 2
        oReplace.SearchString = reps(i)
        oReplace.ReplaceString = reps(i+1)
        oDoc.replaceAll(oReplace)
    Next
    oDoc.store
    oDoc.close(False)
    ' Rename the edited copy: "name.txt" becomes "nameNew.rtf"
    rtfname = newname
    Mid(rtfname, Len(ret)-3, 4) = "New."
    rtfname = rtfname & "rtf"
    Name newname As rtfname
    Wait 500
    DoEvents
    StarDesktop.loadComponentFromURL(ConvertToURL(rtfname), "_blank", 0, Array())
End Sub
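For anyone who would rather not run a macro, roughly the same idea can be sketched in Python. This is an untested-on-the-OP's-file approximation, not a port of the attached document: it uses the same search/replace pairs as the reps array, skips the intermediate .txt copy, and writes the result next to the original with "New" added before the extension. The function names and the file name in the usage note are hypothetical.

```python
from pathlib import Path

# Same search/replace pairs as the reps array in the macro above.
REPLACEMENTS = {
    r"\u145?": r"\lquote ",
    r"\u146?": r"\rquote ",
    r"\u147?": r"\ldblquote ",
    r"\u148?": r"\rdblquote ",
    r"\u149?": r"\bullet ",
    r"\u150?": r"\endash ",
    r"\u151?": r"\emdash ",
    r"\u133?": r"\'85",
}

def replace_escapes(rtf_text: str) -> str:
    """Swap each listed \\uN? escape for the RTF control word paired with it."""
    for search, replacement in REPLACEMENTS.items():
        rtf_text = rtf_text.replace(search, replacement)
    return rtf_text

def convert_file(path: str) -> Path:
    """Write a corrected copy, e.g. story.rtf -> storyNew.rtf, and return its path."""
    src = Path(path)
    dst = src.with_name(src.stem + "New" + src.suffix)
    # latin-1 maps every byte to exactly one character, so untouched bytes round-trip unchanged
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(replace_escapes(text).encode("latin-1"))
    return dst
```

Calling convert_file("story.rtf") would leave the original alone and create storyNew.rtf in the same folder. As with the macro, it creates files, so use at own risk.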
- Attachments
-
- convert.odt
- (12.27 KiB) Downloaded 110 times
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: RTF has a number of characters that don't show up proper
There's a problem with the way Unicode characters have been inserted in the document. If the document is opened with a text editor, you can see that the Unicode characters are written as "\uN?", where "N" should be the decimal Unicode value of the character. However, in this document, the ANSI (Windows-1252) value of the character was used, not the Unicode value. I found 5 characters that needed to be changed:
\u133? to \u8230?
\u145? to \u8216?
\u146? to \u8217?
\u147? to \u8220?
\u148? to \u8221?
After making these changes, the correct characters started to show up in AOO, LO and Abiword on Linux Mint. They also showed up in LO on Debian and Ubuntu Mate.
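The five changes above follow a pattern, so they could in principle be computed rather than listed by hand. Here is a hedged Python sketch of that idea. It assumes that any \uN? with N between 128 and 159 is really a Windows-1252 code, which holds for this file but may not hold for every RTF document. The function name is invented for illustration.

```python
import re

# Matches \uN? escapes with a non-negative decimal value.
U_ESCAPE = re.compile(r"\\u(\d+)\?")

def fix_ansi_unicode_escapes(rtf_text: str) -> str:
    """Rewrite \\uN? escapes whose N is a Windows-1252 code (128-159) to the Unicode value."""
    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        if 128 <= n <= 159:
            try:
                code_point = ord(bytes([n]).decode("cp1252"))
            except UnicodeDecodeError:
                # A few positions in this range are unassigned in Python's cp1252 codec
                return match.group(0)
            return f"\\u{code_point}?"
        return match.group(0)  # leave every other escape untouched
    return U_ESCAPE.sub(repl, rtf_text)

print(fix_ansi_unicode_escapes(r"\u147?Hello\u133?\u148?"))
```

Escapes that already carry a proper Unicode value, such as \u8220?, are left as they are, so running it twice does no harm.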
AOO 4.1.14 on Ubuntu MATE 22.04
Re: RTF has a number of characters that don't show up proper
Bill - the last 4 of those all have special rtf words which can be used, as in my document above.
Using your replacements would just involve replacing the line
Code: Select all
reps = array ("\u145?","\lquote ","\u146?","\rquote ","\u147?","\ldblquote ","\u148?","\rdblquote ","\u149?","\bullet ","\u150?","\endash ","\u151?","\emdash ","\u133?","\'85")
with your terms, i.e.
Code: Select all
reps = array ("\u133?","\u8230?","\u145?","\u8216?","\u146?","\u8217?","\u147?","\u8220?","\u148?", "\u8221?")
But as there's a consensus among word processors that the rtf is faulty... it might be better for the OP to tell that to whoever sent them the file...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)