[Solved] RTF has characters that don't show up properly

goku90504
Posts: 6
Joined: Sat Feb 06, 2021 8:59 am

[Solved] RTF has characters that don't show up properly

Post by goku90504 »

I downloaded an RTF file from a group I'm a member of, but a number of the text characters show up as squares where the context shows they should be quotation marks, apostrophes and others I'm less sure of. Does anyone know a way to either make OpenOffice display it correctly or convert the file with some free tool?
Last edited by MrProgrammer on Tue Feb 16, 2021 12:03 am, edited 1 time in total.
Reason: Tagged ✓ [Solved]; probable bad RTF document; circumvention suggested
OpenOffice 4.1.7 Windows 10
Hagar Delest
Moderator
Posts: 32665
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: RTF has a number of characters that don't show up proper

Post by Hagar Delest »

Hi and welcome to the forum!

It seems to be a font issue. Which font is displayed, and is it installed on your machine? If not, try installing that font.
RTF is a poor format, poorly supported by AOO. You should switch to ODF (.odt for text).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

You could try opening it in WordPad to see if it looks the same, and if it's fine, save it from there in text or Unicode format and then open that in OO.

Edit: A quick search finds this, which may or may not be relevant:

https://bugs.documentfoundation.org/sho ... i?id=67594
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: RTF has a number of characters that don't show up proper

Post by John_Ha »

Remember that the font showing in the Writer font drop-down selection box is the font the document is asking for.

If the font being asked for is not installed on the PC, Windows (or other operating system) will silently substitute a different font which is available, and use that substitute font to display the text.

The TestFonts add-on is invaluable for finding missing fonts which the document is asking for, but which are not installed on the PC.

You can see which fonts are installed on a Windows 10 PC via Start > Settings > Personalisation > Fonts > Available fonts, or by opening C:\Windows\Fonts. Other operating systems should be similar.
LO 6.4.4.2, Windows 10 Home 64 bit

See the Writer Guide, the Writer FAQ, the Writer Tutorials and Writer for students.

Remember: Always save your Writer files as .odt files. - see here for the many reasons why.
goku90504
Posts: 6
Joined: Sat Feb 06, 2021 8:59 am

Re: RTF has a number of characters that don't show up proper

Post by goku90504 »

Hagar Delest wrote: Hi and welcome to the forum!

It seems to be a font issue. Which font is displayed, and is it installed on your machine? If not, try installing that font.
RTF is a poor format, poorly supported by AOO. You should switch to ODF (.odt for text).

If I had created the file myself, I would have used an OpenOffice or Word format, but as I said, the file was created by someone else.

That said, I just selected all the text and changed the font to Times New Roman, and that didn't fix anything, so I'm back to my assumption that it's an encoding issue rather than a font issue.
OpenOffice 4.1.7 Windows 10
goku90504
Posts: 6
Joined: Sat Feb 06, 2021 8:59 am

Re: RTF has a number of characters that don't show up proper

Post by goku90504 »

JeJe wrote: You could try opening it in WordPad to see if it looks the same, and if it's fine, save it from there in text or Unicode format and then open that in OO.

Edit: A quick search finds this, which may or may not be relevant:

https://bugs.documentfoundation.org/sho ... i?id=67594

WordPad is even worse: in WordPad, what should be quote marks are just blank spaces.

I'll check out the link you suggested either way. Thank you for trying.
OpenOffice 4.1.7 Windows 10
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: RTF has a number of characters that don't show up proper

Post by John_Ha »

If you want more help please upload a small file showing the problem so that it can be analysed.

Press Post Reply and click the Upload attachment tab below where you type (128 kB max), or use a file-sharing site such as Dropbox or Google Drive for a larger file.
LO 6.4.4.2, Windows 10 Home 64 bit
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

Seconded. The beauty of an RTF file is that it's plain text, so you can open it in Notepad and examine it that way.
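Since the file is plain text, the same inspection can also be scripted. Below is a minimal Python sketch (the function names and the sample string are hypothetical, not from any tool mentioned in this thread). It counts every `\uN` escape in the RTF source and flags values from 128 to 159: in Unicode that range holds only invisible C1 control characters, so an escape pointing there is a likely sign of a wrongly encoded character.

```python
import re
from collections import Counter

# An RTF Unicode escape is \u followed by a signed decimal number.
# The fallback character after it (often "?") is not captured here.
UNICODE_ESCAPE = re.compile(r"\\u(-?\d+)")


def tally_unicode_escapes(rtf_text: str) -> Counter:
    r"""Count how often each code point appears as a \uN escape."""
    counts = Counter()
    for match in UNICODE_ESCAPE.finditer(rtf_text):
        value = int(match.group(1))
        if value < 0:  # RTF writes values above 32767 as negative numbers
            value += 65536
        counts[value] += 1
    return counts


def suspicious_code_points(counts: Counter) -> list:
    """Return escaped values in 128..159, the Unicode C1 control range."""
    return sorted(cp for cp in counts if 128 <= cp <= 159)


if __name__ == "__main__":
    sample = r"{\rtf1 He said \u147?Hi\u148? and \u8220?ok\u8221?.}"
    counts = tally_unicode_escapes(sample)
    print(dict(counts))                    # {147: 1, 148: 1, 8220: 1, 8221: 1}
    print(suspicious_code_points(counts))  # [147, 148]
```

To run it on a real file, passing `open("file.rtf", encoding="latin-1").read()` (hypothetical file name) should work: Latin-1 decodes any byte, and the escapes themselves are plain ASCII.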
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: RTF has a number of characters that don't show up proper

Post by RoryOF »

Might it be that the file is using a Windows codepage, and unless one gets that right the output is garbled? Just a thought.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
goku90504
Posts: 6
Joined: Sat Feb 06, 2021 8:59 am

Re: RTF has a number of characters that don't show up proper

Post by goku90504 »

Here's the original file:
https://we.tl/t-qC3gQqdIyb
OpenOffice 4.1.7 Windows 10
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: RTF has a number of characters that don't show up proper

Post by RoryOF »

It opened for me with no problem; try the attached .odt version.
Attachments
Chilord - Poker Knight - BtVS - 06_02_2021.odt
(189.45 KiB) Downloaded 121 times
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
jrkrideau
Volunteer
Posts: 3816
Joined: Sun Dec 30, 2007 10:00 pm
Location: Kingston Ontario Canada

Re: RTF has a number of characters that don't show up proper

Post by jrkrideau »

The file opened with no problem for me in LibreOffice, but it appears that LO thinks that it needs the Calibri font, which I do not have on my machine.

The TestFonts app says that LO has substituted Liberation Serif.

I have the feeling that Liberation Serif is not a standard Windows font.
LibreOffice 7.3.7. 2; Ubuntu 22.04
Hagar Delest
Moderator
Posts: 32665
Joined: Sun Oct 07, 2007 9:07 pm
Location: France

Re: RTF has a number of characters that don't show up proper

Post by Hagar Delest »

jrkrideau wrote: The file opened with no problem for me in LibreOffice, but it appears that LO thinks that it needs the Calibri font, which I do not have on my machine.
+1.
Calibri is the standard font in MS Office (a rather nice one IMHO).
I can't see why it would not display fine on a Windows machine, except perhaps if the font is provided with MS Office rather than Windows itself.
Substitution with several standard fonts like Liberation or DejaVu is ok on my Xubuntu machine.

Weird that it doesn't work on your machine. If you select the text and change it to other fonts, do you still get the wrong characters?
Note: saving a one-page excerpt containing those characters would have been enough; there's no need to upload the whole thing (especially when there is no issue on our end, which makes it difficult to spot what could be wrong).
LibreOffice 7.6.2.1 on Xubuntu 23.10 and 7.6.4.1 portable on Windows 10
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

It doesn't display correctly for me either: Word, OO, LO and WordPad all show various substitutes for \u147?, which is meant to be “.

If you open the file in Notepad and do a Replace All of \u147? with “, that fixes the file for that character, so maybe it will work for the other problem characters too.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

Here's a different fix suggestion. I put “ into a WordPad file and saved it, and it was represented by \ldblquote.

Replacing \u147? with \ldblquote works, so you could do that for all the problem characters.
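The same \ldblquote idea can be applied outside a word processor. A minimal Python sketch, assuming the RTF source has already been read into a string; the table uses the RTF control words for typographic characters (\lquote, \rquote, \ldblquote, \rdblquote, \bullet, \endash, \emdash), and the names `CONTROL_WORDS` and `use_control_words` are hypothetical:

```python
# Hypothetical table: wrongly encoded escapes -> named RTF control words.
# The trailing space ends each control word, as in WordPad's own output.
CONTROL_WORDS = {
    r"\u145?": r"\lquote ",
    r"\u146?": r"\rquote ",
    r"\u147?": r"\ldblquote ",
    r"\u148?": r"\rdblquote ",
    r"\u149?": r"\bullet ",
    r"\u150?": r"\endash ",
    r"\u151?": r"\emdash ",
}


def use_control_words(rtf_text: str) -> str:
    """Replace each bad escape (including its "?" fallback) with a control word."""
    for bad, good in CONTROL_WORDS.items():
        rtf_text = rtf_text.replace(bad, good)
    return rtf_text


print(use_control_words(r"\u147?Hi\u148?"))  # \ldblquote Hi\rdblquote
```

The space after each control word is the delimiter RTF requires; a reader consumes it, so it does not appear as text in the document.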

Edit: Or go back to the person who sent you the file, tell them it doesn't work in any of these word processors, and ask whether they have a different version.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
MrProgrammer
Moderator
Posts: 4908
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: RTF has a number of characters that don't show up proper

Post by MrProgrammer »

RoryOF wrote:Might it be that the file is using a Windows codepage, and unless one gets that right the output is garbled?
Yes. The first line of the RTF file says so:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang2057{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
The file from the OP uses the old Windows-1252 code page. The code page table in the Wikipedia article shows that the curly quotes are 0147 and 0148. So one needs operating system (or application) support for that code page, and a font that can display those characters.
Wikipedia wrote: The first version of the codepage 1252 used in Microsoft Windows 1.0 did not have positions D7 and F7 defined. All the characters in the ranges 80–9F were undefined too.
In the second version, used in Microsoft Windows 2.0, positions D7, F7, 91, and 92 had been defined.
The third version, used since Microsoft Windows 3.1, had all the present-day positions defined, except euro sign and Z with caron character pair.
The final version listed above debuted in Microsoft Windows 98 and was ported to older versions of Windows with the euro symbol update.
goku90504 wrote: WordPad is even worse: in WordPad, what should be quote marks are just blank spaces.

Since Rich Text Format (RTF) is a Microsnot-designed format and WordPad is a Microsnot-designed program, this suggests to me that the content of the document is bad. Note that decimal 147 and 148 are hexadecimal 93 and 94, which are not present in versions 1 and 2 of the code tables. Does "rtf1" mean that version 1 is to be used? I am not going to spend the time to research that.
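The decimal/hex correspondence and the code page mapping can be checked with Python's built-in `cp1252` codec. A small sketch; the helper name `cp1252_code_point` is hypothetical:

```python
import unicodedata


def cp1252_code_point(byte_value: int) -> int:
    """Return the Unicode code point Windows-1252 assigns to one byte.

    Python's cp1252 codec leaves five bytes undefined (0x81, 0x8D, 0x8F,
    0x90, 0x9D); decoding those raises UnicodeDecodeError.
    """
    return ord(bytes([byte_value]).decode("cp1252"))


# Decimal 147 and 148 are hex 0x93 and 0x94, the curly double quotes.
for value in (147, 148):
    cp = cp1252_code_point(value)
    print(value, hex(value), f"U+{cp:04X}", unicodedata.name(chr(cp)))
# 147 0x93 U+201C LEFT DOUBLE QUOTATION MARK
# 148 0x94 U+201D RIGHT DOUBLE QUOTATION MARK
```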

If this solved your problem, please go to your first post, use the Edit button, and add [Solved] to the start of the subject field. Select the green checkmark icon at the same time.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
goku90504
Posts: 6
Joined: Sat Feb 06, 2021 8:59 am

Re: RTF has a number of characters that don't show up proper

Post by goku90504 »

RoryOF wrote: It opened for me with no problem; try the attached .odt version.

It opens faster, but I'm still getting the squares, even though I've installed the Liberation fonts and updated to the newest version of OpenOffice.
Attachments
Squares.png
OpenOffice 4.1.7 Windows 10
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: RTF has a number of characters that don't show up proper

Post by RoryOF »

The OpenOffice I used has Calibri installed; when it opened the RTF file I saw no boxes, so I posted the .odt, assuming that the double quotes around speech had been deliberately omitted by the formatter. Now that I have more time, I re-examined the file and noticed that the double quotes have been replaced with very thin spaces. I have been able to replace these with appropriate curly quotes. Single apostrophes seem to be straight; I left these alone. I observed at least one location where an apostrophe is missing; I did not correct this.

The file with curly double quotes restored is attached.
Attachments
Chilord - Poker Knight - BtVS - 06_02_2021.odt
(190.23 KiB) Downloaded 122 times
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
John_Ha
Volunteer
Posts: 9584
Joined: Fri Sep 18, 2009 5:51 pm
Location: UK

Re: RTF has a number of characters that don't show up proper

Post by John_Ha »

IIRC, not every font includes every character, so this may be the cause.

If I am right, then when a character is missing from both the font asked for and the substitute font, that character is displayed as a box.
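One possible reason no font choice helps in this particular file: `\u147?` asks for code point U+0093, which Unicode defines as a C1 control character rather than a printable quotation mark, so fonts generally have no glyph for it and the program shows a box or nothing. A quick check with Python's standard `unicodedata` module (the helper `describe` is hypothetical):

```python
import unicodedata


def describe(code_point: int) -> str:
    """Summarise a code point: hex value, general category and name."""
    ch = chr(code_point)
    # Control characters have no Unicode name, so supply a default.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{code_point:04X} category={unicodedata.category(ch)} name={name}"


print(describe(147))   # U+0093 category=Cc name=<no name>
print(describe(8220))  # U+201C category=Pi name=LEFT DOUBLE QUOTATION MARK
```

Category `Cc` means "control", while `Pi` is initial punctuation, the kind of character a font is expected to draw.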
LO 6.4.4.2, Windows 10 Home 64 bit
RoryOF
Moderator
Posts: 34618
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: RTF has a number of characters that don't show up proper

Post by RoryOF »

I looked again, later, and found that about 1/3 of the way into the file "I'm" started to render as "Im". I have left correction of these as an exercise for the student [Read: I'm going to have my lunch!]
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

The characters all look to be special ones, such as \ldblquote
http://latex2rtf.sourceforge.net/rtfspe ... pecialchar
or the ellipsis, for which WordPad uses "\'85".

The attached document has a button that runs the macro below, which:

- prompts you for a file
- makes a copy of the file with the extension .txt in the same folder
- opens that new file
- replaces the \u terms with \ldblquote etc.
- saves and closes that file with the changes
- renames the file to the original name with "New" added before the .rtf at the end
- when all is done, opens the result in a new window

CAUTION IT DOES CREATE AND RENAME FILES AS DESCRIBED ABOVE
USE AT OWN RISK


	REM  *****  BASIC  *****

Sub Main
	' GetAFileName() is not a built-in Basic function; it is presumably defined
	' elsewhere in the attached convert.odt and returns the chosen file's system path.
	ret = GetAFileName()

	' Copy "name.rtf" to "name.txt" so Writer opens it as plain text
	' and the raw RTF codes can be searched and replaced.
	newname = ret
	mid(newname, len(newname) - 2, 3) = "txt"
	FileCopy ret, newname

	oDoc = StarDesktop.loadComponentFromURL(ConvertToURL(newname), "_blank", 0, Array())

	oReplace = oDoc.createReplaceDescriptor()
	oReplace.SearchCaseSensitive = True
	'http://latex2rtf.sourceforge.net/rtfspec_7.html#rtfspec_specialchar

	'ADD EXTRA TERMS TO THE FOLLOWING ARRAY IN THE FORM SEARCH TERM, REPLACEMENT, E.G. "\u145?","\lquote "
	'(the trailing space ends the RTF control word)
	reps = Array("\u145?","\lquote ","\u146?","\rquote ","\u147?","\ldblquote ","\u148?","\rdblquote ","\u149?","\bullet ","\u150?","\endash ","\u151?","\emdash ","\u133?","\'85")
	For i = 0 To UBound(reps) Step 2
		oReplace.SearchString = reps(i)
		oReplace.ReplaceString = reps(i + 1)
		oDoc.replaceAll(oReplace)
	Next

	' Save the edited text file and close it.
	Dim rtfname As String
	oDoc.store()
	oDoc.close(False)

	' Rename "name.txt" to "nameNew.rtf" and open the result.
	rtfname = newname
	mid(rtfname, len(ret) - 3, 4) = "New."
	rtfname = rtfname & "rtf"
	Name newname As rtfname
	Wait 500
	DoEvents
	StarDesktop.loadComponentFromURL(ConvertToURL(rtfname), "_blank", 0, Array())

End Sub
Edit: If more search/replace terms are needed, you can add them to the array in the macro, as described there.
Attachments
convert.odt
(12.27 KiB) Downloaded 110 times
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Bill
Volunteer
Posts: 8934
Joined: Sat Nov 24, 2007 6:48 am

Re: RTF has a number of characters that don't show up proper

Post by Bill »

There's a problem with the way Unicode characters have been inserted in the document. If the document is opened in a text editor, you can see that Unicode characters are inserted using "\uN?", where "N" should be the Unicode value of the character. However, in this document, the ANSI value of the character was used instead of the Unicode value. I found 5 characters that needed to be changed:

\u133? to \u8230?
\u145? to \u8216?
\u146? to \u8217?
\u147? to \u8220?
\u148? to \u8221?

After making these changes, the correct characters started to show up in AOO, LO and Abiword on Linux Mint. They also showed up in LO on Debian and Ubuntu Mate.
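The five substitutions above follow a single rule: each wrong number is a Windows-1252 byte value (the file declares \ansicpg1252 in its header), and the right number is the Unicode code point that byte stands for. A Python sketch of that rule, assuming the whole RTF source is in one string; `fix_ansi_escapes` is a hypothetical name. It rewrites any `\uN` with N from 128 to 159, not only the five listed, leaves the fallback `?` untouched, and does not handle the rare case of a literal backslash (`\\`) directly before the `u`.

```python
import re

# \u followed by an unsigned decimal number; only the number is rewritten,
# so the fallback character after it (here "?") is kept as-is.
ESCAPE = re.compile(r"\\u(\d+)")


def fix_ansi_escapes(rtf_text: str) -> str:
    r"""Rewrite \uN escapes whose N is a Windows-1252 byte value (128-159)
    into the matching Unicode value, e.g. \u147? -> \u8220?."""
    def repair(match):
        value = int(match.group(1))
        if 128 <= value <= 159:
            try:
                value = ord(bytes([value]).decode("cp1252"))
            except UnicodeDecodeError:
                pass  # byte undefined in cp1252: leave the escape alone
        return "\\u" + str(value)

    return ESCAPE.sub(repair, rtf_text)


print(fix_ansi_escapes(r"He said \u147?Hi\u148?"))  # He said \u8220?Hi\u8221?
```

Reading the file with `encoding="latin-1"` and writing the result to a new file with the same encoding should keep every other byte unchanged.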
AOO 4.1.14 on Ubuntu MATE 22.04
JeJe
Volunteer
Posts: 2784
Joined: Wed Mar 09, 2016 2:40 pm

Re: RTF has a number of characters that don't show up proper

Post by JeJe »

Bill - the last 4 of those all have special rtf words which can be used, as in my document above.

Using your replacements would just involve replacing the line


reps = array ("\u145?","\lquote ","\u146?","\rquote ","\u147?","\ldblquote ","\u148?","\rdblquote ","\u149?","\bullet ","\u150?","\endash ","\u151?","\emdash ","\u133?","\'85")
with your terms, i.e.


reps = array ("\u133?","\u8230?","\u145?","\u8216?","\u146?","\u8217?","\u147?","\u8220?","\u148?", "\u8221?")
But as there's a consensus among word processors that the rtf is faulty... it might be better for the OP to tell that to whoever sent them the file...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Post Reply