Is there a way to use the find and replace function to add a space in between these instances? I'm editing a 1256 page document with 600k words.
testtest or testtest
I'm using OpenOffice 4.1.5 on Windows 10, if that helps.
Thanks, I've been trying it for about an hour now and I still can't get the search to work as intended. I was also thinking that maybe I could use the spell check system to find these joined words (thinking they're misspelled or something); however, it says everything in the document is fine.
Zizi64 wrote: Try the AltSearch extension. It can find various formatting parameters.
Please upload a short ODF sample document here with some misspelled words and with the same structure (same paragraph style settings) as the original one. (Delete most of the text from a copy of the original document, and upload it here.) The file size limit is 128 KiB in this forum.
I was also thinking that maybe I could use the spell check system to find these joined words (thinking they're misspelled or something) however it says everything in the document is fine.
single words that are italic are next to (in front or after) a regular word have no space, so I end up with something like this:
foobarwibble or foobarwibble
Davidrobleyd wrote:I think the OP has given a misleading example in the first post; from the description I think the meaning is that different words, one having italic font, have no space between them.
I wouldn't say it's duplication, more like an issue from when I initially copied and pasted the text from the website into OpenOffice. Some parts of the text had these "grey spaces", which weren't spaces with a background colour, highlighter or anything like that, so I used the find and replace function to remove them without leaving a space, because sometimes those "grey spaces" had another normal space before or after, so I figured that would work. I only noticed the issue long after completing the ebook, when reading it.
robleyd wrote: Is it possible to utilise the method of getting from HTML to whatever format you are bringing into Writer, to resolve the duplication, rather than trying to post-process the text?
Obviously, I don't know what your process is, so this is a bit of a guess.
I appreciate the tutorials; however, I should probably have mentioned in the opening post that when certain characters speak in this web novel their speech is always italic, so you can differentiate between the two (humans and reapers). That means the examples you gave me don't really work too well, since they also find the normal italic conversations.
John_Ha wrote:
Davidrobleyd wrote: I think the OP has given a misleading example in the first post; from the description I think the meaning is that different words, one having italic font, have no space between them.
That is exactly what my Search does. The search does not look for a given word - it looks for any consecutive characters (including spaces) which are rendered in italic.
It then replaces that sequence with itself, either preceded by, or followed by, a space. This converts foobarwibble to foobar wibble, and converts foobarwibble to foobar wibble.
Ah yes, you're right, that's exactly what they are. I'll have to remember that for next time. Cheers! Why is that? Do multiple spaces not show when converting to, say, azw3 format etc?
robleyd wrote: The grey spaces are most likely non-breaking spaces (&nbsp;) in the HTML file(s) - Ctrl+Shift+Space in Writer. Rather than deleting them perhaps you should have replaced them with normal spaces in Writer, then if needed go through and replace multiple spaces with one space. However if publishing in epub format, multiple spaces most likely wouldn't be an issue.
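To make the difference between deleting and replacing the non-breaking spaces concrete, here is a minimal Python sketch. It works on a plain string rather than on a Writer document, and the function names and the sample sentence are made up for illustration.

```python
import re

NBSP = "\u00a0"  # non-breaking space: what &nbsp; becomes after pasting


def delete_nbsp(text: str) -> str:
    """Delete every non-breaking space; words separated only by one get joined."""
    return text.replace(NBSP, "")


def normalise_spaces(text: str) -> str:
    """Replace non-breaking spaces with ordinary ones, then squeeze runs of spaces."""
    text = text.replace(NBSP, " ")
    return re.sub(r" {2,}", " ", text)


# Hypothetical sample: one lone NBSP, and one NBSP next to a normal space.
sample = "someone" + NBSP + "was here " + NBSP + "today"

print(delete_nbsp(sample))       # the lone NBSP disappears and two words fuse
print(normalise_spaces(sample))  # every word keeps exactly one space
```

Replacing first and squeezing afterwards handles both cases (a lone non-breaking space, and one sitting next to a normal space) without having to know in advance which is which.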
You say it isn't worth it and I should just skip to the next instance; however, this is how many next instances I'd have to go through.
John_Ha wrote: Searching with AOO Find and Replace, using (.*), Italic and Regular expressions, finds every instance of italic strings in your .odt file.
You can probably modify the regular expression so as to choose only to find "italic strings followed by a normal character which is not a space" but it probably isn't worth it - just skip the wrong ones and jump to the next instance. I cannot see an easy way to do it with AOO because the Italic applies to all the contents of the search box. You may be able to do it with Alternative Find and Replace by entering two search arguments in the search box (italic string followed by non italic character) but I have not experimented.
Incidentally, you should now see why applying Direct Formatting is not a good idea. Had you defined a Style for alien_speech as italic, and applied it as a style, then you could have searched for everything in that style.
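The condition described above ("italic string followed by a normal character which is not a space") can be sketched outside Writer. The Python below is only a model: it assumes the text has already been split into (text, is_italic) runs, which is a made-up representation and not something AOO or AltSearch exposes in this form. It adds a space only where an italic run and a regular run meet letter-to-letter, so italic dialogue that is already surrounded by spaces or punctuation is left alone.

```python
from typing import List, Tuple

Run = Tuple[str, bool]  # (text, is_italic): hypothetical model of formatted text


def space_italic_boundaries(runs: List[Run]) -> List[Run]:
    """Insert a space where an italic and a regular run touch letter-to-letter."""
    out: List[Run] = []
    for text, italic in runs:
        if out:
            prev_text, prev_italic = out[-1]
            touching = (
                prev_italic != italic
                and prev_text != ""
                and text != ""
                and prev_text[-1].isalnum()
                and text[0].isalnum()
            )
            if touching:
                if italic:
                    # Keep the new space in the regular (previous) run.
                    out[-1] = (prev_text + " ", prev_italic)
                else:
                    # Keep the new space in the regular (current) run.
                    text = " " + text
        out.append((text, italic))
    return out


joined = [("foo", False), ("bar", True), ("wibble", False)]
fixed = space_italic_boundaries(joined)
print("".join(text for text, _ in fixed))
```

A run boundary next to a quote mark or comma is not touched, which is the part a plain "all italic strings" search cannot express.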
Interesting. I never thought about it that way.
John_Ha wrote: There are (countless) other workarounds - you need to use some imagination.
For example, if you do not use double spaces, replace all italic strings by the found string with a preceding and a following space. Now search for all double spaces and replace them with a single space.
Or, if you do use double spaces, search for all double spaces and replace them with xq85#. Now replace all italic strings by the string with a preceding and a following space. Now search for all double spaces and replace them with a single space. Now replace all xq85# with a double space.
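The placeholder workaround can be traced step by step in Python. This is a sketch on a plain string, not a Writer macro: italic text is marked with `<i>...</i>` tags purely as a stand-in for italic formatting, and the function name is made up. The placeholder `xq85#` is the one suggested above.

```python
import re

PLACEHOLDER = "xq85#"  # any string that never occurs in the real text


def pad_italics_keep_double_spaces(text: str) -> str:
    """Pad every italic run with spaces without losing intentional double spaces."""
    # 1. Protect the double spaces that are meant to be there.
    text = text.replace("  ", PLACEHOLDER)
    # 2. Put a space before and after every italic run.
    text = re.sub(r"<i>(.*?)</i>", r" <i>\1</i> ", text)
    # 3. Any double space that exists now was created by step 2; squeeze it.
    text = re.sub(r" {2,}", " ", text)
    # 4. Restore the protected double spaces.
    return text.replace(PLACEHOLDER, "  ")


print(pad_italics_keep_double_spaces("foo<i>bar</i>wibble.  Next sentence"))
print(pad_italics_keep_double_spaces("already <i>fine</i> here"))
```

Edge cases still need a manual check, for example an italic run at the very start of a paragraph (it gains a leading space) or one directly after a protected double space.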
This was very much intentional. Based on all the ebooks I've come across, this type of indentation and paragraph spacing is what they all use. I guess I just wanted it to look more professional, personally.
John_Ha wrote: May I make a suggestion? Text reads much more easily with a small gap below a paragraph. Edit your default (Text body) style by Format > Paragraph > Indents and Spacing. Set Gap after paragraph to be 1.5 or 2mm. You may then also decide that indenting the paragraph is no longer necessary - remove it by setting the first line indent to 0mm.
Yeah, figured that out in the comment above. Thanks though.
John_Ha wrote: Another point. Spellcheck does not find someonewas because its (and lots of other text) language is set to Unknown so AOO does not have a dictionary against which to check it. Set all your text to the correct language (UK English, US English etc), and install that dictionary and spellcheck will find someonewas.
12,000 x 1 sec per skip = 12,000 seconds = 3 hours and 20 minutes which is not very long. It has probably taken you that long to create this thread, add your posts and read the replies. 600k words at 6 chars plus a space per word is 4,200,000 keystrokes - 12,000 isn't very many in comparison.
ImMoist wrote: You say it isn't worth it and I should just skip to the next instance however, this is how many next instances I'd have to go through.
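The estimate above can be checked in a few lines of Python. The figures are the ones from the post (12,000 instances, one second per skip, 600k words at six characters plus a space); the variable names are just for illustration.

```python
instances = 12_000
seconds_per_skip = 1

total_seconds = instances * seconds_per_skip
hours, remainder = divmod(total_seconds, 3600)
minutes = remainder // 60

words = 600_000
keystrokes = words * (6 + 1)  # six characters plus one space per word

print(f"{hours} h {minutes} min of skipping")
print(f"{keystrokes:,} keystrokes to type the document")
print(f"skips are {instances / keystrokes:.2%} of that")
```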
I can't speak for azw3 as I haven't worked with it, or any of the Amazon/Kindle formats, but epub is basically HTML where all whitespace - spaces, tabs, newlines etc. - are treated as spaces and multiple occurrences are ignored.
Do multiple spaces not show when converting to say azw3 format etc?
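As a rough model of that HTML behaviour, the Python sketch below collapses runs of ordinary whitespace to a single space. It is an approximation of how a reader would display normal paragraph text, not an epub or azw3 converter, and the function name is made up. Note that non-breaking spaces are deliberately left alone: HTML does not collapse them, which is why the character class is spelled out instead of using `\s` (in Python, `\s` also matches the non-breaking space).

```python
import re

# Space, tab, newline, carriage return, form feed: the whitespace HTML collapses.
HTML_WHITESPACE = re.compile(r"[ \t\n\r\f]+")


def collapse_like_html(text: str) -> str:
    """Approximate HTML rendering: each run of ordinary whitespace shows as one space."""
    return HTML_WHITESPACE.sub(" ", text)


print(collapse_like_html("two  spaces,\ta tab and\na newline"))
print(repr(collapse_like_html("kept\u00a0\u00a0apart")))  # NBSPs survive untouched
```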
Touché.
John_Ha wrote: 12,000 x 1 sec per skip = 12,000 seconds = 3 hours and 20 minutes which is not very long. It has probably taken you that long to create this thread, add your posts and read the replies. 600k words at 6 chars plus a space per word is 4,200,000 keystrokes - 12,000 isn't very many in comparison.
ImMoist wrote: You say it isn't worth it and I should just skip to the next instance however, this is how many next instances I'd have to go through.
Hmm, I see. Yeah, I'm planning on starting again; however, now I'm having issues with copying and pasting. I'm trying to copy the chapter from this page, but every time I paste, it doesn't keep any of the italics. Yet if I copy only this part from the chapter:
‘Well,’ said Garovel at length, ‘now we can get to Capaporo safely. It may take us another day or two, but at least there isn’t a feldeath and an army of worms in the way. As far as I know, that is.’
it pastes fine into the document, and the italics are there. Why is this happening?
robleyd wrote: I can't speak for azw3 as I haven't worked with it, or any of the Amazon/Kindle formats, but epub is basically HTML where all whitespace - spaces, tabs, newlines etc. - are treated as spaces and multiple occurrences are ignored.
Do multiple spaces not show when converting to say azw3 format etc?
Given the extent of your problem, and hoping that you have created the necessary styles for formatting your document which you can re-use, perhaps starting from afresh might be an easier option?
\‘[^’]*\’
‘Of course we’re kidding,’
will end up being highlighted as
‘Of course we’re kidding,’
(the underline/bold is the text being highlighted/selected)
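The behaviour described here can be reproduced with Python's `re` module, which treats this pattern the same way: `[^’]*` stops at the first ’ it meets, and in this sentence that is the apostrophe in "we’re". The second pattern is only one possible workaround and is an assumption on my part, not the fix that was eventually used in this thread: it accepts a ’ as the closing quote only when no letter or digit follows it. The sample sentence ending and all names are made up, and the backslashes from the pattern above are left out because they are not needed in Python.

```python
import re
from typing import Optional

# The pattern from the post above, without the backslashes.
THREAD_PATTERN = re.compile(r"‘[^’]*’")

# Assumed alternative: a ’ only closes the quote if no letter/digit follows it.
LOOKAHEAD_PATTERN = re.compile(r"‘.*?’(?!\w)")


def first_match(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first substring the pattern matches, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


sample = "‘Of course we’re kidding,’ he said."

print(first_match(THREAD_PATTERN, sample))     # stops at the apostrophe in we’re
print(first_match(LOOKAHEAD_PATTERN, sample))  # runs to the real closing quote
```

The lookahead version still gets it wrong when an apostrophe ends a word inside the quote (for example a plural possessive such as reapers’), so it is worth testing on a copy of the file first.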
Alright sweet, got it working. Thanks a lot!
John_Ha wrote: Regular expressions are universal so Google for it.
Be sure to work on a copy of the file as a small mistake with a regular expression can go spectacularly wrong.
I quickly found C# Regex: matching anything between single quotes (except single quotes) [duplicate] which links to Regular Expression Except this Characters.
See [Tutorial] How to record a macro (and Regular Expressions) for references on regular expressions.
Showing that a problem has been solved helps others searching so, if your problem is now solved, please view your first post in this thread and click the Edit button (top right in the post) and add [Solved] in front of the subject.