Advanced Find/Replace needs

Discuss the spreadsheet application
Post Reply
mrdf
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Advanced Find/Replace needs

Post by mrdf »

I'm translating a website. In order to make sure I found all the pages, I dumped the database into a Calc spreadsheet. The content of most of the cells is code, and the code is contained within angle brackets <>.

I am trying to get a reasonable idea of how many words I will be translating, so I want to eliminate the code and leave only the visible text I actually need to translate.

So I would like to eliminate everything within angle brackets and the angle brackets themselves, and see what's left.

All I need is a the proper search terms... Does Calc have powerful enough search wildcards for me to do this? My fantasy is FIND: <*> | REPLACE: [nothing] , but life is not as straightforward as that. :?

If Calc can't do the job, does anyone know another tool that can? Thanks!
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
User avatar
RoryOF
Moderator
Posts: 34611
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Advanced Find/Replace needs

Post by RoryOF »

in Useful Macro Information For OpenOffice.org by Andrew Pitonyak, which you can download from his site, listing 7.44 gives a macro to delete text between specified delimiters.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
mrdf
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Re: Advanced Find/Replace needs

Post by mrdf »

Thanks Rory, and thanks in absentia to Andrew Pitonyak.

My only problem now is I know next to nothing about macros.

I suppose that I copy his code and paste it... somewhere? This is my first time venturing into Tools > Macros… and I'm not finding it obvious what I should put where.

As far as the code itself, it seems that perhaps it's the red text that I need to pay attention to, and (most of) the rest I can safely ignore? For instance, there are the actual text delimiters in his example, square brackets [ ] that I would replace with what I want, < >. Then, eerily, later there are a few sets of quotation marks also in red, some with a space between them and some not. They seem to be calling to me to take action, but what I would do is not so obvious in those cases. There is also text highlighted in grey, seemingly asking me to pay attention to it and take action: oDoc.createSearchDescriptor followed by an enticing set of blue parentheses with just a single space inside them ( ). They seem to want me to write my own description of this search inside, but I also think this may be the kind of amateur fixit thinking that could ruin everything.

How can I figure out what to do next? I'm really pretty lost in this area.
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Advanced Find/Replace needs

Post by Villeroy »

menu:Find/Replace...
[More Options]
[X] Regular expressions

Everything within <>, including the <>
 Edit: removed backslashes 
Search: <.*>
Replace: blank, nothing, nada

Everything within <>, not including the <>
 Edit: old version was rubbish 
Search: (.*)<(.*)>(.*)
Replace: $1<>$3

Entire cell content within <>
 Edit: removed backslashes 
Search: ^<.*>$
Replace: blank, nothing, nada
Last edited by Villeroy on Tue May 28, 2019 10:40 pm, edited 2 times in total.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
mrdf
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Re: Advanced Find/Replace needs

Post by mrdf »

Hi Villeroy, I like your way of thinking (especially the avoiding macros part) :bravo:

Results not successful yet though. Here's what I tried, choosing "Replace All" in every case, and the results:
menu:Find/Replace...
[More Options]
[X] Regular expressions
Done.
Everything within <>, including the <>
Search: \<.*\>
Replace: blank, nothing, nada
Result: Deleted all text in cells where there was no <>, leaving a blank cell. In cells with mixed text and code in <>, it left only <>, <">, < ">, </> and < /> in cells, on individual lines, deleting all other text. This seems to be almost the perfect inverse of the desired effect.
Everything within <>, not including the <>
Search: \<(.*\)>
Replace: $1
Result: "Search key not found" -- I tried both copying your search term and retyping it.
Entire cell content within <>
Search: ^\<.*\>$
Replace: blank, nothing, nada
Result: Plain text in cells containing no code was deleted, cells containing code in <> or a mix of text and code seemed to remain untouched.

:?:
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Advanced Find/Replace needs

Post by Villeroy »

Sorry, I felt too sure and did not test. I revised my previous posting.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
keme
Volunteer
Posts: 3703
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Advanced Find/Replace needs

Post by keme »

I guess the regex search string you need is

Code: Select all

<.*?>
Replace with nothing.

A hook for your memory: The escaping of the "<" character is backwards.

By the "escaping standard", we would expect:
  • \< means "search for character <" - Wrong!
    < means "use the <-character's special function (start of word)"- Wrong!
It works the other way around. Fooled me more than once. I always forget, until I remember again.

Also, the question mark after * is required to force "lazy matching". The default for regex is "greedy matching", where in this case everything from the opening of the first tag to the closing of the last would match. Lazy vs. greedy does not matter in cases where only one tag is found in each cell, but I guess most entries have multiple tags.

To have a better result, you could search for

Code: Select all

[:space:]*<.*?>[:space:]*
Replace with single space.

This should return "sensible spacing". Better readability, but no point in that if you only need to count words.[/i]
User avatar
robleyd
Moderator
Posts: 5081
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia

Re: Advanced Find/Replace needs

Post by robleyd »

If Calc can't do the job, does anyone know another tool that can?
Any good plain text editor that supports regex in its search/replace would do the job.
Cheers
David
OS - Slackware 15 64 bit
Apache OpenOffice 4.1.15
LibreOffice 24.2.2.2; SlackBuild for 24.2.2 by Eric Hameleers
Post Reply