Advanced Find/Replace needs

Discuss the spreadsheet application

Advanced Find/Replace needs

Postby mrdf » Tue May 28, 2019 6:27 pm

I'm translating a website. In order to make sure I found all the pages, I dumped the database into a Calc spreadsheet. The content of most of the cells is code, and the code is contained within angle brackets <>.

I am trying to get a reasonable idea of how many words I will be translating, so I want to eliminate the code and leave only the visible text I actually need to translate.

So I would like to eliminate everything within angle brackets and the angle brackets themselves, and see what's left.

All I need is a the proper search terms... Does Calc have powerful enough search wildcards for me to do this? My fantasy is FIND: <*> | REPLACE: [nothing] , but life is not as straightforward as that. :?

If Calc can't do the job, does anyone know another tool that can? Thanks!
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
mrdf
 
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Re: Advanced Find/Replace needs

Postby RoryOF » Tue May 28, 2019 6:35 pm

in Useful Macro Information For OpenOffice.org by Andrew Pitonyak, which you can download from his site, listing 7.44 gives a macro to delete text between specified delimiters.
Apache OpenOffice 4.1.7 on Xubuntu 18.04.3 (mostly 64 bit version) and very infrequently on Win2K/XP
User avatar
RoryOF
Moderator
 
Posts: 29896
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: Advanced Find/Replace needs

Postby mrdf » Tue May 28, 2019 7:47 pm

Thanks Rory, and thanks in absentia to Andrew Pitonyak.

My only problem now is I know next to nothing about macros.

I suppose that I copy his code and paste it... somewhere? This is my first time venturing into Tools > Macros… and I'm not finding it obvious what I should put where.

As far as the code itself, it seems that perhaps it's the red text that I need to pay attention to, and (most of) the rest I can safely ignore? For instance, there are the actual text delimiters in his example, square brackets [ ] that I would replace with what I want, < >. Then, eerily, later there are a few sets of quotation marks also in red, some with a space between them and some not. They seem to be calling to me to take action, but what I would do is not so obvious in those cases. There is also text highlighted in grey, seemingly asking me to pay attention to it and take action: oDoc.createSearchDescriptor followed by an enticing set of blue parentheses with just a single space inside them ( ). They seem to want me to write my own description of this search inside, but I also think this may be the kind of amateur fixit thinking that could ruin everything.

How can I figure out what to do next? I'm really pretty lost in this area.
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
mrdf
 
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Re: Advanced Find/Replace needs

Postby Villeroy » Tue May 28, 2019 9:11 pm

menu:Find/Replace...
[More Options]
[X] Regular expressions

Everything within <>, including the <>
 Edit: removed backslashes 

Search: <.*>
Replace: blank, nothing, nada

Everything within <>, not including the <>
 Edit: old version was rubbish 

Search: (.*)<(.*)>(.*)
Replace: $1<>$3

Entire cell content within <>
 Edit: removed backslashes 

Search: ^<.*>$
Replace: blank, nothing, nada
Last edited by Villeroy on Tue May 28, 2019 10:40 pm, edited 2 times in total.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.x
User avatar
Villeroy
Volunteer
 
Posts: 27380
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Advanced Find/Replace needs

Postby mrdf » Tue May 28, 2019 10:11 pm

Hi Villeroy, I like your way of thinking (especially the avoiding macros part) :bravo:

Results not successful yet though. Here's what I tried, choosing "Replace All" in every case, and the results:

menu:Find/Replace...
[More Options]
[X] Regular expressions


Done.

Everything within <>, including the <>
Search: \<.*\>
Replace: blank, nothing, nada


Result: Deleted all text in cells where there was no <>, leaving a blank cell. In cells with mixed text and code in <>, it left only <>, <">, < ">, </> and < /> in cells, on individual lines, deleting all other text. This seems to be almost the perfect inverse of the desired effect.

Everything within <>, not including the <>
Search: \<(.*\)>
Replace: $1


Result: "Search key not found" -- I tried both copying your search term and retyping it.

Entire cell content within <>
Search: ^\<.*\>$
Replace: blank, nothing, nada


Result: Plain text in cells containing no code was deleted, cells containing code in <> or a mix of text and code seemed to remain untouched.

:?:
Location: Basque Country/Spain - OOo 4.1.5 - Mac OS 10.14.4
mrdf
 
Posts: 27
Joined: Sun Jul 18, 2010 8:01 pm

Re: Advanced Find/Replace needs

Postby Villeroy » Tue May 28, 2019 10:41 pm

Sorry, I felt too sure and did not test. I revised my previous posting.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.x
User avatar
Villeroy
Volunteer
 
Posts: 27380
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Advanced Find/Replace needs

Postby keme » Wed May 29, 2019 10:39 am

I guess the regex search string you need is
Code: Select all   Expand viewCollapse view
<.*?>
Replace with nothing.

A hook for your memory: The escaping of the "<" character is backwards.

By the "escaping standard", we would expect:
    \< means "search for character <" - Wrong!
    < means "use the <-character's special function (start of word)"- Wrong!
It works the other way around. Fooled me more than once. I always forget, until I remember again.

Also, the question mark after * is required to force "lazy matching". The default for regex is "greedy matching", where in this case everything from the opening of the first tag to the closing of the last would match. Lazy vs. greedy does not matter in cases where only one tag is found in each cell, but I guess most entries have multiple tags.

To have a better result, you could search for
Code: Select all   Expand viewCollapse view
[:space:]*<.*?>[:space:]*
Replace with single space.

This should return "sensible spacing". Better readability, but no point in that if you only need to count words.
User avatar
keme
Volunteer
 
Posts: 3286
Joined: Wed Nov 28, 2007 10:27 am
Location: Egersund, Norway

Re: Advanced Find/Replace needs

Postby robleyd » Wed May 29, 2019 10:54 am

If Calc can't do the job, does anyone know another tool that can?

Any good plain text editor that supports regex in its search/replace would do the job.
Cheers
David
Apache OpenOffice 420m2(Build:9821) - Slackware 14.2 - 64 bit
LibreOffice 6.0.7.3 - Slackware 14.2 - 64 bit
Apache OpenOffice 4.1.4 - Windows 7 Virtual machine
User avatar
robleyd
Moderator
 
Posts: 3057
Joined: Mon Aug 19, 2013 3:47 am
Location: Murbko, Australia


Return to Calc

Who is online

Users browsing this forum: No registered users and 26 guests