[Solved] Batch search / replace hyperlinks

evking
Posts: 4
Joined: Fri Apr 17, 2009 10:19 am

[Solved] Batch search / replace hyperlinks

Post by evking »

I have 150 Writer documents, each with lots of hyperlinks in the text.

#1: I need to replace all the hyperlinks (e.g. http://www.oldsite.com/112.pdf becomes http://www.newsite.com/112.pdf). Is there a way to do this as a batch operation?

#2: Afterwards, all the Writer documents need to be converted to PDFs. Is it possible to do this in one fell swoop as well?

#1 is the most important issue ;)

Thanks for any suggestions!
Last edited by Hagar Delest on Sun Apr 26, 2009 3:56 pm, edited 1 time in total.
Reason: tagged [Solved].
OOo 3.0.X on Ubuntu 8.x + XP
LuciferSam
Posts: 2
Joined: Thu Feb 12, 2009 5:00 pm

Re: Batch search / replace [Hacking Open Document Files]

Post by LuciferSam »

evking wrote:I have 150 Writer documents that each has lots of hyperlinks in the text.

#1: Now i need to replace all hyperlinks (e.g. http://www.oldsite.com/112.pdf to http://www.newsite.com/112.pdf). Is there a way to perform this in a batch operation?
Well, I have a method, but you may not like it! Hopefully, someone else will have written a macro
that can do this for you, otherwise....

This is a general approach that I use to overcome various shortcomings in OOo. Your level of expertise with your operating system's tools & utilities will determine how "automated" the process will be. If you are unable to "script-up" some or all of these operations, then this method will almost certainly NOT be quicker than editing the files manually!

My method relies on the fact that the OOo file format is actually a .zip file full of .xml (and other) files which contain the document's text, style information, pictures and other stuff.
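On a Unix-like system you can see this without unpacking anything. A minimal sketch, assuming Info-ZIP's unzip and a grep that supports -o; odt_entries and odt_links are made-up helper names, not part of OOo:

```shell
#!/bin/sh
# Hypothetical helpers: peek inside an .odt without unpacking it.

# List the archive members, one per line (unzip -Z1 is Info-ZIP's zipinfo mode).
odt_entries() {
  unzip -Z1 "$1"
}

# Count each distinct xlink:href target stored in content.xml.
# Hyperlinks and embedded pictures both use xlink:href, so expect both kinds.
odt_links() {
  unzip -p "$1" content.xml |
    grep -o 'xlink:href="[^"]*"' |
    sort | uniq -c
}
```

For example, odt_links Document.odt shows which link targets a document contains before you change anything.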

In overview, my method is:

1. Unzip the .odt file(s)

2. Edit the .xml (text) files directly using a decent text editor or other tools that can do a global find and replace across multiple files, e.g. the excellent and free ConTEXT editor under Windows, or command line text editing tools like sed, awk, or perl scripts (*nix/cross-platform)

3. Zip 'em back up again.


Now in more detail - I'm assuming Windows (although the technique translates to *nix etc easily)
Also, I'd definitely experiment with a single .ODT file first to get the hang of it.

1. Make a backup of your files - Of course, you always keep a proper backup
anyway, don't you? ;-)

2. Rename your (e.g.) "Document.odt" file to be (e.g.) "Document.zip"
(This is not strictly necessary, particularly if you use a command line unzipper
like Info-ZIP's unzip.exe, however many Windows GUI unzippers won't easily unzip
the .odt files unless you rename them).

3. Unzip (e.g) Document.zip into a directory/folder called (e.g) "Document"
The built-in Windows shell zip folder support works well for this because it creates the (e.g.) "Document" "root" folder for you. You want to end up with a folder/directory structure something like this:

Document\ <- The directory name corresponds to your "Document" name,
Document\Configurations2
Document\content.xml <- This file contains your document's text
Document\layout-cache
Document\META-INF
Document\meta.xml
Document\mimetype
Document\Pictures <- This directory/folder contains any images in your doc (e.g. 3 below)
Document\Pictures\1000000000000060000000509E4D9BDA.jpg
Document\Pictures\10000000000000640000007FACEF61E3.jpg
Document\Pictures\1000000000000082000000825D82B018.gif
Document\settings.xml
Document\styles.xml <- This file contains your style information
Document\Thumbnails
Document\Thumbnails\thumbnail.png <- This is the document's thumbnail image
Document\Configurations2\accelerator
Document\Configurations2\floater
Document\Configurations2\images
Document\Configurations2\menubar
Document\Configurations2\popupmenu
Document\Configurations2\progressbar
Document\Configurations2\statusbar
Document\Configurations2\toolbar
Document\Configurations2\accelerator\current.xml
Document\Configurations2\images\Bitmaps
Document\META-INF\manifest.xml

4. Delete or rename Document.zip (we're gonna recreate it in a minute)

5. Now you can edit the text in (all your) "content.xml" file(s)
Use a text editor or tools of your choice. Notepad will do for a single file, but it can't do a global find & replace across multiple files; all serious editors can.
(You can also easily and accurately edit styles and colours etc, by editing the text in the "styles.xml" files).

6. Zip up the "Document" directory/folder to recreate Document.zip.
YOU CANNOT USE WINDOWS' IN-BUILT ZIP FOLDER SUPPORT FOR THIS
Neither can you use 7-zip. OOo will NOT be able to understand the file! I have absolutely no idea why not!

I use Info-ZIP's command line zip.exe (which works, is free, and I can supply)
e.g. Run up a Windows "Command Prompt" (cmd.exe)
cd to the (e.g.) "Document" directory/folder
zip -r ..\Document *
That will recreate the Document.zip file in the parent directory/folder (-r makes zip descend into sub-folders such as META-INF and Pictures)

7. Rename the new Document.zip back to Document.odt and open the file with OOo to check it's all ok!
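For Linux users, the same unpack, edit, repack cycle can be sketched in shell. This is only a sketch under assumptions: Info-ZIP's zip and unzip are installed, odt_edit is a made-up name, and the edit is passed in as a sed expression. It adds the mimetype entry first, stored uncompressed and without extra fields, which is what the OpenDocument package format asks for; whether that is what tripped up the Windows zip folders and 7-zip in step 6 is a guess, not something tested here.

```shell
#!/bin/sh
# Hypothetical helper: apply one sed expression to content.xml inside an .odt.
# usage: odt_edit 'sed-expression' file.odt
odt_edit() {
  script=$1
  src=$2
  case $src in /*) ;; *) src=$PWD/$src ;; esac   # zip is run from another directory
  work=$(mktemp -d) || return 1
  (
    cp -p "$src" "$src.bak" &&                          # step 1: keep a backup
    unzip -q "$src" -d "$work" &&                       # step 3: unpack
    cd "$work" &&
    sed -e "$script" content.xml > content.xml.new &&   # step 5: edit
    mv content.xml.new content.xml &&
    rm -f "$src" &&                                     # step 4: archive gets rebuilt
    zip -q -0 -X "$src" mimetype &&                     # step 6: mimetype first, stored
    zip -q -r -X "$src" * -x mimetype                   # then everything else
  )
  status=$?
  rm -rf "$work"
  return $status
}
```

Usage would be something like odt_edit 's|www\.oldsite\.com|www.newsite.com|g' Document.odt, after which Document.odt.bak holds the untouched original.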

Now you're a fledgling Open Document File format hacker!

I find this technique particularly useful for changing and matching colours (by editing the hex RGB
values directly) and making fine adjustments to table and picture dimensions to match-up
accurately.

HTH

Regards
Luc
OOo 2.4.X on Ms Windows XP
evking
Posts: 4
Joined: Fri Apr 17, 2009 10:19 am

Re: Batch search / replace

Post by evking »

Luc, that was an amazing answer! Really kind of you to make such a write up to my question :D

I understand the method, and your article should probably be saved as a great crash course to the ODT file format.

It should not be too difficult to make a script to do it, but as you say someone may already have done it - an ODT search/replace batch machine....

Perhaps I should still let the subject stand open for a while; maybe someone will add to it.

Again, thank you for your explanation. Really nice of you!
OOo 3.0.X on Ubuntu 8.x + XP
LuciferSam
Posts: 2
Joined: Thu Feb 12, 2009 5:00 pm

Re: Batch search / replace

Post by LuciferSam »

evking wrote:Luc, that was an amazing answer! Really kind of you to make such a write up to my question :D
No problem at all - it's nice to be appreciated :-)
evking wrote:Perhaps I should still let the subject stand open for a while; maybe someone will add to it.
I would - my method is a little "non-optimal"!

Do you know any Perl? I'm afraid I don't, otherwise I'd be tempted to have a go (my scripting expertise is mainly confined to IBM's old REXX). There seem to be Perl tools that would allow you to script this but, as before, I think it is likely to be more work to develop the scripts than to tediously edit each file!

If you register, you can vote for this enhancement at: or possibly: but I wouldn't hold your breath!
evking wrote:#2: Afterwards all Writer need to be converted to PDF's. Is this also possible to do in one fell swoop?
Check out Document Converter (about 1/3 of the way down the page)
  • Document Converter

    Author: Danny Brewer / Dan Horwood
    DocConverter is a utility to convert a batch of documents from any supported OOo format into any other supported OOo format. It could, for example, be used to convert a batch of OOo Writer documents into PDFs. Simple to use, with an interface similar to OOo's AutoPilots.

    Latest release: Version 2.0 (June 10, 2006)
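A command line alternative, offered as a sketch rather than a tested recipe for the OOo versions in this thread: LibreOffice's soffice accepts --headless --convert-to pdf --outdir, so a whole batch can be converted in one run. The wrapper name odt_to_pdf and the SOFFICE override variable are made up for illustration, and older OOo builds may not support these options at all.

```shell
#!/bin/sh
# Hypothetical wrapper: convert the given documents to PDF in one soffice run.
# usage: odt_to_pdf outdir file...
# Assumption: soffice is on PATH, or SOFFICE is set to the binary's location.
odt_to_pdf() {
  outdir=$1
  shift
  mkdir -p "$outdir" || return 1
  "${SOFFICE:-soffice}" --headless --convert-to pdf --outdir "$outdir" "$@"
}
```

Something like odt_to_pdf pdf *.odt would then leave the PDFs in a pdf folder next to the documents.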
Cheers
Luc
OOo 2.4.X on Ms Windows XP
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Batch search / replace

Post by acknak »

If you're on Unix/Linux, this is quite easy to script, using only the standard command-line tools.

Just yell if you could use something like that.
AOO4/LO5 • Linux • Fedora 23
evking
Posts: 4
Joined: Fri Apr 17, 2009 10:19 am

Re: Batch search / replace

Post by evking »

Of course I'm using Linux, acknak :D

A batch search replace script would be great, and certainly useful for lots of people so please consider yourself yelled at :)
OOo 3.0.X on Ubuntu 8.x + XP
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Batch search / replace

Post by acknak »

Quick and dirty; no warranty; use at your own risk, and all that...

Code: Select all

#!/bin/sh

usage="usage: fnr find replace files..."

find="$1"
replace="$2"

if [ -n "$find" -a -n "$replace" ]
then
  shift; shift
else
  echo "missing argument"
  echo "$usage"
  exit 2
fi

# exit immediately if something fails
set -e

for f in "$@"
do
  # keep a copy of the original file
  cp -p "$f" "$f.bak"

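  # extract content.xml and make the changes; the sed delimiter is Ctrl-V
  # (octal 026), which is very unlikely to appear in $find or $replace
  d=$(printf '\026')
  unzip -p "$f" content.xml | sed -e "s${d}${find}${d}${replace}${d}g" > content.xml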

  # update the document archive
  zip "$f" content.xml

  rm -f content.xml
done
Edit: PS: Especially beware: this will change any matching text in content.xml, including the document's XML markup (element and attribute names), not just the visible text. You can seriously ruin a document with this if you don't know what you're doing. Be sure to check that the modified documents will still open before you toss the backup copies.

However, with something like a URL, which would never be part of the xml syntax--it's always going to be data, and unique--it should be ok.

So, for your example, you can do
$ fnr http://www.oldsite.com http://www.newsite.com *.odt
and get away with it.

Something like
$ fnr text context *.odt
would be certain death to those documents. That's why the script makes a backup copy of each file.
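One way to lower that risk when only link targets need to change, sketched here as an assumption and not as part of the script above: anchor the pattern on the xlink:href=" attribute that ODF uses for link targets, so that only text at the start of a link target is touched. relink is a made-up name, and the old prefix is still treated as a regular expression.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the start of link targets only.
# usage: relink old-prefix new-prefix < content.xml > content.xml.new
relink() {
  d=$(printf '\026')    # Ctrl-V as the sed delimiter, unlikely to occur in a URL
  sed -e "s${d}xlink:href=\"$1${d}xlink:href=\"$2${d}g"
}
```

Note the trade-off: a URL that also appears as visible text keeps its old wording, because only the href attribute is rewritten.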
AOO4/LO5 • Linux • Fedora 23
evking
Posts: 4
Joined: Fri Apr 17, 2009 10:19 am

Re: Batch search / replace

Post by evking »

Acknak and Luc, you are true gentlemen!

All documents are modified successfully, both the search/replace and conversion to PDF. :D :D :D

The sed command did not show correctly on my screen so I had to google a bit. (The "#" characters display wrongly)

Thanks again, you've saved me hours of tedious editing.
OOo 3.0.X on Ubuntu 8.x + XP
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Batch search / replace

Post by acknak »

Good!
evking wrote:The SED command did not show correctly on my screen so I had to google a bit. (The "#" characters displays wrongly)
I should have mentioned: those characters are ctrl-v, and they may look a little strange, but will be handled correctly by the shell, and any decent text editor.

The reason to use such a strange character is to avoid any potential for conflicts with the $find and $replace strings. You can use any character, so I picked something that's very unlikely to appear in the strings.
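The delimiter only protects against the delimiter itself. The find string is still a regular expression (a "." matches any character) and a "&" in the replace string means "the matched text". If that matters, both strings can be escaped first. A minimal sketch for one-line strings; quote_pattern, quote_replacement and literal_replace are made-up names:

```shell
#!/bin/sh
# Hypothetical helpers: make one-line strings safe for sed's s command when
# the delimiter is a character that the strings themselves do not contain.

# Backslash-escape the characters that are special in a basic regular expression.
quote_pattern() {
  printf '%s\n' "$1" | sed -e 's/[[\\.*^$]/\\&/g'
}

# Backslash-escape the characters that are special in the replacement text.
quote_replacement() {
  printf '%s\n' "$1" | sed -e 's/[\\&]/\\&/g'
}

# Literal find and replace on standard input.
literal_replace() {
  d=$(printf '\026')    # Ctrl-V, the same delimiter idea as described above
  sed -e "s${d}$(quote_pattern "$1")${d}$(quote_replacement "$2")${d}g"
}
```

With this, unzip -p "$f" content.xml | literal_replace "$find" "$replace" would match the find string character for character.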
AOO4/LO5 • Linux • Fedora 23
windmarc
Posts: 14
Joined: Sat Jul 03, 2010 9:34 pm

Re: [Solved] Batch search / replace hyperlinks

Post by windmarc »

I know this is old news by now, but I just found this method, which I was missing for a while. I have been able to update hundreds of links in a few minutes.
Luc, this is great, thank you so much!
OpenOffice 3.3.0 on Windows 7
MPEcho
Posts: 99
Joined: Wed Sep 07, 2016 11:30 pm

Re: [Solved] Batch search / replace hyperlinks

Post by MPEcho »

This post is the gift that keeps giving. I was trying to work out on my own how to hack up a script and ran across this one by acknak, which worked great for me. For anyone interested: I dislike having backups mixed into a working folder, so I create a backup folder when declaring variables, like this:

Code: Select all

OUT_DIR=backup
# test whether the output directory exists; if not, create it
if [ ! -d "$OUT_DIR" ] ; then
  mkdir "$OUT_DIR"
fi
Then the line saving the backups looks like this:

Code: Select all

  # keep a copy of the original file (assumes "$f" has no directory part)
  cp -p "$f" "$OUT_DIR/$f.bak"
As always, test any additions to your scripts because... no warranties!
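On the testing point, a rough sketch of an automatic sanity check, assuming Info-ZIP's unzip and backups named as above; check_docs is a made-up name. It only checks that each archive is intact and that content.xml still ends with its closing root tag, which does not prove the document will open, so a look in Writer is still worthwhile.

```shell
#!/bin/sh
# Hypothetical check: test each modified document and restore it from the
# backup folder if the archive is damaged or content.xml looks truncated.
# usage: check_docs backup-dir file...
check_docs() {
  backups=$1
  shift
  bad=0
  for f in "$@"
  do
    if unzip -tq "$f" >/dev/null 2>&1 &&
       unzip -p "$f" content.xml 2>/dev/null | grep -q '</office:document-content>'
    then
      echo "ok: $f"
    else
      echo "restoring: $f"
      cp -p "$backups/$(basename "$f").bak" "$f"
      bad=$((bad + 1))
    fi
  done
  [ "$bad" -eq 0 ]    # non-zero exit status if anything had to be restored
}
```

For example: check_docs backup *.odt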
Cheers
Libre Office 5.1.6.2 Ubuntu 16.04