[Solved] Delete All Index entries at once

nomnex
Posts: 180
Joined: Thu Aug 06, 2009 11:12 pm

[Solved] Delete All Index entries at once

Post by nomnex »

Can I delete all the index entries at once, to restart anew? A quick search of the board and on Google turned up nothing except deleting each entry one by one.

What's the best practice when starting to experiment with indexes: keep the original document without an index, and make a copy of it with the index? I'd rather spare myself clicking and deleting hundreds of entries. Thank you.
Last edited by nomnex on Mon Oct 22, 2012 3:21 pm, edited 1 time in total.
Bill
Volunteer
Posts: 8952
Joined: Sat Nov 24, 2007 6:48 am

Re: Delete All Index entries at once

Post by Bill »

Click the first index entry and select Edit > Index Entry..., then press and hold CTRL+D until all entries are deleted.
nomnex
Posts: 180
Joined: Thu Aug 06, 2009 11:12 pm

Re: Delete All Index entries at once

Post by nomnex »

The title says "Delete all Index entries at once", and I said in my first post that the forum search returned nothing except deleting the entries one by one.

So I assume there is no way to do it.

Hence my second question: I am not very experienced with indexing words (this is beyond the scope of the indexing feature in Writer), so I guess I'd rather keep a non-indexed copy of my document, in the event I feel the need to restart anew.
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Delete All Index entries at once

Post by acknak »

Removing the index marks can be done by a small script which edits the document xml.

I think it probably makes more sense to just keep a backup copy of the document made before starting to experiment with the indexing.

** Use at your own risk! YMMV IANAL etc. **

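function rmndx () {
    # Extract content.xml, strip the start/end index-mark tags, save the result in /tmp
    unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
    # Copy the original document, then replace content.xml inside the copy
    cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}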

rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed
AOO4/LO5 • Linux • Fedora 23
nomnex
Posts: 180
Joined: Thu Aug 06, 2009 11:12 pm

Re: Delete All Index entries at once

Post by nomnex »

acknak wrote:Removing the index marks can be done by a small script which edits the document xml.


** Use at your own risk! YMMV IANAL etc. **
http://cygwin.com/acronyms/ > I am really not good with acronyms, but I found my way ;)
acknak wrote:

function rmndx () { 
    unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
    cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}
rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed

REM  *****  BASIC  *****

Sub Main

function rmndx () {
unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}

End Sub
This returns "BASIC syntax error. Function not allowed within a procedure" when I create a module in the document and press F5 (run the macro).
peterroots
Volunteer
Posts: 299
Joined: Mon Mar 03, 2008 6:33 pm
Location: UK

Re: Delete All Index entries at once

Post by peterroots »

I think acknak intends this to be run as a script, not a Basic macro:
"Removing the index marks can be done by a small script"
i.e. a bash script.
LibreOffice 4.0.3 OpenSUSE 12.3 : OpenOffice 4 Linux Mint 15
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Delete All Index entries at once

Post by acknak »

Yeah, sorry, my bad. I wasn't expecting anyone to actually try it :shock:

Just copy/paste the function definition (up to the closing curly bracket) into a shell command line, then use it as in the example below the code. The function will be lost when you close that shell session; you have to repeat the copy/paste if you ever want to use it again (or make arrangements to save it).
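
One possible way to "make arrangements to save it" is to keep the definition in a file and load that file when needed. This is only a sketch: the helper name save_rmndx and the file name are made up for illustration, and the function is written in the portable `name() {` form with `sed -E`, which should behave the same as the original on a current Linux system.

```shell
# save_rmndx FILE: write the rmndx function definition to FILE so that it
# can be loaded into a later shell session with ". FILE".
# (save_rmndx is a hypothetical helper, not part of the original post.)
save_rmndx() {
    cat > "$1" <<'EOF'
rmndx() {
    unzip -p "$1" content.xml |
        sed -E -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml &&
        n="$(basename "$1" .odt)"-index.odt
    cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}
EOF
}
```

Run `save_rmndx ~/rmndx.sh` once. After that, `. ~/rmndx.sh` in a new shell brings rmndx back; adding that line to ~/.bashrc would load it automatically.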
AOO4/LO5 • Linux • Fedora 23
nomnex
Posts: 180
Joined: Thu Aug 06, 2009 11:12 pm

Re: Delete All Index entries at once

Post by nomnex »

Thanks for the correction. I didn't know we could interact with document content using a shell script. But looking at the syntax, even without understanding regex, it makes sense.
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: [Solved] Delete All Index entries at once

Post by acknak »

It's not a good way to do it, but it can be handy. Quick 'n dirty.

Better would be something that actually understands the document format; the script just does a dumb search/remove. If it gets off-track, it will likely corrupt the document.
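
Short of a tool that understands the format, a cheap guard is to check the edited XML before putting it back into the document. The following is a sketch under assumptions: the function names are invented for this example, and `xmllint` (from libxml2) is used only if it happens to be installed.

```shell
# count_index_marks: read XML on stdin and print how many alphabetical
# index mark tags (start, end, or point marks) it contains.
count_index_marks() {
    grep -o '<text:alphabetical-index-mark[^>]*>' | wc -l | tr -d ' '
}

# check_content FILE: succeed only if FILE is well-formed XML (checked
# when xmllint is available) and contains no index marks any more.
check_content() {
    if command -v xmllint >/dev/null 2>&1; then
        xmllint --noout "$1" || return 1
    fi
    [ "$(count_index_marks < "$1")" -eq 0 ]
}
```

For example, `check_content /tmp/content.xml && echo looks-ok` after the sed step. A passing check does not prove the document will open, it only catches the obvious failures.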
AOO4/LO5 • Linux • Fedora 23
CmdrBalok
Posts: 2
Joined: Sat Jul 05, 2014 7:20 pm

Re: [Solved] Delete All Index entries at once

Post by CmdrBalok »

Replacing '<text:alphabetical-index-mark-(start|end)[^>]*>' did not work for me in OO 4.0.0. It corrupted (the copy of) my document. (Procedure: unzip the doc with WinRAR, edit content.xml with jEdit, replace a few hundred index tags, save, compress as a ZIP.) What I eventually did was follow acknak's advice: I made a copy of my doc with no indexing and used that.

The problem I had is that when you build index markers using a concordance, each 'Update Index/Table' operation appears to *add* index marks, so you can end up with many duplicates. That makes the document size grow, but the real issue is that it makes changing how to index something (for example, if you add a keyword to collect related entries) not work - the item will show up both ways. Ideally, I'd like to see a checkbox in the 'Insert Index/Table' panel that causes OO to remove all index marks before generating new ones from a concordance. Of course, that comes with pitfalls of its own, chiefly, it requires you to either use a concordance or insert index entries manually, or live with duplicates.
OO 4.0.0, Win7 HP SP1 64-bit
rdh61
Posts: 2
Joined: Sat Apr 25, 2015 1:56 pm

Re: [Solved] Delete All Index entries at once

Post by rdh61 »

@ Acknak,

Please could you spell out for a real dummy, step by step, how your script should be used. All I get is syntax errors.

Many thanks.
Open Office 4.1 on Ubuntu 14.04
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: [Solved] Delete All Index entries at once

Post by acknak »

Open a terminal window.

At the shell prompt, type echo $SHELL and press Enter. You should see: /bin/bash

If you see something else, start bash by typing /bin/bash and pressing Enter. If that doesn't work, or echo $SHELL still doesn't say /bin/bash, turn back now!

At the bash (shell) prompt, paste the code from the post above; press Enter a couple of times until you get another shell prompt.

Type: type rmndx

You should see the function definition displayed on the screen.

If you see any errors at these steps, turn back now!

To actually run the script on a document, type rmndx document.odt

The output will be a new file: document-index.odt

If you see errors when running the script, you may not have all the necessary components installed: zip/unzip, sed, etc.
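
To find out up front which components are missing, something like the following could be pasted at the same prompt. It is a small sketch; check_tools is a made-up name, and the list of commands is simply what the rmndx function calls.

```shell
# check_tools NAME...: print each command that is not found on PATH and
# return non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools unzip zip sed basename cp`. No output means everything was found.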
AOO4/LO5 • Linux • Fedora 23
rdh61
Posts: 2
Joined: Sat Apr 25, 2015 1:56 pm

Re: [Solved] Delete All Index entries at once

Post by rdh61 »

Thank you. Much appreciated.
Open Office 4.1 on Ubuntu 14.04
LDP-OpenOffice
Posts: 4
Joined: Mon Oct 11, 2021 9:12 pm

Re: [Solved] Delete All Index entries at once

Post by LDP-OpenOffice »

How can the script be run on Windows in order to remove all index marks?
OpenOffice 4.1.10 on Windows 10
MrProgrammer
Moderator
Posts: 5349
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: [Solved] Delete All Index entries at once

Post by MrProgrammer »

LDP-OpenOffice wrote: Wed Nov 16, 2022 2:05 am How can the script be run on Windows in order to remove all index marks?
Just install the Windows Subsystem for Linux. Then you can run bash scripts.
Installing Linux for Windows is beyond the scope of this forum but that search provides links to explain the process.

Or install Perl and use this three-line program. You may find that Perl is already installed on your system.

$/ = undef; $_ = <STDIN>;                  # Read entire file
s/<text:alphabetical-index-mark[^>]+>//g;  # Remove index tags
print $_;                                  # Write updated XML

To use the program
• The Writer document must be saved in ODT format (not DOC or DOCX)
• Ensure that the Writer document is not open in OpenOffice
• Ensure that you have a backup of your ODT file in case you damage it
• UnZIP the ODT file to an empty directory, creating content.xml and other files
• In the directory, run the perl program, reading content.xml and writing content.new
• Delete content.xml  then rename content.new to content.xml
• Re-ZIP the files in the directory to the ODT file
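
For anyone who does have a bash prompt (for example under the Windows Subsystem for Linux), the steps above could be sketched as one shell function. Treat it as an illustration under assumptions, not a tested tool: the function name is invented, it assumes Info-ZIP zip/unzip and a sed that accepts -E, it uses sed with the same pattern instead of the Perl program, and it stores the "mimetype" entry first and uncompressed because that is what the ODF package format expects.

```shell
# strip_marks_in_odt SRC.odt DST.odt
# Unzip SRC to an empty directory, filter content.xml into content.new,
# swap the two files, and re-zip everything as DST. SRC is not modified.
strip_marks_in_odt() {
    src="$1"; dst="$2"
    work="$(mktemp -d)" || return 1
    unzip -q "$src" -d "$work" || return 1
    # Same pattern as the Perl program. sed works line by line, so this
    # assumes no index tag is split across two lines.
    sed -E 's/<text:alphabetical-index-mark[^>]+>//g' \
        "$work/content.xml" > "$work/content.new" || return 1
    mv "$work/content.new" "$work/content.xml" || return 1
    # Make DST absolute before changing into the work directory.
    case "$dst" in /*) ;; *) dst="$PWD/$dst" ;; esac
    rm -f "$dst"
    # "mimetype" goes in first, stored (-0); the rest is compressed (-9).
    ( cd "$work" &&
      zip -q -X -0 "$dst" mimetype &&
      zip -q -X -r -9 "$dst" * -x mimetype ) || return 1
    rm -rf "$work"
}
```

Usage: `strip_marks_in_odt mydoc.odt mydoc-noindex.odt`. Writing to a new file keeps the original as the backup asked for in the list.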

Installing Perl and running Perl programs is beyond the scope of this forum but the Perl link above will have more information.

acknak wrote: Mon Oct 15, 2012 6:40 pm
sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g'
Some varieties of Unix/Linux use option -E instead of -r to indicate that this is an extended regular expression.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.6, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
LDP-OpenOffice
Posts: 4
Joined: Mon Oct 11, 2021 9:12 pm

Re: [Solved] Delete All Index entries at once

Post by LDP-OpenOffice »

Thank you very much.

Now it is much clearer how indexing is marked up in an .odt file, and how the file is structured: it is essentially a .zip file. The content.xml inside it has 2 records.

The first is <?xml version="1.0" encoding="UTF-8"?>
The second is a very long XML record.

For example, the term ATLANTIS is marked up in it as follows:

<text:alphabetical-index-mark-start text:id="IMark66441952"/>
<text:alphabetical-index-mark-start text:id="IMark66442048"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark-start text:id="IMark66445264"/>
<text:alphabetical-index-mark-start text:id="IMark66281708"/>
<text:span text:style-name="T1">ATLANTIS</text:span>
<text:alphabetical-index-mark-end text:id="IMark66281708"/>
<text:alphabetical-index-mark-end text:id="IMark66445264"/>
<text:alphabetical-index-mark-end text:id="IMark66442048"/>
<text:alphabetical-index-mark-end text:id="IMark66441952"/>

I plan to try writing a gawk or PHP script to remove index related markup.
OpenOffice 4.1.10 on Windows 10
LDP-OpenOffice
Posts: 4
Joined: Mon Oct 11, 2021 9:12 pm

Re: [Solved] Delete All Index entries at once

Post by LDP-OpenOffice »

I found that the XML file generated by LibreOffice is much simpler and easier to process. Once the .odt is re-zipped with the processed content.xml file, the index marks disappear, and there are fewer problems than with the file from OpenOffice.

content.xml contains 2 records: an XML header and the document. In my case it was a file of about 5 MB.

In a first pass I forced each XML statement onto its own record. The second pass discarded indexing-related statements. The 2 scripts were written in gawk.

OpenOffice indexes terms which are embedded in words.

Processing content.xml and rebuilding the .odt creates an unexpected problem that is not easy to resolve. For example if "Go" is an index term then "negotiate" will print like this "ne go tiate", etc.
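
The gaps may come from the rewrite rather than from the index marks themselves. This is a guess, since the gawk scripts are not shown: if the line breaks added by the first pass are still inside a paragraph when the file is re-zipped, ODF's white-space handling treats each line break as a space. A sketch of a filter that deletes the tags in place and adds no characters of its own (the function name is made up for illustration):

```shell
# strip_index_marks: filter stdin to stdout, deleting every alphabetical
# index mark tag (start, end, and point marks) without inserting anything
# in its place, so text on both sides of a tag stays joined.
strip_index_marks() {
    sed -E 's/<text:alphabetical-index-mark[^>]*>//g'
}

printf '%s\n' 'ne<text:alphabetical-index-mark-start text:id="IMark1"/>go<text:alphabetical-index-mark-end text:id="IMark1"/>tiate' | strip_index_marks
# prints: negotiate
```

Because nothing is split onto separate lines, there is no second step that has to join the records again.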

After rebuilding the .odt file OpenOffice could not open it, but LibreOffice could.
OpenOffice 4.1.10 on Windows 10