[Solved] Delete All Index entries at once

nomnex · Post by **nomnex** » Fri Oct 12, 2012 1:45 am

Can I delete all the index entries at once, to start over? A quick search of the board and of Google returned nothing except deleting each entry one by one.

What's the best practice when starting to experiment with indexes: keep the original document without an index, and make a copy of it with an index? I'd rather spare myself clicking and deleting hundreds of entries. Thank you.

Bill · Post by **Bill** » Fri Oct 12, 2012 5:21 pm

Click the first index entry and select Edit > Index Entry..., then press and hold CTRL+D until all entries are deleted.

nomnex · Post by **nomnex** » Mon Oct 15, 2012 12:47 pm

The title says "delete all Index entries at once", and I say in the first post that "the forum search did not return anything, but to delete the entry one by one."

So I assume there is no way to do it.

Hence my second question: I am not very experienced at indexing (this is beyond the scope of the indexing feature in Writer), so I guess I'd rather keep a non-indexed copy of my document, in the event I feel the need to start over.

Post by **acknak** » Mon Oct 15, 2012 6:40 pm

Removing the index marks can be done by a small script which edits the document xml.

I think it probably makes more sense to just keep a backup copy of the document made before starting to experiment with the indexing.

** Use at your own risk! YMMV IANAL etc. **

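function rmndx () {
    # Extract content.xml, strip the index-mark start/end tags, save the result in /tmp
    unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
    # Copy the original document, then replace content.xml inside the copy
    cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}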

rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed

nomnex · Post by **nomnex** » Sun Oct 21, 2012 1:52 pm

acknak wrote:Removing the index marks can be done by a small script which edits the document xml.

** Use at your own risk! YMMV IANAL etc. **

http://cygwin.com/acronyms/ > I am really not good with acronyms, but I found my way

acknak wrote:

Code: Select all

function rmndx () { 
    unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
    cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}

rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed

Code: Select all

REM  *****  BASIC  *****

Sub Main

function rmndx () {
unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}

End Sub

returns "BASIC syntax error. Function not allowed within a procedure" when I create a module in the document and press F5 (run the macro).

Post by **peterroots** » Sun Oct 21, 2012 6:30 pm

I think acknak intends this to be run as a script not a basic macro

Removing the index marks can be done by a small script

i.e. a bash script

Post by **acknak** » Sun Oct 21, 2012 9:42 pm

Yeah, sorry, my bad. I wasn't expecting anyone to actually try it.

Just copy/paste the function definition (up to the closing curly bracket) into a shell command line, then use it as in the example below the code. The function will be lost when you close that shell session; you have to repeat the copy/paste if you ever want to use it again (or make arrangements to save it).

nomnex · Post by **nomnex** » Mon Oct 22, 2012 3:21 pm

Thanks for the correction. I didn't know we could interact with document content using a shell script. But looking at the syntax, even without understanding regex, it makes sense.

Post by **acknak** » Mon Oct 22, 2012 5:09 pm

It's not a good way to do it, but it can be handy. Quick 'n dirty.

Better would be something that actually understands the document format; the script just does a dumb search/remove. If it gets off-track, it will likely corrupt the document.
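A minimal Python sketch of the "understands the document format" idea, using the standard library XML parser instead of a text search. The function name `remove_index_marks` and the sample paragraph are hypothetical; the namespace URI is the ODF text namespace. The sketch removes the mark elements from the parsed tree and keeps the text that follows each mark. It only shows the tree manipulation: ElementTree rewrites namespace prefixes when it serializes, so this has not been shown to produce a content.xml that OpenOffice will accept.

```python
import xml.etree.ElementTree as ET

# ODF text namespace; the three element names are the ones seen in this thread.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
MARK_TAGS = {
    f"{{{TEXT_NS}}}alphabetical-index-mark",
    f"{{{TEXT_NS}}}alphabetical-index-mark-start",
    f"{{{TEXT_NS}}}alphabetical-index-mark-end",
}


def remove_index_marks(root):
    """Remove index-mark elements in place and return how many were removed."""
    removed = 0
    for parent in list(root.iter()):
        previous = None  # last child of this parent that was kept
        for child in list(parent):
            if child.tag in MARK_TAGS:
                # Text after the mark (its "tail") must survive the removal:
                # attach it to the previous sibling, or to the parent's text.
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
                removed += 1
            else:
                previous = child
    return removed


# Hypothetical one-paragraph sample, not taken from a real document.
sample = (
    f'<text:p xmlns:text="{TEXT_NS}">The lost city of '
    '<text:alphabetical-index-mark-start text:id="IMark1"/>ATLANTIS'
    '<text:alphabetical-index-mark-end text:id="IMark1"/> sank.</text:p>'
)
root = ET.fromstring(sample)
count = remove_index_marks(root)
plain = "".join(root.itertext())
```

Because the marks are handled as elements, an attribute value containing an unusual character cannot derail the match the way it could with a regular expression.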

CmdrBalok · Post by **CmdrBalok** » Sat Jul 05, 2014 7:42 pm

Replacing '<text:alphabetical-index-mark-(start|end)[^>]*>' did not work for me in OO 4.0.0. It corrupted my document (a copy of it). (Procedure: unzip doc with WinRAR, edit content.xml with jEdit, replace a few hundred index tags, save, compress as a ZIP). What I eventually did was follow acknak's advice: made a copy of my doc with no indexing and used that.

The problem I had is that when you build index markers using a concordance, each 'Update Index/Table' operation appears to *add* index marks, so you can end up with many duplicates. That makes the document size grow, but the real issue is that it makes changing how to index something (for example, if you add a keyword to collect related entries) not work - the item will show up both ways. Ideally, I'd like to see a checkbox in the 'Insert Index/Table' panel that causes OO to remove all index marks before generating new ones from a concordance. Of course, that comes with pitfalls of its own, chiefly, it requires you to either use a concordance or insert index entries manually, or live with duplicates.
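To check whether an index update really adds marks, one option is to count the marks in content.xml before and after the update. A small Python sketch, assuming the XML text has already been read from the unzipped document; the function name `count_index_marks` and the labels "point", "start" and "end" are made up for this illustration.

```python
import re
from collections import Counter

# Matches the three tag names seen in this thread; group 1 is "-start",
# "-end", or None for a mark that carries its own string value.
MARK = re.compile(r"<text:alphabetical-index-mark(-start|-end)?\b[^>]*>")


def count_index_marks(xml_text):
    """Count index marks by kind: 'point', 'start' or 'end'."""
    kinds = Counter()
    for match in MARK.finditer(xml_text):
        suffix = match.group(1)
        kinds[suffix.lstrip("-") if suffix else "point"] += 1
    return kinds


# Hypothetical fragment with a doubled range mark around one word.
fragment = (
    '<text:alphabetical-index-mark-start text:id="A"/>'
    '<text:alphabetical-index-mark-start text:id="B"/>'
    '<text:alphabetical-index-mark text:string-value="ATLANTIS"/>'
    'ATLANTIS'
    '<text:alphabetical-index-mark-end text:id="B"/>'
    '<text:alphabetical-index-mark-end text:id="A"/>'
)
counts = count_index_marks(fragment)
```

If the counts grow after every update while the text stays the same, the duplicates are coming from the update itself.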

rdh61 · Post by **rdh61** » Sat Apr 25, 2015 2:02 pm

@ Acknak,

Please could you spell out for a real dummy, step by step, how your script should be used. All I get is syntax errors.

Many thanks.

Post by **acknak** » Sun Apr 26, 2015 1:57 am

Open a terminal window.

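At the shell prompt, type echo $SHELL and press Enter. You should see: /bin/bash

If you see something else, start bash by typing /bin/bash and pressing Enter. If that doesn't work, turn back now! (echo $SHELL reports your login shell, so it may still show the old value after you start bash; echo $BASH_VERSION prints a version number when you are in bash.)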

At the bash (shell) prompt, paste the code from the post above; press Enter a couple of times until you get another shell prompt.

Type: type rmndx

You should see the function definition displayed on the screen.

If you see any errors at these steps, turn back now!

To actually run the script on a document, type rmndx document.odt

The output will be a new file: document-index.odt

If you see errors when running the script, you may not have all the necessary components installed: zip/unzip, sed, etc.

rdh61 · Post by **rdh61** » Sun Apr 26, 2015 7:32 pm

Thank you. Much appreciated.

LDP-OpenOffice · Post by **LDP-OpenOffice** » Wed Nov 16, 2022 2:05 am

How can the script be run on Windows in order to remove all index marks?

Post by **MrProgrammer** » Wed Nov 16, 2022 11:19 pm

LDP-OpenOffice wrote: ↑Wed Nov 16, 2022 2:05 am How can the script be run on Windows in order to remove all index marks?

Just install the Windows Subsystem for Linux. Then you can run bash scripts.
Installing Linux for Windows is beyond the scope of this forum, but a search for that name provides links that explain the process.

Or install Perl and use this three-line program. You may find that Perl is already installed on your system.

$/ = undef; $_ = <STDIN>;                  # Read entire file
s/<text:alphabetical-index-mark[^>]+>//g;  # Remove index tags
print $_;                                  # Write updated XML

To use the program:
• The Writer document must be saved in ODT format (not DOC or DOCX)
• Ensure that the Writer document is not open in OpenOffice
• Ensure that you have a backup of your ODT file in case you damage it
• UnZIP the ODT file to an empty directory, creating content.xml and other files
• In the directory, run the perl program, reading content.xml and writing content.new
• Delete content.xml then rename content.new to content.xml
• Re-ZIP the files in the directory to the ODT file
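The unzip, edit and re-zip steps above can also be sketched in one Python function using only the standard library. This is an illustration under stated assumptions, not a tested replacement for the Perl program: the function name `strip_index_marks` is hypothetical, the regular expression is the one from the Perl program, and writing the `mimetype` entry first and uncompressed is an assumption about what ODF readers expect from the package.

```python
import re
import zipfile

# Same pattern as the Perl program above.
INDEX_MARK = re.compile(r"<text:alphabetical-index-mark[^>]+>")


def strip_index_marks(src_path, dst_path):
    """Write a copy of the ODT at src_path to dst_path without index marks.

    The source file is only read, so it doubles as the backup.
    src_path and dst_path must be different files.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        # Assumption: "mimetype" goes first and is stored uncompressed.
        names = sorted(src.namelist(), key=lambda name: name != "mimetype")
        for name in names:
            data = src.read(name)
            if name == "content.xml":
                text = data.decode("utf-8")
                data = INDEX_MARK.sub("", text).encode("utf-8")
            if name == "mimetype":
                method = zipfile.ZIP_STORED
            else:
                method = zipfile.ZIP_DEFLATED
            dst.writestr(name, data, compress_type=method)
```

A call such as `strip_index_marks("mydoc.odt", "mydoc-index.odt")` leaves the original untouched. As with the manual steps, the document must be closed in OpenOffice first.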

Installing Perl and running Perl programs is beyond the scope of this forum but the Perl link above will have more information.

acknak wrote: ↑Mon Oct 15, 2012 6:40 pm
sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g'

Some varieties of Unix/Linux use option -E instead of -r to indicate that this is an extended regular expression.

LDP-OpenOffice · Post by **LDP-OpenOffice** » Thu Nov 17, 2022 7:22 pm

Thank you very much.

Now it is much clearer how indexing is marked up in an .odt file, and how the file is structured: it is essentially a .zip file. The content.xml inside it has 2 records.

The first is <?xml version="1.0" encoding="UTF-8"?>
The second is a very long XML record.

For example, the term ATLANTIS is marked up in it as follows:

<text:alphabetical-index-mark-start text:id="IMark66441952"/>
<text:alphabetical-index-mark-start text:id="IMark66442048"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark-start text:id="IMark66445264"/>
<text:alphabetical-index-mark-start text:id="IMark66281708"/>
<text:span text:style-name="T1">ATLANTIS</text:span>
<text:alphabetical-index-mark-end text:id="IMark66281708"/>
<text:alphabetical-index-mark-end text:id="IMark66445264"/>
<text:alphabetical-index-mark-end text:id="IMark66442048"/>
<text:alphabetical-index-mark-end text:id="IMark66441952"/>
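The markup above contains two kinds of mark: start/end pairs, and marks that carry a text:string-value. The two patterns posted in this thread treat them differently, which the following Python sketch shows on a shortened version of the ATLANTIS markup. The variable names are made up for the illustration; the patterns are copied from the sed and Perl posts.

```python
import re

# Shortened version of the ATLANTIS markup: one start/end pair,
# one mark with a string value, and the indexed word itself.
SAMPLE = (
    '<text:alphabetical-index-mark-start text:id="IMark66445264"/>'
    '<text:alphabetical-index-mark text:string-value="ATLANTIS"/>'
    '<text:span text:style-name="T1">ATLANTIS</text:span>'
    '<text:alphabetical-index-mark-end text:id="IMark66445264"/>'
)

# Pattern from the sed script: only the -start and -end tags.
RANGE_ONLY = re.compile(r"<text:alphabetical-index-mark-(start|end)[^>]*>")
# Pattern from the Perl program: every tag whose name begins with
# text:alphabetical-index-mark.
ALL_MARKS = re.compile(r"<text:alphabetical-index-mark[^>]+>")

after_range_only = RANGE_ONLY.sub("", SAMPLE)
after_all_marks = ALL_MARKS.sub("", SAMPLE)

print(after_range_only)
print(after_all_marks)
```

The first result still contains the mark with text:string-value="ATLANTIS"; the second contains only the text:span element. On a document with both kinds of mark, the sed pattern would therefore leave some index entries behind.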

I plan to try writing a gawk or PHP script to remove index related markup.

LDP-OpenOffice · Post by **LDP-OpenOffice** » Sun Nov 20, 2022 7:31 pm

I found that the XML file generated by LibreOffice is much simpler and easier to process. Once the .odt is re-zipped with the processed content.xml file, the index marks disappear, and there are fewer problems than with the file produced by OpenOffice.

content.xml contains 2 records: an XML header and the document. In my case it was a file of about 5 MB.

In a first pass I forced each XML statement onto its own record. The second pass discarded indexing-related statements. The 2 scripts were written in gawk.

OpenOffice indexes terms which are embedded in words.

Processing content.xml and rebuilding the .odt creates an unexpected problem that is not easy to resolve. For example if "Go" is an index term then "negotiate" will print like this "ne go tiate", etc.
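One possible explanation for the "ne go tiate" effect, offered as an assumption and not confirmed anywhere in this thread: if every tag is forced onto its own record and the records are later written back with newlines between them, the line breaks land inside the word, and an application that renders a line break in text content as a space will show gaps. The Python sketch below reproduces that effect on a hypothetical fragment; it imitates the two passes described above and is not the gawk code that was actually used.

```python
import re

# Hypothetical paragraph in which "go" inside "negotiate" carries index marks.
fragment = (
    '<text:p>ne'
    '<text:alphabetical-index-mark-start text:id="IMark1"/>go'
    '<text:alphabetical-index-mark-end text:id="IMark1"/>tiate</text:p>'
)

INDEX_MARK = re.compile(r"<text:alphabetical-index-mark[^>]+>")

# Pass 1: break the line after every tag, giving one tag per record.
records = fragment.replace(">", ">\n").splitlines()
# Pass 2: drop the index marks from each record.
cleaned = [INDEX_MARK.sub("", record) for record in records]

joined_with_newlines = "\n".join(cleaned)  # records written one per line
joined_without = "".join(cleaned)          # records glued back together


def visible_text(xml_text):
    """Strip tags and collapse whitespace, assuming newlines show as spaces."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", "", xml_text)).strip()


text_with_newlines = visible_text(joined_with_newlines)
text_without = visible_text(joined_without)
```

Under that assumption, `text_with_newlines` comes out as "ne go tiate" and `text_without` as "negotiate", so removing the inserted line breaks before re-zipping may be worth trying.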

After rebuilding the .odt file OpenOffice could not open it, but LibreOffice could.
