[Solved] Delete All Index entries at once
Can I delete all the index entries at once, to start anew? A quick search of the board and on Google did not turn up anything except deleting each entry one by one.
What's the best practice for experimenting with indexes: keep the original document without an index and make a copy of it with an index? I'd rather spare myself clicking/deleting hundreds of entries. Thank you.
Last edited by nomnex on Mon Oct 22, 2012 3:21 pm, edited 1 time in total.
Re: Delete All Index entries at once
Click the first index entry and select Edit > Index Entry..., then press and hold CTRL+D until all entries are deleted.
Re: Delete All Index entries at once
The title says "delete all index entries at once", and I said in my first post that the forum search "did not return anything, but to delete the entries one by one."
So I assume there is no way to do it.
Hence my second question: I am not very experienced at indexing (this is beyond the scope of the indexing feature in Writer), so I guess I'd rather keep a non-indexed copy of my document, in the event I feel the need to start anew.
Re: Delete All Index entries at once
Removing the index marks can be done by a small script which edits the document xml.
I think it probably makes more sense to just keep a backup copy of the document made before starting to experiment with the indexing.
** Use at your own risk! YMMV IANAL etc. **
function rmndx () {
unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}
rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed
AOO4/LO5 • Linux • Fedora 23
Re: Delete All Index entries at once
http://cygwin.com/acronyms/ > I am really not good with acronyms, but I found my way.
acknak wrote: Removing the index marks can be done by a small script which edits the document xml.
** Use at your own risk! YMMV IANAL etc. **
acknak wrote: rmndx mydoc.odt ## creates "mydoc-index.odt" with index marks removed
function rmndx () {
unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}
REM ***** BASIC *****
Sub Main
function rmndx () {
unzip -p "$1" content.xml | sed -r -e 's,<text:alphabetical-index-mark-(start|end)[^>]*>,,g' > /tmp/content.xml && n="$(basename "$1" .odt)"-index.odt;
cp "$1" "$n" && zip -j "$n" /tmp/content.xml
}
End Sub
peterroots
Re: Delete All Index entries at once
acknak wrote: Removing the index marks can be done by a small script
I think acknak intends this to be run as a shell script (i.e. a bash script), not as a Basic macro.
LibreOffice 4.0.3 OpenSUSE 12.3 : OpenOffice 4 Linux Mint 15
Re: Delete All Index entries at once
Yeah, sorry, my bad. I wasn't expecting anyone to actually try it.
Just copy/paste the function definition (up to the closing curly bracket) into a shell command line, then use it as in the example below the code. The function will be lost when you close that shell session; you have to repeat the copy/paste if you ever want to use it again (or make arrangements to save it).
AOO4/LO5 • Linux • Fedora 23
Re: Delete All Index entries at once
Thanks for the correction. I didn't know we could work with a document's content using a shell script. But looking at the syntax, even without understanding regexes, it makes sense.
Re: [Solved] Delete All Index entries at once
It's not a good way to do it, but it can be handy. Quick 'n dirty.
Better would be something that actually understands the document format; the script just does a dumb search/remove. If it gets off-track, it will likely corrupt the document.
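Since a dumb search/remove can leave the document broken, a cheap sanity check after running the script is to ask whether the new file's content.xml still parses as XML. A minimal sketch, assuming unzip and xmllint (part of libxml2) are installed; check_content_xml is a made-up name, not an existing tool. Passing this check is necessary but not sufficient: it only says the XML is well-formed, not that the file is valid ODF or that the text came through unchanged.

```shell
# check_content_xml (hypothetical helper): extract content.xml from an ODT
# and let xmllint report whether it is still well-formed XML.
# The exit status is xmllint's: 0 if well-formed, non-zero otherwise.
check_content_xml () {
  unzip -p "$1" content.xml | xmllint --noout -
}

# Usage:
#   rmndx mydoc.odt && check_content_xml mydoc-index.odt && echo "content.xml parses"
```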
AOO4/LO5 • Linux • Fedora 23
Re: [Solved] Delete All Index entries at once
Replacing '<text:alphabetical-index-mark-(start|end)[^>]*>' did not work for me in OO 4.0.0. It corrupted the (copy of) my document. (Procedure: unzip doc with WinRAR, edit content.xml with jEdit, replace a few hundred index tags, save, compress as a ZIP). What I eventually did was follow acknak's advice: made a copy of my doc with no indexing and used that.
The problem I had is that when you build index markers using a concordance, each 'Update Index/Table' operation appears to *add* index marks, so you can end up with many duplicates. That makes the document size grow, but the real issue is that it makes changing how to index something (for example, if you add a keyword to collect related entries) not work - the item will show up both ways. Ideally, I'd like to see a checkbox in the 'Insert Index/Table' panel that causes OO to remove all index marks before generating new ones from a concordance. Of course, that comes with pitfalls of its own, chiefly, it requires you to either use a concordance or insert index entries manually, or live with duplicates.
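To check whether each Update Index/Table really adds marks, one could count the index-mark tags in content.xml before and after an update. A minimal sketch in the same shell style as acknak's script, assuming grep, wc and tr are available; count_index_marks is a hypothetical name. It counts tags, not entries: an entry made from a selection has both a -start and an -end tag, so it counts twice.

```shell
# count_index_marks (hypothetical helper): read content.xml on stdin and print
# how many alphabetical-index mark tags it contains: single marks plus the
# -start and -end tags of range marks.
count_index_marks () {
  # grep -o prints one match per line; "|| true" keeps a zero count from
  # being reported as a failure when nothing matches.
  { grep -oE '<text:alphabetical-index-mark(-start|-end)?[ /]' || true; } |
    wc -l | tr -d ' '
}

# Usage:
#   unzip -p mydoc.odt content.xml | count_index_marks
```

If the count keeps growing with every update while the concordance file stays the same, that would support the duplicate-marks explanation.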
OO 4.0.0, Win7 HP SP1 64-bit
Re: [Solved] Delete All Index entries at once
@ Acknak,
Please could you spell out for a real dummy, step by step, how your script should be used. All I get is syntax errors.
Many thanks.
Open Office 4.1 on Ubuntu 14.04
Re: [Solved] Delete All Index entries at once
Open a terminal window.
At the shell prompt, type echo $SHELL Enter. You should see: /bin/bash
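If you see something else, start bash by typing /bin/bash Enter. If that doesn't work, or echo $BASH_VERSION prints nothing, turn back now! (echo $SHELL shows your login shell, so it keeps saying the same thing after you start bash by hand.)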
At the bash (shell) prompt, paste the code from the post above; press Enter a couple of times until you get another shell prompt.
Type: type rmndx
You should see the function definition displayed on the screen.
If you see any errors at these steps, turn back now!
To actually run the script on a document, type rmndx document.odt
The output will be a new file: document-index.odt
If you see errors when running the script, you may not have all the necessary components installed: zip/unzip, sed, etc.
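For the last point, a quick way to see which of the commands rmndx calls are missing is a small loop over `command -v`. This is a sketch: check_tools is a hypothetical name, not part of any package.

```shell
# check_tools (hypothetical helper): print every named command that cannot be
# found, and return non-zero if any are missing.
check_tools () {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: the commands that rmndx calls
#   check_tools unzip sed basename cp zip
```

On Ubuntu, zip and unzip are separate packages; if either is reported missing, `sudo apt install zip unzip` should add them.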
AOO4/LO5 • Linux • Fedora 23
Re: [Solved] Delete All Index entries at once
Thank you. Much appreciated.
Open Office 4.1 on Ubuntu 14.04
LDP-OpenOffice
Re: [Solved] Delete All Index entries at once
How can the script be run on Windows in order to remove all index marks?
OpenOffice 4.1.10 on Windows 10
MrProgrammer
Re: [Solved] Delete All Index entries at once
LDP-OpenOffice wrote (Wed Nov 16, 2022 2:05 am): How can the script be run on Windows in order to remove all index marks?
Just install the Windows Subsystem for Linux (WSL). Then you can run bash scripts.
Installing Linux for Windows is beyond the scope of this forum but that search provides links to explain the process.
Or install Perl and use this three-line program. You may find that Perl is already installed on your system.
$/ = undef; $_ = <STDIN>; # Read entire file
s/<text:alphabetical-index-mark[^>]+>//g; # Remove index tags
print $_; # Write updated XML
To use the program:
• The Writer document must be saved in ODT format (not DOC or DOCX)
• Ensure that the Writer document is not open in OpenOffice
• Ensure that you have a backup of your ODT file in case you damage it
• UnZIP the ODT file to an empty directory, creating content.xml and other files
• In the directory, run the perl program, reading content.xml and writing content.new
• Delete content.xml then rename content.new to content.xml
• Re-ZIP the files in the directory to the ODT file
Installing Perl and running Perl programs is beyond the scope of this forum but the Perl link above will have more information.
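The last step, re-ZIPping, is easy to get subtly wrong with a general-purpose archiver: the ODF packaging rules expect the file named mimetype to be the first entry in the archive and to be stored without compression. An archive that breaks this rule may explain a rebuilt file that LibreOffice opens but OpenOffice rejects, as reported further down this thread (an assumption, not a confirmed diagnosis). A minimal sketch with Info-ZIP's zip; repack_odt is a hypothetical name, and the input directory is assumed to be the unzipped ODT, with mimetype at its top level.

```shell
# repack_odt (hypothetical helper): rebuild an ODT from the directory holding
# its unzipped files (mimetype, content.xml, META-INF/, ...).
# usage: repack_odt DIR OUTPUT.odt
repack_odt () {
  local dir="$1" out
  # Absolute output path, because zip runs from inside DIR below.
  out="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")" || return
  rm -f "$out"
  (
    cd "$dir" || exit 1
    # mimetype goes in first, stored without compression (-0)
    zip -q -X -0 "$out" mimetype || exit 1
    # then everything else, compressed as usual
    zip -q -X -r "$out" * -x mimetype
  )
}

# Usage (after replacing content.xml in the directory "unpacked"):
#   repack_odt unpacked mydoc-noindex.odt
```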
In acknak's sed command, note that some varieties of Unix/Linux use option -E instead of -r to indicate an extended regular expression.
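Note also that the sed command in acknak's script only matches the -start and -end tags, so single (non-range) marks such as `<text:alphabetical-index-mark text:string-value="ATLANTIS"/>`, shown later in this thread, are left behind; the Perl regex above removes all three kinds. A sed equivalent using -E, which GNU sed and BSD/macOS sed both accept, might look like this; strip_index_marks is a hypothetical name.

```shell
# strip_index_marks (hypothetical helper): read content.xml on stdin and write
# it to stdout with every alphabetical-index mark tag removed:
#   <text:alphabetical-index-mark .../>        single marks
#   <text:alphabetical-index-mark-start .../>  start of a range mark
#   <text:alphabetical-index-mark-end .../>    end of a range mark
# The generated index itself (<text:alphabetical-index ...>) is not touched.
strip_index_marks () {
  sed -E 's,<text:alphabetical-index-mark(-start|-end)?[ /][^>]*>,,g'
}

# Usage (inside the unzipped directory, in place of the Perl step):
#   strip_index_marks < content.xml > content.new && mv content.new content.xml
```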
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.6, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
LDP-OpenOffice
Re: [Solved] Delete All Index entries at once
Thank you very much.
Now it is much clearer how indexing is marked up in an .odt file, and how the file is structured: it is essentially a .zip file. Its content.xml has 2 records (lines).
The first is <?xml version="1.0" encoding="UTF-8"?>
The second is a very long XML record.
For example, the term ATLANTIS is marked up in it as follows:
<text:alphabetical-index-mark-start text:id="IMark66441952"/>
<text:alphabetical-index-mark-start text:id="IMark66442048"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark text:string-value="ATLANTIS"/>
<text:alphabetical-index-mark-start text:id="IMark66445264"/>
<text:alphabetical-index-mark-start text:id="IMark66281708"/>
<text:span text:style-name="T1">ATLANTIS</text:span>
<text:alphabetical-index-mark-end text:id="IMark66281708"/>
<text:alphabetical-index-mark-end text:id="IMark66445264"/>
<text:alphabetical-index-mark-end text:id="IMark66442048"/>
<text:alphabetical-index-mark-end text:id="IMark66441952"/>
I plan to try writing a gawk or PHP script to remove index related markup.
OpenOffice 4.1.10 on Windows 10
LDP-OpenOffice
Re: [Solved] Delete All Index entries at once
I found that the XML generated by LibreOffice is much simpler and easier to process. Once the .odt is re-zipped with the processed content.xml, the index marks disappear, and there are fewer problems than with the file saved by OpenOffice.
content.xml contains 2 records: an XML header and the document. In my case it was a file of about 5 MB.
In a first pass I forced each XML statement onto its own record. The second pass discarded index-related statements. The 2 scripts were written in gawk.
OpenOffice indexes terms that are embedded in words.
Processing content.xml and rebuilding the .odt creates an unexpected problem that is not easy to resolve. For example, if "Go" is an index term, then "negotiate" will print as "ne go tiate", etc.
After rebuilding the .odt file OpenOffice could not open it, but LibreOffice could.
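A likely cause of the "ne go tiate" problem described above (an assumption based on the ODF rule that, inside a paragraph, any run of spaces, tabs and line breaks is displayed as a single space): if the first pass puts every tag on its own line and those line breaks are still in content.xml when it is re-zipped, each one can show up as a space in the text. One way around it is never to insert line breaks at all. This sketch is written for gawk, as in the post above, using only POSIX awk features; it splits the input on "<" instead of newlines, so each record is one tag plus the text that follows it, drops the index-mark tags, and writes everything back with no separator. strip_marks_awk is a hypothetical name.

```shell
# strip_marks_awk (hypothetical helper): read content.xml on stdin and write it
# to stdout without alphabetical-index mark tags. Records are split on "<", so
# each record is one tag plus the text after it; output uses no record
# separator, so no line breaks are added to the text.
strip_marks_awk () {
  awk '
    BEGIN { RS = "<"; ORS = "" }
    NR == 1 { print; next }    # whatever precedes the first tag
    $0 ~ "^text:alphabetical-index-mark(-start|-end)?[ /]" {
      sub(/^[^>]*>/, "")       # drop the index-mark tag itself
      print; next              # keep the text that followed it
    }
    { print "<" $0 }           # any other tag: restore its "<"
  '
}

# Usage:
#   strip_marks_awk < content.xml > content.new && mv content.new content.xml
```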
OpenOffice 4.1.10 on Windows 10