Page 1 of 1

[Solved] Selecting columns containing multiple keywords

Posted: Mon Jan 25, 2021 7:56 pm
by banathy
Hi! Absolute beginner here... I have TONS of spredsheets containing metadata from tons of files, documents, images, videos, mp3s. Each sheet has hundreds of columns. I need only the columns containing FileName, FileSize, CreateDate, Resolution, FileType, PageCount, Bitrate and a few more.
I found a macro to delete specific columns, based on the column letter, but the problem is that in one sheet the FileName is in column B, in an other sheet it can be in D or Z or anywhere. Is there a macro or filter I can apply to keep only the columns containing specific text (FileName, FileSize, CreateDate, Resolution, FileType, PageCount, Bitrate), or to copy these columns to a new spreadsheet?
How would you aproach this? Thank you!

Re: Selecting columns containing multiple keywords

Posted: Mon Jan 25, 2021 8:52 pm
by RusselB
Welcome to the Forums.
I'd start by using a better app, namely a database, rather than a spreadsheet.

Re: Selecting columns containing multiple keywords

Posted: Tue Jan 26, 2021 5:10 pm
by MrProgrammer
Hi, and welcome to the forum.
banathy wrote:Absolute beginner here
[Tutorial] Ten concepts that every Calc user should know
banathy wrote:Is there a macro or filter I can apply to keep only the columns containing specific text (FileName, FileSize, CreateDate, Resolution, FileType, PageCount, Bitrate), or to copy these columns to a new spreadsheet?
You can copy them to a new sheet with a filter.
[Tutorial] Sorting and Filtering data with formulas
banathy wrote:I have TONS of [spreadsheets] containing metadata from tons of files, documents, images, videos, mp3s. Each sheet has hundreds of columns. … I need only the columns containing …
Did you type these megabytes of data yourself into Calc? If so, why did you enter a lot of data which you do not want?

Or was it actually imported into Calc from some external source? Which external source, very specifically? CSV files? Some damned Windows resource? Which resource, specifically? A database? Which database, specifically? Why do you not give us the full details about your problem?

Re: Selecting columns containing multiple keywords

Posted: Tue Jan 26, 2021 6:22 pm
by banathy
Thank you! :)
I have 4 TB worth of unsorted e-books, documents, images, videos, sound files in my university's library. Basically 20 years of bad data management. Now they want to catalogue all of them. Yesterday I managed to pull out all the metadata from every file from a single folder into a csv file. I had lots of csv files containing a lot of irrelevant tags in columns and I was trying to put all the columns I need into a single sheet and get rid of the rest. Today I figured it out how to pull out only the relevant metadata tags from the files and I generated new csv-s. Now I have to figure out, how to merge the csv files, and how to enter each file's filepath. I've never done anything of the sorts in my life, so I'm making it up as I go :D

Re: [Solved] Selecting columns containing multiple keywords

Posted: Tue Jan 26, 2021 9:33 pm
by jrkrideau
Once you get a bit of the mess tidied up you really need to look into something like a data base or the same mess will happen again.

Re: [Solved] Selecting columns containing multiple keywords

Posted: Wed Jan 27, 2021 12:46 am
by robleyd
how to merge the csv files
Make sure that the data structure of all the files you wish to combine is identical, otherwise you will have issues! Give all the files you wish to merge a common filename suffix, e.g. .csv and place them in the same directory. Navigate to that directory with file manager and open a command widow via right click. Then just issue the command

Code: Select all

copy *.csv merged.csv
If you need the ELI5 version of these instructions, with pictures, see https://www.wikihow.com/Merge-Text-(.Txt)-Files-in-Command-Prompt
You'll need to copy/paste the URL as the parentheses in it break the automatic link creation.

Re: [Solved] Selecting columns containing multiple keywords

Posted: Wed Jan 27, 2021 1:46 am
by banathy
Thank you so much, guys!
I'm almost done generating the csv-s for ALL folders. Hundreds and hundreds. Yeah, it is a nightmare. Exiftool doesn't likes folder names with spaces, so I have to rename them, run Exiftool then I have to revert to the original name. The csv file doesn't contain the full file path, just the (renamed) folder's name. So I will run find/replace to find the folder name and replace it with the full path. Afterwards I will try to merge the csv-s. Luckily the data structure is the same in all of them but my spider sense is tingling about the different special characters in the metadata. Hungarian, Turkish, Sanskrit... Oh well, we'll see.
My name is Attila and I will be an alcoholic.

Re: [Solved] Selecting columns containing multiple keywords

Posted: Wed Jan 27, 2021 2:43 pm
by DiGro
I always liked working with Sam Francke's CSVed when dealing with *.csv

https://csved.sjfrancke.nl/index.html

Re: [Solved] Selecting columns containing multiple keywords

Posted: Thu Jan 28, 2021 9:58 am
by Mountaineer
banathy wrote:Thank you so much, guys!
I'm almost done generating the csv-s for ALL folders. Hundreds and hundreds. Yeah, it is a nightmare. Exiftool doesn't likes folder names with spaces, so I have to rename them, run Exiftool then I have to revert to the original name. The csv file doesn't contain the full file path, just the (renamed) folder's name. ...
It may be to late for your case, but Exiftool accepts quoted filenames and in my case i used the recursive scan of all subfolders - no Problem with spaces here.
As I remember ther are names for filename and path, so your workaround shouldn't be necessary.

If you are interested I can find and post my script to access the collection of 80000 photos 4 photographers left behind in my archive at work...

J.

Re: [Solved] Selecting columns containing multiple keywords

Posted: Thu Jan 28, 2021 10:21 am
by banathy
It may be to late for your case, but Exiftool accepts quoted filenames and in my case i used the recursive scan of all subfolders - no Problem with spaces here.
As I remember ther are names for filename and path, so your workaround shouldn't be necessary.

If you are interested I can find and post my script to access the collection of 80000 photos 4 photographers left behind in my archive at work...
Oh, it is definitely not to late, there are several other hard drives I haven't even touched yet... It would be amazing to see your solution. I started to suspect there may be a solution like this, but I never worked with spreadsheets or code or Exiftool before last week, all of this is pretty new to me.

Re: [Solved] Selecting columns containing multiple keywords

Posted: Thu Jan 28, 2021 1:13 pm
by Mountaineer

Code: Select all

@echo off
REM Extracts Exif-Tags for all Files in an Directory given as Argument to csv-File
REM !!  -r = recursive scan / omitt, if necessary

IF NOT EXIST P:\ExifTool\ GOTO NOPORT	
P:
cd p:\ExifTool\

Echo Scan of %1
exiftool.exe -r -csv -SourceFile -FileName -DateTimeOriginal -Model -ImageWidth -ImageHeight -Orientation -PhotometricInterpretation -Keywords %1 >%1\_EXIFData.csv
goto ENDE

:NOPORT
echo "Portable P:ExifTool missing"
goto :ENDE

:ENDE
pause
exit
This is a Windows-Batch. As I didn't installed exiftool for this, I mount my portable Tools at P: on Windows - Therfore the NOPORT-Part

Exiftool:
-r recursion
-csv output as csv file
SourceFile give the full path, filename only the name with extension
the other parameters select EXIF-Data I needed for this run

%1 >%1\_EXIFData.csv scans the given directory, output is directed to the named File in the directory; % is Windows Batch - adapt for bash

In a separate batch I collect/append all csv-Files and import them afterwards to a sqlite database.
It is also possible to register the resulting csv as Text-Database in AOO/LO-Base for direct access or simple queries,
but a real SQL-database is much more flexible and trustworthy.
(Edit: Early version to access csv as Text-Database from a german forum: https://www.libreoffice-forum.de/viewto ... XIF#p41302 )
J.