[Solved] Clean up several thousand dates

Discuss the spreadsheet application
Locked
AfTech54
Posts: 71
Joined: Tue Dec 31, 2013 10:08 am

[Solved] Clean up several thousand dates

Post by AfTech54 »

I need to clean up and sort several thousand of dates with time (Column A ) so that it looks like Column B.
I have tried the following:

Removed the time by replacing [\d][\d][:][\d][\d] or [:digit:]{2}[:][:digit:]{2} with "Replace with" blank/no characters/null

Then the time (and day of the week sometimes??) disappear.

If the days of the week remained, I tried replacing *day with "Replace with" blank/no characters/null, it didn't work so I had to take day by day.

Then I formatted the column with the English date format DD MMMM YYYY to get a two-digit day number (to be able to order the dates). It only worked when the day of the week and timestamp was removed. Sometimes it didn't add the zero befor a single digit day date??

It doesn't work well maybe because there are so many rows??

Then I have also tried replacing single digit day dates ¨[^:digit:] with 0& to get two digit day no:s, one row works but not several at the same time.

Probably this old man don't understanding this or maybe I messing it up in some way.

Does anyone have good tips on how to do this most smoothly and accurate, working each time?
Datumsortering.jpg
Datumsortering.jpg (94.96 KiB) Viewed 5726 times
Attachments
Datum all years.ods
(52.96 KiB) Downloaded 137 times
Last edited by AfTech54 on Sun Dec 29, 2024 12:02 am, edited 1 time in total.
Ooo v4.1.15, Windows 10
User avatar
Villeroy
Volunteer
Posts: 31363
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Clean up and sort several thousand of dates

Post by Villeroy »

I had to shorten the list because of the max. size of uploads. It worked with all your strings.
P.S. wrong reference in column E. This happened when I inserted a new row for the column headers.
Replace $J$1:$J$13 with $J$2:$J$13
Attachments
Datum_all_years_Villeroy.ods
(56.03 KiB) Downloaded 123 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
AfTech54
Posts: 71
Joined: Tue Dec 31, 2013 10:08 am

Re: Clean up and sort several thousand of dates

Post by AfTech54 »

Thanks again Villeroy!

As I wrote probably this old man ...

But I still have problem. It seems like I have to and want to get rid of days and time stamps to get a zero in front.
No zero, jpg : I tried to use format
Add zero reg... : tried to add zero with regexp

Your're right *day worked if I put it in "staples" [*day], but the old man wanted to get rid of the whole name so a little of brain drain again :D

can you suggest a smooth way to get rid of the names of the days and timestamp?
I tried to replace several days Monday; Tuesday ... in one replacement but it didn't work.
No zero.jpg
No zero.jpg (121.84 KiB) Viewed 5674 times
Add zero RegExp.jpg
Add zero RegExp.jpg (123.65 KiB) Viewed 5674 times
Ooo v4.1.15, Windows 10
User avatar
MrProgrammer
Moderator
Posts: 5431
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: Clean up and sort several thousand of dates

Post by MrProgrammer »

AfTech54 wrote: Sat Dec 28, 2024 7:20 pm Does anyone have good tips on how to do this most smoothly and accurate, working each time?
This will be easy using standard OpenOffice features which have been available for decades. Calc can't interpret dates which contain the day of the week, so you have to remove those. As written, Calc regards your cell values as text, not dates. I used the following procedure on your attachment, sheet Ark2, column A.
  • Step 1: Select the column, Edit → Find & Replace → More options, select Current selection only and Regular Expressions, Search for ([^ ]+ )([^ ]+ )([^ ]+ ).+, Replace with $1$2$3, Replace All. There is a space after each ^, and a space after each + except the last one. You can copy/paste the regular expression from this web page to the Find & Replace dialog box. I recommend unchecking Current selection only and Regular expressions, before clicking Close. This changes, for example 13 December 2019 Friday 10:24 to 13 December 2019 . In my locale, English (Canada), 13 December 2019 is still text, not a recognizable date.
  • Step 2: Select the column, Data → Text to Columns, Click Standard, set the column type to Date (DMY), OK. This sets the correct values for the cells.
[Tutorial] Text to Columns

Now you have a column of Calc dates and you can format them any way you prefer. Format the dates after you have the correct values in the cells, not before. You can never change cell values by setting their format. You will be able to sort the dates after the cells contain the correct values.

For Date (DMY) to understand the month names in English I presume you will need to ensure that you are using an English-based locale. If your default locale is not based on the English language, I believe you can set that for the cells in Format → Cells → Numbers → Language. I have no good way to test that since my default locale is English (Canada).
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
Villeroy
Volunteer
Posts: 31363
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Clean up and sort several thousand of dates

Post by Villeroy »

Please fix my error in column E and replace the wrong reference $J$1:$J$13 with $J$2:$J$13.
My formula uses the DATE function to calculate the correct number from 3 numbers representing the year, month and day.
The English month name is looked up in $J$2:$J$13, so the locale setting does not matter.
Copy the results and paste-special values in order to convert formulas to constant values.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31363
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Clean up and sort several thousand of dates

Post by Villeroy »

Alternative approach:
Tools>Options>Language settings. Switch the locale setting to any English except US-English.
B1: =FIND(" ",$A1) [find first space]
C1: =FIND(" ",$A1;$B1) [find second space]
D1: =VALUE(LEFT($A1,$C1+4)) calculates the correct value.
Drag down formulas.
Copy the results, paste-special values and format the numbers to your liking.
Switch back to your preferred locale setting.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Lupp
Volunteer
Posts: 3756
Joined: Sat May 31, 2014 7:05 pm
Location: München, Germany

Re: Clean up and sort several thousand of dates

Post by Lupp »

Your supposed date-time values are text, and given in a format unusable in formulas or for sorting or filtering.
Treating data-time as text is basically a good idea, but to earn the fruits you need to use a locale-independent and globally understood format. ISO 8601 describes the standard, and software can also recognize some very similar variants.

Your question should therefore be: "How can I convert fancy pseudo data for date-time to usable data?"

To do in AOO Calc sheets would require a lot of additional formulas in helper columns or very complicated formulas.
In LibreOffice Calc V 6.2 or higher you can use the REGEX() function for relevant parts of the task.
Due to the size of the sheets I can't attach a full solution. The attached example contains a heavily abbreviated version.
Once again: It only works in LibO V 6.2 or higher.
aoo112276convertFancyLocaleDatetimeToUsable.ods
(107.55 KiB) Downloaded 128 times
(Sorry. I was distracted, and posted finally without having noticed and studied the posts by others,)
On Windows 10: LibreOffice 25.8.4 and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
AfTech54
Posts: 71
Joined: Tue Dec 31, 2013 10:08 am

Re: [Solved] Clean up and sort several thousand of dates

Post by AfTech54 »

THANKS all of you!!
Ooo v4.1.15, Windows 10
AfTech54
Posts: 71
Joined: Tue Dec 31, 2013 10:08 am

Re: [Solved] Clean up and sort several thousand of dates

Post by AfTech54 »

Well MrProgrammer, it worked fine!!
But can you spare this old Swede a minute or two just to explain why the RegExp doesn't recognize the first spaces in DD_MMMM_YYYY?
I asume that [^ ] should find the first space.
And I don't understand what $1$2$3 does/means, even though it works fine.
If you have time I 'll be grateful if you could explain the whole regexp, tried to read the Help, but it's hard to understand.
Ooo v4.1.15, Windows 10
User avatar
MrProgrammer
Moderator
Posts: 5431
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: [Solved] Clean up and sort several thousand of dates

Post by MrProgrammer »

AfTech54 wrote: Sun Dec 29, 2024 12:38 am But can you spare this old Swede a minute or two just to explain why the RegExp doesn't recognize the first spaces in DD_MMMM_YYYY? I assume that [^ ] should find the first space.
No, that's not the meaning of [^ ]. The link I gave you for Regular Expressions is old, but the information there covers my regular expression.
     ([^ ]+ )([^ ]+ )([^ ]+ ).+
The expression in parentheses means: find a non-space, the [^ ], 1 or more times, the +, then a space. So, the pattern is a string of non-spaces followed by a space to match the day of the month and its following space. This matching is repeated two more times for the month and the year. The parentheses capture those strings of text into variables $1, $2, and $3. Then .+ matches everything else so the RegEx will match the entire value of the cell. Since .+ was matched but not captured, the final part of the text (the day of the week and the time) is gone when the $1$2$3 replacement is done. In hindsight I should have used
     ([^ ]+ [^ ]+ [^ ]+).+
since I don't need to do three captures, nor capture the space after the year. Then the replacement is just $1.

AfTech54 wrote: Sun Dec 29, 2024 12:38 am … tried to read the Help, but it's hard to understand.
Agreed. The Wikipedia article for Regular expressions does a better job of explaining the important concepts, especialy sections POSIX basic and extended and Metacharacters in POSIX extended. But one has to be careful with regular expressions since there are multiple implementations of this idea which both extend and limit the basic concepts. The implementation of regular expressions has undergone several changes in OpenOffice. Current regular expressions in OpenOffice version 4 support many of the features of ICU regular expressions. I haven't seen any really good documentation from OpenOffice for the current implementation. The team just doesn't have the resources to create that. For the most part I stick to the basics. I believe that the regular expressions in this post have been supported for 40 years, perhaps 50 years, and thus pre-date OpenOffice by a decade or two.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
AfTech54
Posts: 71
Joined: Tue Dec 31, 2013 10:08 am

Re: [Solved] Clean up and sort several thousand of dates

Post by AfTech54 »

THANKS again!!!

I've red Wiki and tested the expr in a regex tester, so now I understand, very smooth! And there will always be a $x variable, storing the pinpointed, string for each parentes/group numbered 1 to x from left to right in an expr.
Ooo v4.1.15, Windows 10
Locked