[Solved] Clean up several thousand dates
[Solved] Clean up several thousand dates
I need to clean up and sort several thousand of dates with time (Column A ) so that it looks like Column B.
I have tried the following:
Removed the time by replacing [\d][\d][:][\d][\d] or [:digit:]{2}[:][:digit:]{2} with "Replace with" blank/no characters/null
Then the time (and day of the week sometimes??) disappear.
If the days of the week remained, I tried replacing *day with "Replace with" blank/no characters/null, it didn't work so I had to take day by day.
Then I formatted the column with the English date format DD MMMM YYYY to get a two-digit day number (to be able to order the dates). It only worked when the day of the week and timestamp was removed. Sometimes it didn't add the zero befor a single digit day date??
It doesn't work well maybe because there are so many rows??
Then I have also tried replacing single digit day dates ¨[^:digit:] with 0& to get two digit day no:s, one row works but not several at the same time.
Probably this old man don't understanding this or maybe I messing it up in some way.
Does anyone have good tips on how to do this most smoothly and accurate, working each time?
I have tried the following:
Removed the time by replacing [\d][\d][:][\d][\d] or [:digit:]{2}[:][:digit:]{2} with "Replace with" blank/no characters/null
Then the time (and day of the week sometimes??) disappear.
If the days of the week remained, I tried replacing *day with "Replace with" blank/no characters/null, it didn't work so I had to take day by day.
Then I formatted the column with the English date format DD MMMM YYYY to get a two-digit day number (to be able to order the dates). It only worked when the day of the week and timestamp was removed. Sometimes it didn't add the zero befor a single digit day date??
It doesn't work well maybe because there are so many rows??
Then I have also tried replacing single digit day dates ¨[^:digit:] with 0& to get two digit day no:s, one row works but not several at the same time.
Probably this old man don't understanding this or maybe I messing it up in some way.
Does anyone have good tips on how to do this most smoothly and accurate, working each time?
- Attachments
-
- Datum all years.ods
- (52.96 KiB) Downloaded 137 times
Last edited by AfTech54 on Sun Dec 29, 2024 12:02 am, edited 1 time in total.
Ooo v4.1.15, Windows 10
Re: Clean up and sort several thousand of dates
I had to shorten the list because of the max. size of uploads. It worked with all your strings.
P.S. wrong reference in column E. This happened when I inserted a new row for the column headers.
Replace $J$1:$J$13 with $J$2:$J$13
P.S. wrong reference in column E. This happened when I inserted a new row for the column headers.
Replace $J$1:$J$13 with $J$2:$J$13
- Attachments
-
- Datum_all_years_Villeroy.ods
- (56.03 KiB) Downloaded 123 times
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Clean up and sort several thousand of dates
Thanks again Villeroy!
As I wrote probably this old man ...
But I still have problem. It seems like I have to and want to get rid of days and time stamps to get a zero in front.
No zero, jpg : I tried to use format
Add zero reg... : tried to add zero with regexp
Your're right *day worked if I put it in "staples" [*day], but the old man wanted to get rid of the whole name so a little of brain drain again
can you suggest a smooth way to get rid of the names of the days and timestamp?
I tried to replace several days Monday; Tuesday ... in one replacement but it didn't work.
As I wrote probably this old man ...
But I still have problem. It seems like I have to and want to get rid of days and time stamps to get a zero in front.
No zero, jpg : I tried to use format
Add zero reg... : tried to add zero with regexp
Your're right *day worked if I put it in "staples" [*day], but the old man wanted to get rid of the whole name so a little of brain drain again
can you suggest a smooth way to get rid of the names of the days and timestamp?
I tried to replace several days Monday; Tuesday ... in one replacement but it didn't work.
Ooo v4.1.15, Windows 10
- MrProgrammer
- Moderator
- Posts: 5431
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: Clean up and sort several thousand of dates
This will be easy using standard OpenOffice features which have been available for decades. Calc can't interpret dates which contain the day of the week, so you have to remove those. As written, Calc regards your cell values as text, not dates. I used the following procedure on your attachment, sheet Ark2, column A.
- Step 1: Select the column, Edit → Find & Replace → More options, select Current selection only and Regular Expressions, Search for ([^ ]+ )([^ ]+ )([^ ]+ ).+, Replace with $1$2$3, Replace All. There is a space after each ^, and a space after each + except the last one. You can copy/paste the regular expression from this web page to the Find & Replace dialog box. I recommend unchecking Current selection only and Regular expressions, before clicking Close. This changes, for example 13 December 2019 Friday 10:24 to 13 December 2019 . In my locale, English (Canada), 13 December 2019 is still text, not a recognizable date.
- Step 2: Select the column, Data → Text to Columns, Click Standard, set the column type to Date (DMY), OK. This sets the correct values for the cells.
Now you have a column of Calc dates and you can format them any way you prefer. Format the dates after you have the correct values in the cells, not before. You can never change cell values by setting their format. You will be able to sort the dates after the cells contain the correct values.
For Date (DMY) to understand the month names in English I presume you will need to ensure that you are using an English-based locale. If your default locale is not based on the English language, I believe you can set that for the cells in Format → Cells → Numbers → Language. I have no good way to test that since my default locale is English (Canada).
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
Re: Clean up and sort several thousand of dates
Please fix my error in column E and replace the wrong reference $J$1:$J$13 with $J$2:$J$13.
My formula uses the DATE function to calculate the correct number from 3 numbers representing the year, month and day.
The English month name is looked up in $J$2:$J$13, so the locale setting does not matter.
Copy the results and paste-special values in order to convert formulas to constant values.
My formula uses the DATE function to calculate the correct number from 3 numbers representing the year, month and day.
The English month name is looked up in $J$2:$J$13, so the locale setting does not matter.
Copy the results and paste-special values in order to convert formulas to constant values.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Clean up and sort several thousand of dates
Alternative approach:
Tools>Options>Language settings. Switch the locale setting to any English except US-English.
B1: =FIND(" ",$A1) [find first space]
C1: =FIND(" ",$A1;$B1) [find second space]
D1: =VALUE(LEFT($A1,$C1+4)) calculates the correct value.
Drag down formulas.
Copy the results, paste-special values and format the numbers to your liking.
Switch back to your preferred locale setting.
Tools>Options>Language settings. Switch the locale setting to any English except US-English.
B1: =FIND(" ",$A1) [find first space]
C1: =FIND(" ",$A1;$B1) [find second space]
D1: =VALUE(LEFT($A1,$C1+4)) calculates the correct value.
Drag down formulas.
Copy the results, paste-special values and format the numbers to your liking.
Switch back to your preferred locale setting.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: Clean up and sort several thousand of dates
Your supposed date-time values are text, and given in a format unusable in formulas or for sorting or filtering.
Treating data-time as text is basically a good idea, but to earn the fruits you need to use a locale-independent and globally understood format. ISO 8601 describes the standard, and software can also recognize some very similar variants.
Your question should therefore be: "How can I convert fancy pseudo data for date-time to usable data?"
To do in AOO Calc sheets would require a lot of additional formulas in helper columns or very complicated formulas.
In LibreOffice Calc V 6.2 or higher you can use the REGEX() function for relevant parts of the task.
Due to the size of the sheets I can't attach a full solution. The attached example contains a heavily abbreviated version.
Once again: It only works in LibO V 6.2 or higher. (Sorry. I was distracted, and posted finally without having noticed and studied the posts by others,)
Treating data-time as text is basically a good idea, but to earn the fruits you need to use a locale-independent and globally understood format. ISO 8601 describes the standard, and software can also recognize some very similar variants.
Your question should therefore be: "How can I convert fancy pseudo data for date-time to usable data?"
To do in AOO Calc sheets would require a lot of additional formulas in helper columns or very complicated formulas.
In LibreOffice Calc V 6.2 or higher you can use the REGEX() function for relevant parts of the task.
Due to the size of the sheets I can't attach a full solution. The attached example contains a heavily abbreviated version.
Once again: It only works in LibO V 6.2 or higher. (Sorry. I was distracted, and posted finally without having noticed and studied the posts by others,)
On Windows 10: LibreOffice 25.8.4 and older versions, PortableOpenOffice 4.1.7 and older, StarOffice 5.2
---
Lupp from München
---
Lupp from München
Re: [Solved] Clean up and sort several thousand of dates
THANKS all of you!!
Ooo v4.1.15, Windows 10
Re: [Solved] Clean up and sort several thousand of dates
Well MrProgrammer, it worked fine!!
But can you spare this old Swede a minute or two just to explain why the RegExp doesn't recognize the first spaces in DD_MMMM_YYYY?
I asume that [^ ] should find the first space.
And I don't understand what $1$2$3 does/means, even though it works fine.
If you have time I 'll be grateful if you could explain the whole regexp, tried to read the Help, but it's hard to understand.
But can you spare this old Swede a minute or two just to explain why the RegExp doesn't recognize the first spaces in DD_MMMM_YYYY?
I asume that [^ ] should find the first space.
And I don't understand what $1$2$3 does/means, even though it works fine.
If you have time I 'll be grateful if you could explain the whole regexp, tried to read the Help, but it's hard to understand.
Ooo v4.1.15, Windows 10
- MrProgrammer
- Moderator
- Posts: 5431
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: [Solved] Clean up and sort several thousand of dates
No, that's not the meaning of [^ ]. The link I gave you for Regular Expressions is old, but the information there covers my regular expression.
([^ ]+ )([^ ]+ )([^ ]+ ).+The expression in parentheses means: find a non-space, the [^ ], 1 or more times, the +, then a space. So, the pattern is a string of non-spaces followed by a space to match the day of the month and its following space. This matching is repeated two more times for the month and the year. The parentheses capture those strings of text into variables $1, $2, and $3. Then .+ matches everything else so the RegEx will match the entire value of the cell. Since .+ was matched but not captured, the final part of the text (the day of the week and the time) is gone when the $1$2$3 replacement is done. In hindsight I should have used
([^ ]+ [^ ]+ [^ ]+).+since I don't need to do three captures, nor capture the space after the year. Then the replacement is just $1.
Agreed. The Wikipedia article for Regular expressions does a better job of explaining the important concepts, especialy sections POSIX basic and extended and Metacharacters in POSIX extended. But one has to be careful with regular expressions since there are multiple implementations of this idea which both extend and limit the basic concepts. The implementation of regular expressions has undergone several changes in OpenOffice. Current regular expressions in OpenOffice version 4 support many of the features of ICU regular expressions. I haven't seen any really good documentation from OpenOffice for the current implementation. The team just doesn't have the resources to create that. For the most part I stick to the basics. I believe that the regular expressions in this post have been supported for 40 years, perhaps 50 years, and thus pre-date OpenOffice by a decade or two.
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
AOO 4.1.7 Build 9800, MacOS 13.7.8, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
Re: [Solved] Clean up and sort several thousand of dates
THANKS again!!!
I've red Wiki and tested the expr in a regex tester, so now I understand, very smooth! And there will always be a $x variable, storing the pinpointed, string for each parentes/group numbered 1 to x from left to right in an expr.
I've red Wiki and tested the expr in a regex tester, so now I understand, very smooth! And there will always be a $x variable, storing the pinpointed, string for each parentes/group numbered 1 to x from left to right in an expr.
Ooo v4.1.15, Windows 10