[Solved] CSV : counting number of lines
[Solved] CSV : counting number of lines
Hello forum,
I have a huge CSV file with more 1 million of lines.
If I use a loop to read it, this take a long time.
Do you know a way to count total of lines instantanely?
I have a huge CSV file with more 1 million of lines.
If I use a loop to read it, this take a long time.
Do you know a way to count total of lines instantanely?
Last edited by Mr.Dandy on Wed Sep 02, 2020 11:02 am, edited 1 time in total.
OpenOffice 4.1.12 - Windows 10
Re: CSV : counting number of lines
Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: CSV : counting number of lines
In this posting
https://www.codespeedy.com/count-the-nu ... in-python/
you will find code for line-counting in files.
https://www.codespeedy.com/count-the-nu ... in-python/
you will find code for line-counting in files.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
Re: CSV : counting number of lines
You could get an instant answer if you used fixed width fields instead of CSV - simple calculation from the size of the file.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: CSV : counting number of lines
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: CSV : counting number of lines
Thanks folks
I try python solution.
Powershell is only Windows-compatible.
I try python solution.
Powershell is only Windows-compatible.
OpenOffice 4.1.12 - Windows 10
Re: [Solved] CSV : counting number of lines
On Mac, Linux and WIndows with Linux tools:
Code: Select all
wc -l file.csv
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: [Solved] CSV : counting number of lines
Just in case you access database data with the database component:
I believe this works with any table in any type of database connection. Tested with Bibliography of type dBase and HSQL2.
Code: Select all
Sub test_RecordCount()
print getRecordCount(sDB:="Bibliography", sTBL:="biblio", sUSR:="", sPWD:="")
End Sub
Function getRecordCount(sDB, sTBL, sUSR, sPWD) AS Long
on error goto exitErr
dbc = CreateUnoService("com.sun.star.sdb.DatabaseContext")
db = dbc.getByName(sDB)
s = "SELECT COUNT(*) AS ""C"" FROM """& sTBL &""""
con = db.getConnection(sUSR, sPWD)
stm = con.prepareStatement(s)
q = stm.executeQuery()
q.next()
n = q.getLong(1)
exitErr:
con.close()
getRecordCount = n
End Function
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: [Solved] CSV : counting number of lines
OK, the above method is certainly the most complicated method to count lines of a csv file. However, it would work with any table that belongs to any type of database connection.
This is how to read lines with any descendant of (Visual) Basic since the early 90ies:
Why does nobody read any books before writing programs?
This is how to read lines with any descendant of (Visual) Basic since the early 90ies:
Code: Select all
Sub test_RecordCount()
Msgbox getLineCount("file:///tmp/Untitled.csv")
End Sub
Function getLineCount(sURL As String) As Long
Dim n As Long, s As String
FN = Freefile()
Open sURL For Input As #FN
while not eof(#FN)
n = n+1
line input #FN, s
wend
Close #FN
getLineCount = n
End Function
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: [Solved] CSV : counting number of lines
I took the issue to be the last word in the question. The answer is, for a million line CSV file, there isn't one. Get a supercomputer (maybe...)Do you know a way to count total of lines instantanely?
Edit:
My other thought was a hybrid of a CSV file/fixed width record file... where you make all the rows of equal size by adding a last record that pads the row size to a fixed width. You could then find the count instantly from the file size and load it as a CSV... but you'd need to adjust the last record to any other record changes and so on and that would be some work...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: [Solved] CSV : counting number of lines
On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Re: [Solved] CSV : counting number of lines
That's meaningless without the line/file size (?)Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
(I do suspect I'm going to have to concede here... I ran some tests with my laptop and got fed up with the wait... but mine's a Pentium 2020M)
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: [Solved] CSV : counting number of lines
Don't know perl - I'm guessing that creates a file? With a million unrealistically small 1 record ("say ") size lines?MrProgrammer wrote:
$time perl -nE '}{say " $. $ARGV"' Million.txt
Edit, no say looks to be a command? Anyway - a million lines is meaningless without knowing the file size.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
Re: [Solved] CSV : counting number of lines
My file is 97MB in size. About 100 chars per line.Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
wc -l takes 0.14 seconds
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
- MrProgrammer
- Moderator
- Posts: 4906
- Joined: Fri Jun 04, 2010 7:57 pm
- Location: Wisconsin, USA
Re: [Solved] CSV : counting number of lines
One cannot accomplish a task in zero time. How quickly is "instantly"? Asking in an OpenOffice forum about how to count the lines in a file suggests to me that this is a case of XY Problem.Mr.Dandy wrote:Do you know a way to count total of lines instantanely?
Perl is a general-purpose Unix scripting language originally developed to make report processing easier which is also available on Linux, MacOS, and Windows. Many tasks can be accomplished quickly and easily with short programs. Perl is included with MacOS, and is included with many Linux distributions.JeJe wrote:Don't know perl
Yes, I should have given the file size. Million0004.txt has 4-byte lines. Million1024.txt has 1K lines. On my machine, the larger file takes about a tenth of a second with wc and less than a second with perl.JeJe wrote:Anyway - a million lines is meaningless without knowing the file size.
$ ls -ln Million*.txt -rw-r--r-- 1 503 20 4000000 Sep 2 15:25 Million0004.txt -rw-r--r-- 1 503 20 1024000000 Sep 2 15:22 Million1024.txt $ for f in Million*.txt; do time wc -l $f; done 1000000 Million0004.txt real 0m0.006s user 0m0.003s sys 0m0.002s 1000000 Million1024.txt real 0m0.713s user 0m0.598s sys 0m0.114s $ for f in Million*.txt; do time perl -nE '}{say " $. $ARGV"' $f; done 1000000 Million0004.txt real 0m0.086s user 0m0.080s sys 0m0.004s 1000000 Million1024.txt real 0m0.902s user 0m0.727s sys 0m0.171s
Or, if speed is of the essence, keep the count in a separate file which is created (using one of the methods above) by the process which generates the CSV file. Then the lines only need to be counted once and any other times the count can be accessed quickly.JeJe wrote:Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
wc -l <Million1024.txt >Million1024_Count.txt
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel. The locale for any menus or Calc formulas in my posts is English (USA).
Re: [Solved] CSV : counting number of lines
Code: Select all
$ time python countlines.py
1051178
real 0m0,286s
user 0m0,216s
sys 0m0,069s
Code: Select all
with open('/tmp/test/test.csv') as my_file:
print(sum(1 for _ in my_file))
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice