[Solved] CSV : counting number of lines

Creating a macro - Writing a Script - Using the API (OpenOffice Basic, Python, BeanShell, JavaScript)
Post Reply
User avatar
Mr.Dandy
Posts: 427
Joined: Tue Dec 11, 2012 4:22 pm

[Solved] CSV : counting number of lines

Post by Mr.Dandy »

Hello forum,

I have a huge CSV file with more 1 million of lines.
If I use a loop to read it, this take a long time.

Do you know a way to count total of lines instantanely?
Last edited by Mr.Dandy on Wed Sep 02, 2020 11:02 am, edited 1 time in total.
OpenOffice 4.1.12 - Windows 10
JeJe
Volunteer
Posts: 2780
Joined: Wed Mar 09, 2016 2:40 pm

Re: CSV : counting number of lines

Post by JeJe »

Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
User avatar
RoryOF
Moderator
Posts: 34613
Joined: Sat Jan 31, 2009 9:30 pm
Location: Ireland

Re: CSV : counting number of lines

Post by RoryOF »

In this posting
https://www.codespeedy.com/count-the-nu ... in-python/
you will find code for line-counting in files.
Apache OpenOffice 4.1.15 on Xubuntu 22.04.4 LTS
JeJe
Volunteer
Posts: 2780
Joined: Wed Mar 09, 2016 2:40 pm

Re: CSV : counting number of lines

Post by JeJe »

You could get an instant answer if you used fixed width fields instead of CSV - simple calculation from the size of the file.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: CSV : counting number of lines

Post by Villeroy »

Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Mr.Dandy
Posts: 427
Joined: Tue Dec 11, 2012 4:22 pm

Re: CSV : counting number of lines

Post by Mr.Dandy »

Thanks folks
I try python solution.
Powershell is only Windows-compatible.
OpenOffice 4.1.12 - Windows 10
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

On Mac, Linux and WIndows with Linux tools:

Code: Select all

wc -l file.csv
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

Just in case you access database data with the database component:

Code: Select all

Sub test_RecordCount()
	print getRecordCount(sDB:="Bibliography", sTBL:="biblio", sUSR:="", sPWD:="") 
End Sub

Function getRecordCount(sDB, sTBL, sUSR, sPWD) AS Long
on error goto exitErr
	dbc = CreateUnoService("com.sun.star.sdb.DatabaseContext")
	db = dbc.getByName(sDB)
	s = "SELECT COUNT(*) AS ""C"" FROM """& sTBL &""""
	con = db.getConnection(sUSR, sPWD)
	stm = con.prepareStatement(s)
	q = stm.executeQuery()
	q.next()
	n = q.getLong(1)
exitErr:
	con.close()
	getRecordCount = n
End Function
I believe this works with any table in any type of database connection. Tested with Bibliography of type dBase and HSQL2.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

OK, the above method is certainly the most complicated method to count lines of a csv file. However, it would work with any table that belongs to any type of database connection.

This is how to read lines with any descendant of (Visual) Basic since the early 90ies:

Code: Select all

Sub test_RecordCount()
	Msgbox getLineCount("file:///tmp/Untitled.csv") 
End Sub

Function getLineCount(sURL As String) As Long
Dim n As Long, s As String
	FN = Freefile()
	Open sURL For Input As #FN
	while not eof(#FN)
		n = n+1
		line input #FN, s
	wend
	Close #FN
	getLineCount = n
End Function
Why does nobody read any books before writing programs?
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
JeJe
Volunteer
Posts: 2780
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Post by JeJe »

Do you know a way to count total of lines instantanely?
I took the issue to be the last word in the question. The answer is, for a million line CSV file, there isn't one. Get a supercomputer (maybe...)

Edit:
My other thought was a hybrid of a CSV file/fixed width record file... where you make all the rows of equal size by adding a last record that pads the row size to a fixed width. You could then find the count instantly from the file size and load it as a CSV... but you'd need to adjust the last record to any other record changes and so on and that would be some work...
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
JeJe
Volunteer
Posts: 2780
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Post by JeJe »

Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
That's meaningless without the line/file size (?)

(I do suspect I'm going to have to concede here... I ran some tests with my laptop and got fed up with the wait... but mine's a Pentium 2020M)
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
JeJe
Volunteer
Posts: 2780
Joined: Wed Mar 09, 2016 2:40 pm

Re: [Solved] CSV : counting number of lines

Post by JeJe »

MrProgrammer wrote:
$time perl -nE '}{say " $. $ARGV"' Million.txt
Don't know perl - I'm guessing that creates a file? With a million unrealistically small 1 record ("say ") size lines?

Edit, no say looks to be a command? Anyway - a million lines is meaningless without knowing the file size.
Windows 10, Openoffice 4.1.11, LibreOffice 7.4.0.3 (x64)
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

Villeroy wrote:On my notebook, the Basic loop takes 7 seconds to count a million lines. wc -l takes a fraction of a second.
My file is 97MB in size. About 100 chars per line.
wc -l takes 0.14 seconds
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
User avatar
MrProgrammer
Moderator
Posts: 4906
Joined: Fri Jun 04, 2010 7:57 pm
Location: Wisconsin, USA

Re: [Solved] CSV : counting number of lines

Post by MrProgrammer »

Mr.Dandy wrote:Do you know a way to count total of lines instantanely?
One cannot accomplish a task in zero time. How quickly is "instantly"? Asking in an OpenOffice forum about how to count the lines in a file suggests to me that this is a case of XY Problem.
JeJe wrote:Don't know perl
Perl is a general-purpose Unix scripting language originally developed to make report processing easier which is also available on Linux, MacOS, and Windows. Many tasks can be accomplished quickly and easily with short programs. Perl is included with MacOS, and is included with many Linux distributions.
JeJe wrote:Anyway - a million lines is meaningless without knowing the file size.
Yes, I should have given the file size. Million0004.txt has 4-byte lines. Million1024.txt has 1K lines. On my machine, the larger file takes about a tenth of a second with wc and less than a second with perl.
$ ls -ln Million*.txt
-rw-r--r--  1 503  20     4000000 Sep  2 15:25 Million0004.txt
-rw-r--r--  1 503  20  1024000000 Sep  2 15:22 Million1024.txt
$ for f in Million*.txt; do time wc -l $f; done
 1000000 Million0004.txt

real	0m0.006s
user	0m0.003s
sys	0m0.002s
 1000000 Million1024.txt

real	0m0.713s
user	0m0.598s
sys	0m0.114s
$ for f in Million*.txt; do time perl -nE '}{say " $. $ARGV"' $f; done
 1000000 Million0004.txt

real	0m0.086s
user	0m0.080s
sys	0m0.004s
 1000000 Million1024.txt

real	0m0.902s
user	0m0.727s
sys	0m0.171s
JeJe wrote:Store a header at the start of the file that records the number of lines - whenever you modify the file you update it.
Or, if speed is of the essence, keep the count in a separate file which is created (using one of the methods above) by the process which generates the CSV file. Then the lines only need to be counted once and any other times the count can be accessed quickly.
wc -l <Million1024.txt >Million1024_Count.txt
Mr. Programmer
AOO 4.1.7 Build 9800, MacOS 13.6.3, iMac Intel.   The locale for any menus or Calc formulas in my posts is English (USA).
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: [Solved] CSV : counting number of lines

Post by Villeroy »

Code: Select all

$ time python countlines.py 
1051178

real	0m0,286s
user	0m0,216s
sys	0m0,069s
countlines.py:

Code: Select all

with open('/tmp/test/test.csv') as my_file:
    print(sum(1 for _ in my_file))
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
Post Reply