[Solved] OOo upgrade has broken my Perl Spreadsheet parsers

Java, C++, C#, Delphi... - Using the UNO bridges
Post Reply
RCH
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

[Solved] OOo upgrade has broken my Perl Spreadsheet parsers

Post by RCH »

I have been using a Perl script (running under WinXP)

Code: Select all

 
    use Spreadsheet::Read;
    my $ref = ReadData($ods_file,parser => "ods");
    if ($ref){
      for my $row qw(A B C D E F G H I J K L M N O P/   / ){
        for my $cell (map{"$row$_"} 1 .. 30){
          print $ref->[1]{$cell} . "\n";
        }
      }
    }
    

to pull info out of OOorg calc (*.ods) files.

The script has suddenly stopped working.
It returns a ref to a hash which exists, but is empty.

This only happens with *.ods files that have been made
since I upgraded to OpenOffice.org 3.1.0 000310m11(Build:9399)

All *.ods files made by the new OOorg are unparseable.
And if I open an old, parseable *.ods file in OOorg,
and save it under a new name, it too is unparseable.

The new version seem to have the same <text:p> elements
as the original.

But the values "ce9" etc in

Code: Select all

  <table:table-cell table:style-name="ce9" office:value-type="string">
and the values "ro5" etc in

Code: Select all

  <table:table-row table:style-name="ro5">
seem to have changed (increased by 1)

Does anyone have a fix?

I have also tried

Code: Select all

use OpenOffice::OODoc;

No joy

Thanks in advance
RCH
Last edited by Hagar Delest on Wed Nov 30, 2011 12:21 pm, edited 1 time in total.
Reason: tagged [Solved].
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
User avatar
Villeroy
Volunteer
Posts: 31279
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by Villeroy »

So you've got a problem with the Perl parser which is not a parser at all when it starts to fail because of another software's upgrade. If it were a parser it could parse office documents without depending on any office suite being installed or not.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04 with LibreOffice 6.0, latest OpenOffice and LibreOffice
RCH
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by RCH »

No I dont think it depends on any office suite being installed
It calls Perl's own Spreadsheet::ReadSXC
See line 42 of ..\Spreadsheet\Read.pm
($VERSION = "0.35";)
RCH
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
rudolfo
Volunteer
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by rudolfo »

According to the CPAN entry about the module the ReadSXC module is for StarOffice or OpenOffice 1.x spreadsheet data, as the sxc in the name might indicate.
By default OOo Calc 3.x uses ODF format 1.2 as default. Try changing this in the Load/Save options to "1.0/1.1 OpenOffice 2.x" and see if you get better results.
But be aware that this might have issues in other areas.
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
RCH
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by RCH »

Well I am currently using a similar work-around - Save as Microsoft Excel
Works perfectly - but seems a bit disloyal
RCH

P.S. I've just written a Perl script that does
use XML::Simple;
to parse content.xml
It looks as though the problem is that new OOorg xml has elements like this
'table:number-columns-repeated' => '10',
which are not (so far as I know) understood by any of the current Perl OOorg modules

If you can point me at a Perl module that _does_ understand "table:number-columns-repeated"
I'd be most grateful
Last edited by RCH on Tue Nov 29, 2011 5:36 pm, edited 1 time in total.
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
User avatar
acknak
Moderator
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by acknak »

There are parser modules in Perl for ODF. That's the best format to depend on for the long-term, but it will mean a big change in your scripts to use a different parser, I expect.
AOO4/LO5 • Linux • Fedora 23
RCH
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Post by RCH »

Thank you for helpful replies
BEST SOLUTION
I have finally arrived at a rock solid solution

Code: Select all

    use XML::Parser::Expat;
    my $parser = new XML::Parser::Expat;
    $parser->setHandlers(
                          'Start'    => \&starthandler,
                          'End'      => \&endhandler,
                          'Char'     => \&charhandler,
                          'Default'  => \&everything_elsehandler,
                          );
    open(INFILE, $xml_file) or croak "Problem $! opening \$xml_file\n\tCroaked ";
    my $result = $parser->parse(*INFILE);
    close(INFILE)           or croak "Problem $! closing \$xml_file\n\tCroaked ";
where $xml file is content.xml
and *handler s are subroutines


PREVIOUS BEST SOLUTION
I found a good working solution using OpenOffice-OODoc
The key is "normalizeSheet(sheet, OPTIONS)"
For more, see http://search.cpan.org/~jmgdoc/OpenOffi ... columns%29

Code: Select all

use OpenOffice::OODoc;
my $doc = odfDocument(file => $infile);
my $table = 0;
my $sheetnumber=0;
my $stop_at_row_number = 10;
my $stop_at_col_number = 10;


(my @array) = ($doc->normalizeSheet($sheetnumber,$stop_at_row_number,$stop_at_col_number,));
# DONT do THIS UNLESS YOU HAVE LOTS OF MEMORY/LOTS OF SPARE TIME
#(my @array) = ($doc->normalizeSheet($sheetnumber,'full'));

foreach my $row (0 .. $stop_at_row_number){
    foreach my $col (0 .. $stop_at_col_number){
        my $value = $doc->getCellValue($table, $row, $col);
        printf OUTFILE "row %d   col %d    %s\n", $row, $col, $value;
    }# foreach col
    print OUTFILE "\n";
}# foreach line
In the returned text, watch out for guillemets (<<>>) e.g. N<<°>>
provoked by complex text:p elements such as
<text:p>N
<text:span text:style-name="T2">°
</text:span>
</text:p>
in content.xml
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
Post Reply