Page 1 of 1

[Solved] OOo upgrade has broken my Perl Spreadsheet parsers

Posted: Tue Nov 29, 2011 11:31 am
by RCH
I have been using a Perl script (running under WinXP)

Code: Select all

 
    use Spreadsheet::Read;
    my $ref = ReadData($ods_file,parser => "ods");
    if ($ref){
      for my $row qw(A B C D E F G H I J K L M N O P/   / ){
        for my $cell (map{"$row$_"} 1 .. 30){
          print $ref->[1]{$cell} . "\n";
        }
      }
    }
    

to pull info out of OOorg calc (*.ods) files.

The script has suddenly stopped working.
It returns a ref to a hash which exists, but is empty.

This only happens with *.ods files that have been made
since I upgraded to OpenOffice.org 3.1.0 000310m11(Build:9399)

All *.ods files made by the new OOorg are unparseable.
And if I open an old, parseable *.ods file in OOorg,
and save it under a new name, it too is unparseable.

The new version seem to have the same <text:p> elements
as the original.

But the values "ce9" etc in

Code: Select all

  <table:table-cell table:style-name="ce9" office:value-type="string">
and the values "ro5" etc in

Code: Select all

  <table:table-row table:style-name="ro5">
seem to have changed (increased by 1)

Does anyone have a fix?

I have also tried

Code: Select all

use OpenOffice::OODoc;

No joy

Thanks in advance
RCH

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Tue Nov 29, 2011 1:01 pm
by Villeroy
So you've got a problem with the Perl parser which is not a parser at all when it starts to fail because of another software's upgrade. If it were a parser it could parse office documents without depending on any office suite being installed or not.

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Tue Nov 29, 2011 1:37 pm
by RCH
No I dont think it depends on any office suite being installed
It calls Perl's own Spreadsheet::ReadSXC
See line 42 of ..\Spreadsheet\Read.pm
($VERSION = "0.35";)
RCH

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Tue Nov 29, 2011 2:03 pm
by rudolfo
According to the CPAN entry about the module the ReadSXC module is for StarOffice or OpenOffice 1.x spreadsheet data, as the sxc in the name might indicate.
By default OOo Calc 3.x uses ODF format 1.2 as default. Try changing this in the Load/Save options to "1.0/1.1 OpenOffice 2.x" and see if you get better results.
But be aware that this might have issues in other areas.

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Tue Nov 29, 2011 2:48 pm
by RCH
Well I am currently using a similar work-around - Save as Microsoft Excel
Works perfectly - but seems a bit disloyal
RCH

P.S. I've just written a Perl script that does
use XML::Simple;
to parse content.xml
It looks as though the problem is that new OOorg xml has elements like this
'table:number-columns-repeated' => '10',
which are not (so far as I know) understood by any of the current Perl OOorg modules

If you can point me at a Perl module that _does_ understand "table:number-columns-repeated"
I'd be most grateful

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Tue Nov 29, 2011 3:59 pm
by acknak
There are parser modules in Perl for ODF. That's the best format to depend on for the long-term, but it will mean a big change in your scripts to use a different parser, I expect.

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Posted: Wed Nov 30, 2011 10:32 am
by RCH
Thank you for helpful replies
BEST SOLUTION
I have finally arrived at a rock solid solution

Code: Select all

    use XML::Parser::Expat;
    my $parser = new XML::Parser::Expat;
    $parser->setHandlers(
                          'Start'    => \&starthandler,
                          'End'      => \&endhandler,
                          'Char'     => \&charhandler,
                          'Default'  => \&everything_elsehandler,
                          );
    open(INFILE, $xml_file) or croak "Problem $! opening \$xml_file\n\tCroaked ";
    my $result = $parser->parse(*INFILE);
    close(INFILE)           or croak "Problem $! closing \$xml_file\n\tCroaked ";
where $xml file is content.xml
and *handler s are subroutines


PREVIOUS BEST SOLUTION
I found a good working solution using OpenOffice-OODoc
The key is "normalizeSheet(sheet, OPTIONS)"
For more, see http://search.cpan.org/~jmgdoc/OpenOffi ... columns%29

Code: Select all

use OpenOffice::OODoc;
my $doc = odfDocument(file => $infile);
my $table = 0;
my $sheetnumber=0;
my $stop_at_row_number = 10;
my $stop_at_col_number = 10;


(my @array) = ($doc->normalizeSheet($sheetnumber,$stop_at_row_number,$stop_at_col_number,));
# DONT do THIS UNLESS YOU HAVE LOTS OF MEMORY/LOTS OF SPARE TIME
#(my @array) = ($doc->normalizeSheet($sheetnumber,'full'));

foreach my $row (0 .. $stop_at_row_number){
    foreach my $col (0 .. $stop_at_col_number){
        my $value = $doc->getCellValue($table, $row, $col);
        printf OUTFILE "row %d   col %d    %s\n", $row, $col, $value;
    }# foreach col
    print OUTFILE "\n";
}# foreach line
In the returned text, watch out for guillemets (<<>>) e.g. N<<°>>
provoked by complex text:p elements such as
<text:p>N
<text:span text:style-name="T2">°
</text:span>
</text:p>
in content.xml