[Solved] OOo upgrade has broken my Perl Spreadsheet parsers

Java, C++, C#, Delphi, ??? - Using the UNO bridges

[Solved] OOo upgrade has broken my Perl Spreadsheet parsers

Postby RCH » Tue Nov 29, 2011 11:31 am

I have been using a Perl script (running under WinXP)
Code: Select all   Expand viewCollapse view

    use Spreadsheet::Read;
    my $ref = ReadData($ods_file,parser => "ods");
    if ($ref){
      for my $row qw(A B C D E F G H I J K L M N O P/   / ){
        for my $cell (map{"$row$_"} 1 .. 30){
          print $ref->[1]{$cell} . "\n";
        }
      }
    }
   

to pull info out of OOorg calc (*.ods) files.

The script has suddenly stopped working.
It returns a ref to a hash which exists, but is empty.

This only happens with *.ods files that have been made
since I upgraded to OpenOffice.org 3.1.0 000310m11(Build:9399)

All *.ods files made by the new OOorg are unparseable.
And if I open an old, parseable *.ods file in OOorg,
and save it under a new name, it too is unparseable.

The new version seem to have the same <text:p> elements
as the original.

But the values "ce9" etc in
Code: Select all   Expand viewCollapse view
  <table:table-cell table:style-name="ce9" office:value-type="string">

and the values "ro5" etc in
Code: Select all   Expand viewCollapse view
  <table:table-row table:style-name="ro5">

seem to have changed (increased by 1)

Does anyone have a fix?

I have also tried
Code: Select all   Expand viewCollapse view
use OpenOffice::OODoc;


No joy

Thanks in advance
RCH
Last edited by Hagar Delest on Wed Nov 30, 2011 12:21 pm, edited 1 time in total.
Reason: tagged [Solved].
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
RCH
 
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby Villeroy » Tue Nov 29, 2011 1:01 pm

So you've got a problem with the Perl parser which is not a parser at all when it starts to fail because of another software's upgrade. If it were a parser it could parse office documents without depending on any office suite being installed or not.
Please, edit this topic's initial post and add "[Solved]" to the subject line if your problem has been solved.
Ubuntu 18.04, no OpenOffice, LibreOffice 6.4
User avatar
Villeroy
Volunteer
 
Posts: 28544
Joined: Mon Oct 08, 2007 1:35 am
Location: Germany

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby RCH » Tue Nov 29, 2011 1:37 pm

No I dont think it depends on any office suite being installed
It calls Perl's own Spreadsheet::ReadSXC
See line 42 of ..\Spreadsheet\Read.pm
($VERSION = "0.35";)
RCH
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
RCH
 
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby rudolfo » Tue Nov 29, 2011 2:03 pm

According to the CPAN entry about the module the ReadSXC module is for StarOffice or OpenOffice 1.x spreadsheet data, as the sxc in the name might indicate.
By default OOo Calc 3.x uses ODF format 1.2 as default. Try changing this in the Load/Save options to "1.0/1.1 OpenOffice 2.x" and see if you get better results.
But be aware that this might have issues in other areas.
OpenOffice 3.1.1 (2.4.3 until October 2009) and LibreOffice 3.3.2 on Windows 2000, AOO 3.4.1 on Windows 7
There are several macro languages in OOo, but none of them is called Visual Basic or VB(A)! Please call it OOo Basic, Star Basic or simply Basic.
rudolfo
Volunteer
 
Posts: 1488
Joined: Wed Mar 19, 2008 11:34 am
Location: Germany

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby RCH » Tue Nov 29, 2011 2:48 pm

Well I am currently using a similar work-around - Save as Microsoft Excel
Works perfectly - but seems a bit disloyal
RCH

P.S. I've just written a Perl script that does
use XML::Simple;
to parse content.xml
It looks as though the problem is that new OOorg xml has elements like this
'table:number-columns-repeated' => '10',
which are not (so far as I know) understood by any of the current Perl OOorg modules

If you can point me at a Perl module that _does_ understand "table:number-columns-repeated"
I'd be most grateful
Last edited by RCH on Tue Nov 29, 2011 5:36 pm, edited 1 time in total.
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
RCH
 
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby acknak » Tue Nov 29, 2011 3:59 pm

There are parser modules in Perl for ODF. That's the best format to depend on for the long-term, but it will mean a big change in your scripts to use a different parser, I expect.
AOO4/LO5 • Linux • Fedora 23
User avatar
acknak
Moderator
 
Posts: 22756
Joined: Mon Oct 08, 2007 1:25 am
Location: USA:NJ:E3

Re: Recent OOorg upgrade has broken my Perl Spreadsheet pars

Postby RCH » Wed Nov 30, 2011 10:32 am

Thank you for helpful replies
BEST SOLUTION
I have finally arrived at a rock solid solution
Code: Select all   Expand viewCollapse view
    use XML::Parser::Expat;
    my $parser = new XML::Parser::Expat;
    $parser->setHandlers(
                          'Start'    => \&starthandler,
                          'End'      => \&endhandler,
                          'Char'     => \&charhandler,
                          'Default'  => \&everything_elsehandler,
                          );
    open(INFILE, $xml_file) or croak "Problem $! opening \$xml_file\n\tCroaked ";
    my $result = $parser->parse(*INFILE);
    close(INFILE)           or croak "Problem $! closing \$xml_file\n\tCroaked ";

where $xml file is content.xml
and *handler s are subroutines


PREVIOUS BEST SOLUTION
I found a good working solution using OpenOffice-OODoc
The key is "normalizeSheet(sheet, OPTIONS)"
For more, see http://search.cpan.org/~jmgdoc/OpenOffice-OODoc/OODoc/Text.pod#normalizeSheet%28sheet,_rows,_columns%29

Code: Select all   Expand viewCollapse view
use OpenOffice::OODoc;
my $doc = odfDocument(file => $infile);
my $table = 0;
my $sheetnumber=0;
my $stop_at_row_number = 10;
my $stop_at_col_number = 10;


(my @array) = ($doc->normalizeSheet($sheetnumber,$stop_at_row_number,$stop_at_col_number,));
# DONT do THIS UNLESS YOU HAVE LOTS OF MEMORY/LOTS OF SPARE TIME
#(my @array) = ($doc->normalizeSheet($sheetnumber,'full'));

foreach my $row (0 .. $stop_at_row_number){
    foreach my $col (0 .. $stop_at_col_number){
        my $value = $doc->getCellValue($table, $row, $col);
        printf OUTFILE "row %d   col %d    %s\n", $row, $col, $value;
    }# foreach col
    print OUTFILE "\n";
}# foreach line


In the returned text, watch out for guillemets (<<>>) e.g. N<<°>>
provoked by complex text:p elements such as
<text:p>N
<text:span text:style-name="T2">°
</text:span>
</text:p>
in content.xml
RichardH
OpenOffice.org 3.1.0 running on WinXP professional
RCH
 
Posts: 4
Joined: Tue Nov 29, 2011 10:53 am


Return to External Programs

Who is online

Users browsing this forum: No registered users and 2 guests