#22865 - 01/17/07 09:58 AM Split large XML Files with xslt
PaulW Offline
Junior Member

Registered: 09/28/06
Posts: 4
Loc: Belgium
Hello,

We have problems splitting large XML files (5000 invoices) into separate XML files (one XML file per invoice).

I ran some tests by creating a simple Watch process that uses an XSLT script to split the XML file. With an XML file of up to roughly 2000 invoices we have no problems: the Watch process generates 2000 XML files.
With larger files the Watch process runs for a long time, and not a single file appears in the output directory.

I tried the same with the XML SPLIT action in PlanetPress Watch, with the same result: with larger files there is no output.

Any idea what the problem could be?

Thanks
Paul

#22866 - 01/17/07 10:10 AM Re: Split large XML Files with xslt
Anonymous
Unregistered


Hello,

XML and XSLT struggle with large files because of the way the files are processed. The first thing an XSLT engine does is load the entire file into memory, and to top it off, the in-memory representation usually takes about 2-4 bytes for every byte stored. So a 100 MB XML file would occupy roughly 200-400 MB of memory, or more depending on the engine used.

I have done a lot of research to try to get around this, and along the way I read a lot of people discussing the same issue. XSLT just has serious problems with large files (as do most XML parsers). For example, try to open a 100 MB XML file in Internet Explorer: it might open after a couple of hours, but the odds are it will just crash with memory errors.

In cases where XSLT just cannot do the job, a non-parsing technique needs to be found. In other words, the file will need to be split as a text file instead of as an XML file.
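
To make the "split it as text" idea concrete, here is a minimal Python sketch. It is not something PlanetPress Watch ships, and the function name `split_as_text` and all of its parameters are made up for illustration. It assumes every record sits between a plain opening tag and a closing tag written exactly as `</name>`, that records of the same name are never nested, and that those tags do not show up inside comments or CDATA sections. Self-closing records are not handled.

```python
import os
import re


def split_as_text(src_path, out_dir, record_tag, root_tag,
                  encoding="iso-8859-1", chunk_size=1 << 20):
    """Write each <record_tag>...</record_tag> span to its own file.

    Returns the number of files written. No XML parser is involved.
    """
    # The lookahead stops <Foo from also matching <FooBar.
    open_re = re.compile("<" + re.escape(record_tag) + r"(?=[\s>])")
    close_tag = "</" + record_tag + ">"
    keep = len(record_tag) + 1  # longest partial opening tag at a chunk edge
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    buf = ""
    with open(src_path, encoding=encoding, newline="") as src:
        while True:
            chunk = src.read(chunk_size)
            buf += chunk
            while True:
                m = open_re.search(buf)
                if m is None:
                    buf = buf[-keep:]  # no record pending, keep a short tail
                    break
                end = buf.find(close_tag, m.end())
                if end < 0:
                    buf = buf[m.start():]  # record continues in the next chunk
                    break
                end += len(close_tag)
                count += 1
                out_path = os.path.join(out_dir, "%d.xml" % count)
                with open(out_path, "w", encoding=encoding, newline="") as out:
                    out.write('<?xml version="1.0" encoding="%s"?>\n' % encoding)
                    out.write("<%s>\n%s\n</%s>\n"
                              % (root_tag, buf[m.start():end], root_tag))
                buf = buf[end:]
            if not chunk:
                break
    return count
```

Because the input is read in fixed-size chunks and each record is dropped from the buffer as soon as it is written, memory use should depend on the largest single record rather than on the size of the whole file.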

#22867 - 03/06/07 10:07 AM Re: Split large XML Files with xslt
PaulW Offline
Hello,

We added memory to the server, which now has 7 GB. Still no success with files larger than 10 MB. The AltovaXML process uses up to 1800 MB of memory and then halts without producing any output.

I've found some VB.NET resources for splitting large XML files. The application splits a 40 MB file into smaller files (5 MB each) in 10 seconds!

http://www.codeproject.com/vb/net/xmlsplitter.asp

Is it possible to implement such a script in our Watch configuration?

Kind regards,
Paul

#22868 - 03/06/07 10:23 AM Re: Split large XML Files with xslt
Anonymous
Unregistered


You would have to create an executable program that can accept the file that PlanetPress Watch passes as its input.
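
As a rough illustration of what such a program could look like, here is a small Python sketch that reads the input with a streaming parser instead of loading the whole document. It is only a sketch under assumptions: the input path and output directory are taken from the command line (how PlanetPress Watch actually hands over the file is not covered here), the element names are the ones from the stylesheet posted in this thread, the file uses no XML namespaces, and `split_streaming` is a made-up name.

```python
import os
import sys
import xml.etree.ElementTree as ET


def split_streaming(src_path, out_dir,
                    record_tag="CustInvoiceJour", root_tag="ExportInvoice"):
    """Write each record that is a direct child of the root to its own file."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    depth = 0  # nesting level below the document element
    context = ET.iterparse(src_path, events=("start", "end"))
    _, root = next(context)  # first event: start of the document element
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        if depth == 1:  # a direct child of the root has just been closed
            if elem.tag == record_tag:
                count += 1
                wrapper = ET.Element(root_tag)
                wrapper.append(elem)
                ET.ElementTree(wrapper).write(
                    os.path.join(out_dir, "%d.xml" % count),
                    encoding="iso-8859-1", xml_declaration=True)
            root.clear()  # drop finished children so they can be freed
        depth -= 1
    return count


# Assumed command line: python split_invoices.py <input.xml> <output_dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    print(split_streaming(sys.argv[1], sys.argv[2]), "files written")
```

Clearing the root after each finished child is what keeps the tree from growing, so only one invoice at a time should need to be held in memory.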

However, I find it very odd that you cannot split a file larger than 10 MB. I have personally split files of more than 100 MB using PlanetPress Watch and the Altova engine on a machine with much less RAM.

I am interested in seeing the XSLT code you are using to split your file.

On a side note, I am always looking for better ways to split XML files and I am going to take a good look at the link you posted, thanks for that.

#22869 - 03/06/07 10:36 AM Re: Split large XML Files with xslt
PaulW Offline
We are using the following XSLT script to split our invoice export file into one XML file per invoice.

Code:
    

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="ISO-8859-1" indent="yes"/>
  <!-- Write one result document per invoice, named after its position in the export. -->
  <xsl:template match="/">
    <xsl:for-each select="ExportInvoice/CustInvoiceJour">
      <xsl:result-document href="C:\\VDP\\TEST\\BE\\Invoice\\Designdata\\{position()}.xml">
        <!-- Re-wrap the invoice in the original root element. -->
        <ExportInvoice>
          <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:apply-templates select="node()"/>
          </xsl:copy>
        </ExportInvoice>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>
  <!-- Identity template: copies attributes and child nodes unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Paul

#22870 - 03/06/07 10:40 AM Re: Split large XML Files with xslt
Anonymous
Unregistered


OK, that code looks exactly the same as the code I generate from my XML splitter plug-in.

What version of PlanetPress Watch are you using? We changed the version of the Altova engine that we ship a little while ago. The new version is much better at memory management and speed.

#22871 - 03/06/07 10:51 AM Re: Split large XML Files with xslt
PaulW Offline
We have PlanetPress Watch V5.3.1.2324 installed.

Paul

#22872 - 03/08/07 10:02 AM Re: Split large XML Files with xslt
Anonymous
Unregistered


Without being able to run everything here myself, I am not totally sure why a file of that size would take that long to split. The XSL code above has been tested, and the only slowdowns or issues we see are with large XML files (50-75 MB and up). A file of your size really should not cause any issues.
