Processing big TMX files with PHP
Author:  1-shi [ Thu Apr 10, 2008 6:07 pm ]
Post subject:  Processing big TMX files with PHP

Hi everyone!
I want to process a TMX file (which is essentially an XML document) with PHP. Specifically, I want to upload a TMX file through a PHP form, read the entire file, parse it with PHP's XML parsing functions, and then generate an SQL file containing the INSERT statements that load all TMX segments into a database. My TMX files are formatted as shown below:

<?xml version="1.0" ?>
<!DOCTYPE tmx SYSTEM "tmx11.dtd">
<tmx version="1.1">
creationtool="TRADOS Translator's Workbench for Windows"
creationtoolversion="Edition 7 Build 719"
o-tmf="TW4Win 2.0 Format"

<tu creationdate="20030821T144022Z" creationid="G.VERMEER">
<tuv lang="EN-US">
<seg>Set the required cutting depth.</seg>
</tuv>
<tuv lang="EL">
<seg>Ορισμός του απαιτούμενου βάθους κοπής.</seg>
</tuv>
</tu>

<tu creationdate="20040322T214015Z" creationid="PAPOUS">
<tuv lang="EN-US">
<seg>Should the machine be damaged, it must not be used.</seg>
</tuv>
<tuv lang="EL">
<seg>Εάν η μηχανή έχει βλάβες, δεν πρέπει να χρησιμοποιηθεί.</seg>
</tuv>
</tu>

<tu creationdate="20040322T214450Z" creationid="PAPOUS">
<tuv lang="EN-US">
<seg>Never use the machine for sawing into the ground.</seg>
</tuv>
<tuv lang="EL">
<seg>Μη χρησιμοποιήσετε ποτέ το μηχάνημα για κοπή μέσα στο έδαφος.</seg>
</tuv>
</tu>
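
For illustration, the parse-and-generate step described above could look roughly like the sketch below. It is not the code from this post: it uses PHP's streaming XMLReader class, and the function name tmxToInsertStatements, the table name tmx_segments and its columns are all hypothetical. The quote doubling is a simplified form of escaping and assumes the target database accepts '' inside string literals.

```php
<?php
// Sketch only: the function name, table name and column names are
// hypothetical and not taken from the original post.
function tmxToInsertStatements(string $path, string $table = 'tmx_segments'): array
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    // Simplified escaping (assumption): double single quotes for SQL literals.
    $esc = function (string $value): string {
        return str_replace("'", "''", $value);
    };

    $statements = [];
    $creationId = '';
    $lang = '';

    // XMLReader walks the file node by node instead of loading it whole.
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        switch ($reader->name) {
            case 'tu':
                $creationId = (string) $reader->getAttribute('creationid');
                break;
            case 'tuv':
                // TMX 1.1 uses "lang"; later versions use "xml:lang".
                $lang = (string) ($reader->getAttribute('lang')
                    ?: $reader->getAttribute('xml:lang'));
                break;
            case 'seg':
                // One INSERT per segment, tagged with its unit and language.
                $statements[] = sprintf(
                    "INSERT INTO %s (creationid, lang, segment) VALUES ('%s', '%s', '%s');",
                    $table,
                    $esc($creationId),
                    $esc($lang),
                    $esc($reader->readString())
                );
                break;
        }
    }
    $reader->close();

    return $statements;
}
```

The returned statements can then be written to the SQL file, for example with file_put_contents($sqlPath, implode("\n", $statements)).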


This algorithm works fine for small TMX files, but some TMX files are too big to be processed before PHP's max_execution_time elapses. I am on shared hosting and PHP runs in safe mode, so I can’t use the set_time_limit function. The only way to do the processing is to divide it into small chunks, each of which finishes before max_execution_time. For example, my local server has a max_execution_time of 30 seconds, so when 25 seconds have passed since the script started, I stop the script, store the file pointer in a $_SESSION variable and display a link telling the user to start the next session. In the next session I start reading the file from the position where the first session stopped. At this point an XML error is displayed saying “Empty document at line 1 column 1”. Maybe it fails because of a character-reading problem since, as you can see, the TMX file contains Unicode text. I don’t know! Please help me!
Thanks in advance!
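
A minimal sketch of the chunked approach described above, under several assumptions that the post does not confirm: the file is UTF-8 (the byte searches below would not work on UTF-16), every translation unit is a complete <tu>...</tu> element, and the uploaded file has been moved to a permanent location with move_uploaded_file (PHP deletes the temporary upload when the request ends). The function name processTmxChunk and its parameters are hypothetical. Instead of handing the parser the tail of the file, which has no root element, it cuts out one complete <tu> element at a time, parses each on its own, and returns a plain integer byte offset that is safe to keep in $_SESSION.

```php
<?php
// Sketch only: processTmxChunk and its parameters are hypothetical names.
// Returns the byte offset to resume from, or null once the file is finished.
function processTmxChunk(string $path, int $offset, float $budgetSeconds, callable $onUnit): ?int
{
    $started = microtime(true);
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($handle, $offset);

    $buffer = '';
    $base = $offset; // file position of the first byte held in $buffer

    while (true) {
        // Consume every complete <tu>...</tu> element currently buffered.
        // The pattern requires whitespace or ">" after "tu", so <tuv> is skipped.
        while (preg_match('/<tu[\s>]/', $buffer, $m, PREG_OFFSET_CAPTURE)
            && ($end = strpos($buffer, '</tu>', $m[0][1])) !== false) {
            $begin = $m[0][1];
            $end += strlen('</tu>');

            // Each unit is well-formed by itself, so it parses standalone.
            $unit = simplexml_load_string(substr($buffer, $begin, $end - $begin));
            if ($unit !== false) {
                $onUnit($unit);
            }

            $buffer = substr($buffer, $end);
            $base += $end;

            // Out of time: report where the next request should resume.
            if (microtime(true) - $started >= $budgetSeconds) {
                fclose($handle);
                return $base;
            }
        }

        if (feof($handle)) {
            break;
        }
        $data = fread($handle, 65536);
        if ($data === false || $data === '') {
            break;
        }
        $buffer .= $data;
    }

    fclose($handle);
    return null;
}
```

Each request would call it with the offset saved by the previous one, for example $_SESSION['tmx_offset'] = processTmxChunk($path, $_SESSION['tmx_offset'] ?? 0, 25.0, $handler), and show the "continue" link until the result is null. Because the offset always falls just after a closing </tu> tag, a multi-byte character is never split between two requests.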

