Archive for the ‘linux’ Category

Splitting a large XML file into smaller well formed files

As we get into the holiday mood, there seems to be a rush to solve the pressing problems that have plagued us throughout the year.

One of the most popular data exchange and export formats is XML. While XML files provide a means of moving data between systems, especially legacy systems, an issue arises when you have to transform and import the data into a different system.

The biggest problem usually has to do with the size of the files. It's very easy to export files of 100MB and larger, but processing such large files becomes unwieldy. The need therefore arises to split the large file into smaller ones, while keeping each smaller file well formed and controlling its size.

This seems to have been solved by the xml_split tool, which does exactly that.
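To make the idea concrete, here is a minimal Python sketch of size-controlled splitting. It is not how xml_split itself works, just an illustration of the technique under some assumptions: the records to distribute are the direct children of the root element, and every output file repeats the root tag and its attributes so that each piece is well formed on its own. The function name `split_xml`, the `prefix` parameter and the output file naming are made up for this example.

```python
import xml.etree.ElementTree as ET


def split_xml(source, max_bytes, prefix="part"):
    """Split `source` into files that each hold whole top-level children.

    `max_bytes` is a soft limit on the serialized children per file; a single
    child larger than the limit still gets a file to itself.
    Returns the list of file paths written.
    """
    paths = []
    root = None    # root element of the source document
    chunk = None   # copy of the root that collects children for one piece
    size = 0       # bytes of children collected in the current piece
    depth = 0

    def flush():
        nonlocal chunk, size
        if len(chunk):
            path = f"{prefix}-{len(paths):03d}.xml"
            ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
            paths.append(path)
        chunk = ET.Element(root.tag, root.attrib)
        size = 0

    # iterparse streams the file, so the whole document is never held in memory
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem
                chunk = ET.Element(root.tag, root.attrib)
            continue
        depth -= 1
        if depth != 1:
            continue  # only act when a direct child of the root has closed
        child_size = len(ET.tostring(elem, encoding="unicode").encode("utf-8"))
        if len(chunk) and size + child_size > max_bytes:
            flush()
        chunk.append(elem)
        size += child_size
        root.remove(elem)  # drop the child from the source tree to free memory
    flush()
    return paths
```

Calling `split_xml("export.xml", 10 * 1024 * 1024)` on a hypothetical `export.xml` would write `part-000.xml`, `part-001.xml` and so on, each holding roughly 10MB of records. One known limitation of this sketch: namespace prefixes are rewritten by ElementTree (as `ns0`, `ns1`, ...), which keeps the output well formed but changes how it looks.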

Special thanks to Canonical for the tools that are making Ubuntu the fastest-growing Linux distro.

An excellent holiday gift

IP Cop – Linux, Old Computers, Broadband Combined

During our most recent switch of ISP, we found that our bandwidth utilization was high, and we had to set up a transparent web proxy to cache commonly shared files. Our File/Print/Active Directory server is running Windows 2003 Standard Edition, and setting up any kind of proxy on it was convoluted, error prone and would affect the performance of the box.

I raised this on the local Linux users forum, where one of the members suggested that we try IP Cop, which would run on an old Pentium III box. We had a PIV with 1GB RAM and a spare network card, so we were all set.

The download was amazingly small at 60MB, and the installation even more impressive: within 10 minutes we were up and running. The green (local network) and red (ISP) network cards were recognized and set up in a flash. The management is via a web interface, so even those of us who are Linux challenged had somewhere to start.

The proxy service was up and running in two clicks, and configuring it to be transparent to the network browsers was just a checkbox. The ISP also provides an upstream proxy, which we configured as the parent proxy server, and now we are up and running.

The cache hits are now up to 30%, and so far at least we are saving that bandwidth.
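A hit rate counted in requests is not quite the same as bandwidth saved, since small objects tend to hit more often than large ones. The sketch below shows one way to check both figures. It assumes the proxy is Squid writing its native access.log format, where the fourth field is the result code (for example `TCP_HIT/200`) and the fifth is the number of bytes delivered to the client. The function name `cache_hit_ratios` is made up for this example, and a differently configured log format would need different field positions.

```python
def cache_hit_ratios(lines):
    """Return (request hit ratio, byte hit ratio) for Squid native log lines."""
    requests = hits = total_bytes = hit_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[4].isdigit():
            continue  # skip blank, truncated or unexpected lines
        result_code = fields[3].split("/")[0]  # e.g. TCP_HIT, TCP_MEM_HIT, TCP_MISS
        nbytes = int(fields[4])
        requests += 1
        total_bytes += nbytes
        if "HIT" in result_code:
            hits += 1
            hit_bytes += nbytes
    request_ratio = hits / requests if requests else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio
```

It can be fed an open file directly, for example `cache_hit_ratios(open("/var/log/squid/access.log"))`, where the path is an assumed location that may differ on your install. The byte ratio is the closer estimate of bandwidth actually saved on the ISP link.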

The next challenge that we are grappling with is to configure NTOP on the IPCop box so that we can identify where the traffic is coming from on the internal network.

That's a story for another day…
