Y'herd thisun? 

“In theory, theory and practice are the same. In practice, they are not.”
-Albert Einstein

Flatfile fixer and dumper

TaggedCoding, VB, Database

Fix those stinking flat files.

Some things in development are not sexy, in fact they are downright boring, but you have to just pound out the code and get the job done before you can move on to the real cool, shiny and new stuff.

Consider the lowly mainframe text file dump. A big old block of characters that one corporation sticks on an FTP site and another one has to pickup and deal with.

It is true that one beautiful day in the future such files will be all XML and BIZTALK will have put that robber baron EDI out of business but till then, and unfortunately it *is* a ways off, corporate developers still have do deal with those cobol-born chunks of ASCII.

The first time you figured out a badly documented file format and finished the hard-coded parsing routine you have to admit that you felt darned proud of yourself. But the day that the format changed the thrill waned, and a few new files later the whole process was nothing but a boring chore.

Well, since the real job of a developer, when all else is said and done, is to automate the tedious for the user ... here is a chance to automate some developer tedium.

If you've been lucky enough to have been able to avoid a text dump so far, let us fill you in. A file comes your way looking something like this:

10849    Aniseed Syrup                001000
10857    Aniseed Syrup                001000
11017    Aniseed Syrup                001000
11077    Aniseed Syrup                001000
10309    Chef Anton's Cajun Seasoning 001760
10326    Chef Anton's Cajun Seasoning 001760
10336    Chef Anton's Cajun Seasoning 001760
10633    Queso Manchego La Pastora    003800
10678    Queso Manchego La Pastora    003800
10299    Teatime Chocolate Biscuits   000730
10329    Teatime Chocolate Biscuits   000730

On a good day you get a Key document describing what the file is all about. For the simple data file above, the Key might give some information like this:

RecCode 1,5 CHAR
Vendor 6,30 CHAR
Cost 31,6 Currency Decimal = 2

With this you know that there are three "columns" in the file, their starting characters, the length of the columns, the data types and any added conversion information you might require.

Now all you have to do is hard code a parsing routine and a conversion routine to put the data into a more useful database table.

OOPS! They just changed the spec, now it has to be put into a spreadsheet too so you'd better brush up on your Excel automation and Lotus DDE.

Wait! Did they forget to mention that they need it in an editable grid first so that they can manually fix the data before you have to move it on to it's final resting place?

No sweat.

Because we are lazy programmers, we made a set of classes that do it all, then we compiled these classes into a neat little ActiveX DLL that you can use to get working right away. Yes, the source code is available for free by request but before you ask for that and lose it somewhere on your hard drive, please give the dll a test, it comes with four demo projects to help you get up to speed real fast.

There are certain deps to run the DLL: 

1) ADO 2.5 
2) MS Scripting. 

These should be on your machine already, you may have to add them to your deployments (probably not if your user has IE5+). The only dep you need to add to your own projects is MDAC 2.5 or higher, because you'll be playing with the Recordsets that the dll spits out and because that includes the ISAM drivers used under the hood for the instant exports.

Now then, here is what you get...

The overview:

Pass in the location of the source file, set a few properties, call a single method and out pops a disconnected ADO Recordset with all that raw text neatly converted exactly as you require.

The extra benefits:

Files with comment lines that you do not want included in the result can have those lines taken out of the parse up front using the SkipLine property.

Conversely, files with multiple record types all packed in together can have just the records that you want included in the parse. A major time saver!

The MS Text ISAM driver only considers vbCRLF [chr(13)+chr(10)] to be a valid record separator, but mainframe dumps usually use vbLF [chr(10)] or ALT-Z (ugh, why are these companies still in business?!). The DumpParser defaults to vbLF but lets you specify what you need case by case.

The conversion allows you to pass in VBScript so your conversions can use optional code set at run-time ... great for the brick walls you can get with oddly formatted data like the Currency column with decimal details in the above example.

We've included the RecordsetToISAM converter class so that any result can be quickly transformed into Lotus WK1, Excel4, Excel97, Excel2000, Access97, Access2000, CSVText, dBase 4 or 5, or HTML Table. (This method can be used all by itself for any disconnected recordset).

The demos:

Demo 1

Just quickie to get you started. Uses a flatfile created from the Biblio database's Publishers table.

Shows the basic process of using the dll's simple object hierarchy to convert the file into an ADO Recordset. Fills a standard DBGrid with the Recordset.

A second option shows how to use the SkipLineComment properties to filter out lines you are not interested in such as comments. (This is similar to the SkipLine method of the ADO2.5+ Stream object.)

Demo 2

Shows the very basics of how to add custom script code to your parses. The source is a flatfile created using Northwind Riggers database Order Details table.

This uses the MS ScriptControl ... but uses the dll directly since using the ocx wrapper is a bit silly, especially in the case of making your own dlls where a host form is not ideal.

Look back at the example file at the top of this page, see that currency column? This demo passes a column object a run-time VBScript function and when the record is parsed what you want becomes what you get.

Demo 3

Uses the same source file as Demo2, but shows how to convert a disconnected Recordset to Excel, Lotus, HTML and Jet. Other types are supported (and enumerated so they show up in VB's intellisense lists) but we didn't have room on the form for all of them!

Notice that this feature includes an Overwrite option. Set the property as you wish to quietly kill any existing destination file or to raise a standard error 58. Note that if you choose to convert to Jet 3 or Jet 4 (Access 97 or Access 2000) the Overwrite works at the table level. If the Jet file exists the converter goes through the tables within the Jet database and deletes or raises an error at that deeper level, this allow you to pack a single Jet file with multiple tables.

Demo 4

Here's the one that covers those really annoying flatfiles, the ones that pack totally different record type all into a single file .. ugh! Each type of record, of course, requires a completely different parse definition. With this little dll the days of dreading these files are finally over.

As we mentioned up at the top, the source code for this dll is available ... to use it is to agree that you will not sell it as part of a commercial source code or library product, play nice ok? :) ... But, please, before you just get the code and attempt to wedge it into your established apps download the compiled dll and the demos and give them a test. Knowing how to be a user of the interfaces will be a great help to understanding the underlying code. Plus, you may find that the dll does the trick for you completely and that is that.

Hope it helps.

Robert Smith
Kirkland, WA

Click here to get the DLL, the full four demos and instructions for use.   Just 82k!

Click here to get the dll source code (please get the demo version first!! This source code does not include the demo programs and you will need them for understanding!) 

home     who is smith    contact smith     rss feed π
Since 1997 a place for my stuff, and it if helps you too then all the better