tempe_rider

Parsing a fixed-width .dat file with Python

Discussion created by tempe_rider on Sep 12, 2011
Latest reply on Sep 19, 2011 by stacyrendall
I have data in the form of a .dat file (really just a text file). The problem is that the data are multi-line, with each line type having its own fixed-width layout. Also, there are no headers. For example:

10012345678FIXEDWIDTHDATA
11012345678FIXEDWIDTHBUTLARGERTHAN
12012345678FIXEDWIDTH
10012345678AFIXEDWIDTHDATA
11012345678AFIXEDWIDTHBUTLARGERTHAN
12012345678AFIXEDWIDTH

The good news is that I have a cheat sheet with FIELDNAME, SIZE, TYPE (e.g. NUM or CHAR) and START POSITION. The first three digits are the RECORDTYPE (e.g. 100, 110 or 120). The 12345678(A) is the PARCELNUM. That is where the fixed-width similarities end.
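To illustrate how such a cheat sheet could drive the parsing, here is a minimal sketch that slices one line by START POSITION and SIZE. It assumes the start positions are 1-based. Only RECORDTYPE and PARCELNUM come from the description above; the DATA field, all the sizes, and the names LAYOUT_100 and parse_line are made up for the example.

```python
# Hypothetical layout for record type 100: (FIELDNAME, START POSITION, SIZE).
# Start positions are assumed to be 1-based, as on a typical cheat sheet.
# The DATA field and every size here are illustrative guesses.
LAYOUT_100 = [
    ("RECORDTYPE", 1, 3),
    ("PARCELNUM", 4, 8),
    ("DATA", 12, 14),
]

def parse_line(line, layout):
    """Slice one fixed-width line into a dict keyed by field name."""
    record = {}
    for name, start, size in layout:
        # Convert the 1-based start position into a 0-based slice.
        begin = start - 1
        record[name] = line[begin:begin + size].strip()
    return record

print(parse_line("10012345678FIXEDWIDTHDATA", LAYOUT_100))
```

Each record type would get its own layout list, and values are left as strings so that leading zeros in NUM fields are not lost.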

I am new to Python and have been struggling with this for a few days now. I have managed to open the file, read it into a list, sort the list and write out a new file:

# Read mode opens a file for reading only.
# Raw strings (r"...") keep the backslashes in Windows paths from being read as escapes.
DataFileIn = open(r"D:\Path\st4206001.dat", "r")
# Read all the lines into a list.
DataList = DataFileIn.readlines()
DataList.sort()
DataFileIn.close()
DataTextOut = open(r"D:\Path\Data.txt", "w")
DataTextOut.writelines(DataList) # Write a sequence of strings to a file
DataTextOut.close()


This is where I need some direction.  My goal is to sort the list and output a file for each RECORDTYPE.  It would be nice to add the HEADERS to the files before writing them.  I was looking into using the re module to do the sorting (perhaps match) but, again, I am new to Python. 
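One possible way to approach this goal, sketched under some assumptions: since the RECORDTYPE is always the first three characters, plain string slicing may be enough and the re module may not be needed. The function names, the records_<type>.txt file naming, and the header strings below are hypothetical.

```python
import os
from collections import defaultdict

def split_by_record_type(lines):
    """Group lines by their first three characters (the RECORDTYPE)."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            groups[line[:3]].append(line)
    return groups

def write_groups(groups, headers, out_dir):
    """Write one sorted file per record type, with a header line if one is given."""
    paths = []
    for record_type, records in sorted(groups.items()):
        # Hypothetical naming scheme: records_100.txt, records_110.txt, ...
        path = os.path.join(out_dir, "records_%s.txt" % record_type)
        with open(path, "w") as out:
            if record_type in headers:
                out.write(headers[record_type] + "\n")
            for record in sorted(records):
                out.write(record + "\n")
        paths.append(path)
    return paths
```

With the list read earlier, a call might look like write_groups(split_by_record_type(DataList), {"100": "RECORDTYPE PARCELNUM DATA"}, r"D:\Path"), where the header text is a placeholder for the real FIELDNAME values from the cheat sheet.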

My hope is that someone out there has a strategy I could follow (e.g. suggested modules and Python tricks). I just need to be pointed in the right direction.
