Select to view content in your preferred language

Datetime from CSV file

3542
8
11-06-2012 06:46 AM
JimKrist
Emerging Contributor
Good day. I am trying to parse a large amount of data (in CSV format) into minute intervals and then geocode the data into a geodatabase. Problem is that the datetime field does not display as text format in the geodatabase. It still shows a DATE format. The CSV file has the proper YYYY-MM-DD HH:MM text format after parsing. I have converted the date field to text format prior to geoprocessing but the data still appears to be processed in ArcGIS in date format.

I know shapefiles do not support the HH:MM timestamp date format so I have performed a text conversion on the date:

dt = DTTM[:19]
dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
str_date = dtime.strftime('%Y-%m-%d %H:%M')

And the datetime field is set as text format in the schema.ini file prior to geoprocessing.

[1output.csv]
Format=CSVDelimited
Col4=DTTM Text

I have also tried to write directly from the CSV to a file geodatabase, using CopyRows, i.e. arcpy.CopyRows_management(filename, geodb), but the geodatabase still contains the date format.

Any ideas what is going on?

Thanks in advance for all responses.

My code snippets are included below:

min_interval.py:

import csv
from datetime import datetime
import os

def min_aggr(infile, outfile):

ifile = open(infile, 'rb')
reader = csv.reader(ifile)
out = open(outfile, 'wb')
writer = csv.writer(out)
header = reader.next()
hdr = ['ID','LON','LAT','DTTM','HEADING','CALL_SIGN','vesselType']
writer.writerow(hdr)

for ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType in reader:
dt = DTTM[:19]
dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
str_date = dtime.strftime('%Y-%m-%d %H:%M')
data = [ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType]
writer.writerow(data)

filenum = 0
path = raw_input("Enter the full path name (ex. c:\\test\):")
datapath = raw_input('Enter the path for the output directory (ex. c:\\output\):')
dirList=os.listdir(path)
dirList.sort()
for fname in dirList:
filenum += 1
filename = path + fname
outfile = datapath + str(filenum) + 'output.csv'
try:
min_aggr(filename, outfile)
except IOError:
print filename

print 'files have been processed'

geocode.py:

import os
import arcpy
from arcpy import env
env.overwriteOutput = True

coord = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]" #spatial projection

#create file geodatabase
dir = raw_input('Enter path to create geodatabase (ex. c:\\data\):')
dbname = raw_input('Enter the name for the geodatabase (ex. test.gdb):')
arcpy.CreateFileGDB_management(dir, dbname)
print 'geodatabase ' + dbname + ' has been created'
geodb = dir + dbname

filenum = 0
path = raw_input('Enter the full path name for input files (ex. c:\\test1\):')
datapath = raw_input('Enter the path for the output directory (ex. c:\\output1\):')
dirList=os.listdir(path)
dirList.sort()
for fname in dirList:
filenum += 1
filename = path + fname
shp = datapath + str(filenum) + 'output.shp'
try:
arcpy.MakeTableView_management(filename,'View')
arcpy.MakeXYEventLayer_management('View','LON','LAT','Layer')
arcpy.CopyFeatures_management('Layer', shp)
arcpy.DefineProjection_management(shp, coord)
arcpy.FeatureClassToGeodatabase_conversion([shp], geodb)
print 'file ' + filename + ' has been geoprocessed'
except IOError:
print 'ERROR: ' + filename

print 'geoprocessing complete'
0 Kudos
8 Replies
BruceHarold
Esri Regular Contributor
Hi

Can you supply a sample of your data?  Your code is hard to read without indentation.

Thanks
0 Kudos
JimKrist
Emerging Contributor
Sorry about the code indent. Didn't realize I needed to use the INDENT tool.

Sample .csv data (including header):

ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType
0,-74.064842,40.617940,2010-01-01 00:01,0,WS0286,Other
0,-90.910605,30.080466,2010-01-01 00:02,0,WS0286,Other
0,-74.064867,40.617936,2010-01-01 00:03,0,WS0286,Other
1,-90.216484,29.125813,2010-01-01 00:00,511,FF0419,Cargo
1,-89.315449,29.047325,2010-01-01 00:00,511,FF0419,Cargo
1,-90.216495,29.125843,2010-01-01 00:01,511,FF0419,Cargo

Scripts with indentation:

min_interval.py:

import csv
from datetime import datetime
import os

def min_aggr(infile, outfile):
[INDENT] ifile = open(infile, 'rb')
reader = csv.reader(ifile)
out = open(outfile, 'wb')
writer = csv.writer(out)
header = reader.next()
hdr = ['ID','LON','LAT','DTTM','HEADING','CALL_SIGN','vesselType']
writer.writerow(hdr)
for ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType in reader:
[INDENT]dt = DTTM[:19]
dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
str_date = dtime.strftime('%Y-%m-%d %H:%M')
data = [ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType]
writer.writerow(data)[/INDENT]
[/INDENT]

filenum = 0
path = raw_input("Enter the full path name (ex. c:\\test\):") datapath = raw_input('Enter the path for the output directory (ex. c:\\output\):')
dirList=os.listdir(path)
dirList.sort()
for fname in dirList:
[INDENT]filenum += 1
filename = path + fname
outfile = datapath + str(filenum) + 'output.csv'
[/INDENT]
[INDENT]try:
[INDENT]min_aggr(filename, outfile)[/INDENT]
except IOError:
[INDENT]print filename[/INDENT]
[/INDENT]
print 'files have been processed'

geocode.py:

import os
import arcpy
from arcpy import env
env.overwriteOutput = True

coord = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]" #spatial projection

#create file geodatabase
dir = raw_input('Enter path to create geodatabase (ex. c:\\data\):')
dbname = raw_input('Enter the name for the geodatabase (ex. test.gdb):')
arcpy.CreateFileGDB_management(dir, dbname)
print 'geodatabase ' + dbname + ' has been created'
geodb = dir + dbname

filenum = 0
path = raw_input('Enter the full path name for input files (ex. c:\\test1\):')
datapath = raw_input('Enter the path for the output directory (ex. c:\\output1\):')
dirList=os.listdir(path)
dirList.sort()
for fname in dirList:
[INDENT]filenum += 1
filename = path + fname
shp = datapath + str(filenum) + 'output.shp'
[INDENT]try:
[INDENT]arcpy.MakeTableView_management(filename,'View')
arcpy.MakeXYEventLayer_management('View','LON','LAT','Layer')
arcpy.CopyFeatures_management('Layer', shp) arcpy.DefineProjection_management(shp, coord)
arcpy.FeatureClassToGeodatabase_conversion([shp], geodb) print 'file ' + filename + ' has been geoprocessed'[/INDENT]
except IOError:
[INDENT]print 'ERROR: ' + filename[/INDENT][/INDENT][/INDENT]

print 'geoprocessing complete'
0 Kudos
LT
by
Regular Contributor
Sorry about the code indent. Didn't realize I needed to use the INDENT tool.


Hi Jim,
I don't know what's going on with your data, but I just wanted to let you know there's a more effective way to post code on the forum... As it stands now, if someone copies and pastes your code into a Python script, they loose all the indentation, so it's pretty much a mess.  Instead, use the CODE tags...There's a little button that looks like a number sign (#).  You can click that and CODE tags appear.  Paste your code between these tags and it will retain the same indentation as you had in the script.  Here's an example of what it looks like when you use the code tags:

        for f in fs:
            try:
                desc = arcpy.Describe(f)
                type = desc.DataType
                if type in fDict:
                    fDict[type].append(f)
                #...and so forth
 
0 Kudos
DarrenWiens2
MVP Honored Contributor
Aren't you writing out the exact same input to the output? You put the datetime into a string (str_date) but never write it to the output.
for ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType in reader:
  dt = DTTM[:19]
  dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
  str_date = dtime.strftime('%Y-%m-%d %H:%M')
  data = [ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType]
  writer.writerow(data)

I'm guessing it should be something like:
data = [ID,LON,LAT,str_date,HEADING,CALL_SIGN,vesselType
0 Kudos
JimKrist
Emerging Contributor
Aren't you writing out the exact same input to the output? You put the datetime into a string (str_date) but never write it to the output. 
for ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType in reader:
  dt = DTTM[:19]
  dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
  str_date = dtime.strftime('%Y-%m-%d %H:%M')
  data = [ID,LON,LAT,DTTM,HEADING,CALL_SIGN,vesselType]
  writer.writerow(data)

I'm guessing it should be something like: 
data = [ID,LON,LAT,str_date,HEADING,CALL_SIGN,vesselType


Thanks. I did note that discrepency. It has been corrected.
0 Kudos
JimKrist
Emerging Contributor
Thanks to all for your responses. Problem has been solved by splitting date and time and then concatenating in text field.
  dt = RX_DTTM[:19]
  dtime = datetime.strptime(dt, '%Y-%m-%d %H:%M:%S')
  str_date = dtime.strftime('%Y-%m-%d %H:%M')
  DT,TM = str_date.split(' ',1)
  data = [ID,LON,LAT,DT,TM,vesselType]  
  writer.writerow(data)


When geocoding I use arcpy.ConcatenateDateAndTimeFields_ta to process DT, TM into DTTM:
filenum = 0
path = raw_input('Enter the full path name for input files (ex. c:\\input\): ')
datapath = raw_input('Enter the path for the output directory (ex. c:\\output\): ')
dirList=os.listdir(path)
dirList.sort()
for fname in dirList:
  if not fname.startswith('schema'):
 filenum += 1
 filename = path + fname
 shp = datapath + str(filenum) + 'output.shp'
 try:
  arcpy.MakeTableView_management(filename,'View')
  arcpy.MakeXYEventLayer_management('View','LON','LAT','Layer')
  arcpy.CopyFeatures_management('Layer', shp)
  arcpy.CheckOutExtension('tracking')
  arcpy.ConcatenateDateAndTimeFields_ta(shp, 'DT', 'TM','DTTM')
  arcpy.DefineProjection_management(shp, coord)
  arcpy.FeatureClassToGeodatabase_conversion([shp], geodb)
  print 'File ' + filename + ' has been geoprocessed'
 except IOError:
  print 'ERROR: ' + filename


Note: I did have to create a specific schema.ini file in the directory (explaining the if not fname.startswith('schema'): ignore file clause) to ensure all fields from the CSV files were assigned data type. One entry was required for each file in the directory as my investigation found that wildcards could not be used in the .ini file.
[1_output.csv]
Format=CSVDelimited
Col1=ID Long
Col2=LON Double
Col3=LAT Double
Col4=DT Text
Col5=TM Text
Col18=VesselType Text
[2_output.csv]
Format=CSVDelimited
Col1=ID Long
Col2=LON Double
Col3=LAT Double
Col4=DT Text
Col5=TM Text
Col6=VesselType Text


Could prove challenging as we will have hundreds of input files after parsing the large (600GB) CSV files.
0 Kudos
KimOllivier
Honored Contributor
Time to move to loading the data and transforming it with FME (otherwise disguised as the Data Interoperability Extension)
No scripting required, much better performance, easy to debug, the ETL standard tool for good reason.
0 Kudos
DarrenWiens2
MVP Honored Contributor
Could prove challenging as we will have hundreds of input files after parsing the large (600GB) CSV files.

You could automate the creation of your .ini file (if it's simply changing the csv filename by an increasing number):
iniFile = open(r"PATH_TO_YOUR_INI_FILE",'w')

numbers = range(1,100)

for i in numbers:
    iniFile.write("[" + str(i) + "_output.csv]\n\
Format=CSVDelimited\n\
Col1=ID Long\n\
Col2=LON Double\n\
Col3=LAT Double\n\
Col4=DT Text\n\
Col5=TM Text\n\
Col6=VesselType Text\n")
0 Kudos