rfairhur24

I've Saved Time in a Bottle.  How Do I Get it Back Out? - Doing More with Date Fields Using the Field Calculator and Python

Blog Post created by rfairhur24 on Jul 1, 2016

1. There is a lack of ArcGIS/Python resources that support the manipulation of Date Field values using the Field Calculator

 

The ArcGIS software provides good built-in support and documentation that let users take full advantage of the data that they have stored in Text fields or Numeric fields (Short, Long, Float, and Double).  This includes built-in support in the Field Calculator that provides quick access to Python methods that let users parse, manipulate and transform the values of an existing field into another field that will hold reformatted data that is more suitable for analysis and presentation.  Being able to take existing text and numeric data and store it in alternative forms is often vital for working with this information through symbology, labeling, geoprocessing, and other common operations.

 

However, when it comes to Date or Date-Time fields, the Python methods provided by the Field Calculator are limited to manipulating the current system date or time or the parsed date components (year, month, day, hour, minute, second, AM/PM) that make up a date, not a date itself.   The methods provided are of no real use for parsing, manipulating and transforming dates or times that have been stored in a Date/Date-Time field.  The help documentation for the Field Calculator provides numerous examples of existing text and numeric fields, but is completely silent on how to apply any of the available Python date methods to dates stored in a date field.  As a result, beyond using the actual date value itself, few users know how to extract information from the values stored in a date field that would let them fully analyze and present their date-based data in the most effective way through the ArcGIS software.

 

The purpose of this Blog is to try to tackle the challenges of parsing, manipulating and transforming values stored in date fields or text fields holding date-based data using Python and the ArcGIS Field Calculator.  Some challenges in tackling date field values are specific to the use of Python in the Field Calculator, and could be handled relatively easily by using a VB Script Field Calculation alternative or exporting to Excel and using the formulas of that program.  However, this Blog will limit itself to offering a pure ArcGIS/Python solution, even where the ArcGIS/Python solution is a fair amount more difficult to implement than the VB Script calculation or an Excel formula.

 

2. Why is it important to be able to manipulate Date-Time fields and text fields with date-based data in ArcGIS?

 

Since ArcGIS 10.1, all products of the GIS software have included the ability to apply time awareness to user data.  However, often this data does not comply with the standards required to make good use of the time aware features of ArcGIS.

 

For example, time aware data must naturally sort in its native type according to datetime sorting rules, and often an alphabetical sort of data stored as text violates the patterns of a true datetime value sort.  Even if you have your data stored in a Date-Time field and it can be sorted, it is most often stored as the local datetime value of the data provider.  However, if that data ever needs to be compiled with data in other time zones it will need to be standardized to a universal datetime, such as UTC, before it can be integrated with data coming from outside of that local datetime representation and used with the time aware features of ArcGIS.

 

Also, although the ArcGIS software continues to expand the available geoprocessing tools that can make your data more suitable for use with its time aware capabilities, much of the processing still only takes places on the fly in conjunction with the ArcGIS time slider.  But in many cases, it is often more convenient if the datetime data has been recalculated into a field that has converted the datetime into periods of different multiples or divisions of years, quarters, months, days of the month, etc, so that the date-based data can be easily summarized into a table format for reports or easily used within a geoprocessing workflow.

 

The time aware features of ArcGIS also do not readily handle all datetime periods, such as divisions of a year into pay period cycles, the natural seasons of winter, summer, spring and autumn, lunar cycles, etc. all of which can be derived from a datetime value rather than typed by a user.  Being able to convert a datetime value into custom or natural time period divisions or groupings can be essential to fully analyzing hidden relationships within time-based data and invaluable to let you take advantage of the full range of ArcGIS capabilities for presenting that data.

 

3. When is a Date or a Date-Time Field Value neither a Date nor a DateTime?

 

Unfortunately, the answer to this section's question is "when it is a Unicode String", which is what is returned by a Date or Date-Time field when the Field Calculator is set to use the Python parser option.  This is a peculiarity of a Python field calculation, since If you are using a cursor in a Python script the value returned by a date field will be a date or datetime variable and not a Unicode string  The most likely reason a Unicode String is returned is that the field calculator interface is closely related to the labeling expression interface.  When a date field is used to create a label the returned formatted date string ensures that a user-friendly date will appear in the label without using any other Python code.  If a Python Date or DateTime variable was returned, the label would not appear in a user-friendly format without specifying a format through a method like datetime.strptime().

 

At least in the case of Windows-based systems, the format of the date string is specified in the OS Settings and is controlled by one or both of the OS Short Date and OS Long Time settings.  Users of other operating systems will have to verify for themselves where the setting that controls the date format string can be found.  The OS short date setting controls both the way the date will appear in the Table Window and the format of the string provided from a date field to a Python Field Calculation.

 

Because different cultures use different short date and long time formats and each culture has multiple short date and long time formats to choose from, the conversion of the string to an actual date or datetime variable that can be easily manipulated to extract date information is the biggest challenge that has to be overcome in order to use the Field Calculator with the Python option.

 

 

4. Windows and Python format codes used to parse date and/or time strings

 

The format strings for the US & Canada region default Windows short date format is "M/d/yyyy" which equates to the Python format u'%m/%d/%Y' (i.e, u'12/25/2016').  The format strings for the US & Canada region default Windows long time format is 'h:mm:ss tt' which equates to the Python format u'%I:%M:%S %p' (i.e., u'1:52:20 PM').  The Python formats would need to change if you have selected an incompatible alternative format, or if you have regional settings that use a different format setting than the US & Canada default short date and long time formats.

 

The Windows and Python equivalent formats for the common date or time components are:

 

MM or M = %m (Month number with or without a leading zero)

MMM = %b (Month as locale’s abbreviated name)

MMMM = %B (Month as locale’s full name)

dd or d = %d (Day of the month number with or without a leading zero)

ddd = %a (Weekday as locale’s abbreviated name)

dddd = %A (Weekday as locale’s full name)

yy or y = %y (Year number without century with or without a leading zero)

yyyy = %Y (Year number with century)

hh or h = %I (Hour (12-hour clock) with or without a leading zero)

HH or H = %H (Hour (24-hour clock) with or without a leading zero)

mm or m = %M (Minute number with or without a leading zero)

ss or s = %S (Second number with or without a leading zero)

tt = %p (Locale’s equivalent of either AM or PM)

/, -, : = /, -, : (literal date or time separator characters)

 

Be sure to pay attention to the case of each Windows or Python code, since uppercase codes and lowercase codes parse different date or time components.  If you are using an OS other than Windows you will have to determine the equivalents between the format codes your OS uses and those Python uses to match your OS' short date and long time formats.

 

The full set of Windows US & Canada short date and long time codes and their Python code equivalents are shown below:

 

 

5. A Field Calculator calculation for parsing a formatted date and/or time in a Text field or a Date or Date-Time field and convert it into a Python datetime variable.

 

The date value formats that are provided to a Python Field Calculation from a Date or Date-Time field include Null values (Python==None), short date, short date and time, and time only (which occurs if the date is 12/30/1899 in a file gdb).  This also often applies to text fields that store a date and/or time if users have been well trained or use a front end validation interface to enter data.  The code I have designed is intended to deal with each possibility.  To do the parsing I will be using datetime.strptime method which attempts to parse a Unicode string into a datetime based on the specified date and time format strings.

 

Open the Field Calculator on a blank date field (right-click the field name in the output column in a table and choose Field Calculator from the context menu) and set the Field Calculator to the following settings:

 

Parser: Python

 

Show Codeblock: Checked

 

Pre-Logic Script Code:

def unicodeToDate( uniDate, dateFormat=u'%m/%d/%Y', timeFormat=u'%I:%M:%S %p' ):
  datetimeFormat = '{0} {1}'.format(dateFormat, timeFormat)
  if uniDate == None:
    return None
  try:
    return datetime.datetime.strptime(uniDate, dateFormat)
  except:
    try:
      return datetime.datetime.strptime(uniDate, datetimeFormat)
    except:
      try:
        # Python parses time only values as 1/1/1900 so subtract 2 days to match 12/30/1899, the date a file gdb uses.
        return datetime.datetime.strptime(uniDate, timeFormat) + timedelta(days=-2)
      except:
        return None

 

Expression: unicodeToDate( !Your_Date_Field!, dateFormat = u'%m/%d/%Y' , timeFormat = u'%I:%M:%S %p' )

 

Be sure to replace !Your_Date_Field! with a real Date/Date-Time/Time field or Text field containing dates and/or times.  The number of embedded try-except blocks can be expanded to include other formats.  This can be useful when attempting to convert a text field with multiple date and/or time formats into a true date field.  The conversion of all of the formats can be done by a single calculation as long as the order of the try-except blocks are arranged to test for the most common formats to the least common formats from top to bottom.

 

6. A Field Calculation that can return many different derived date components from the datetime variable returned by the calculation in Step 5.

 

Date fields have many components that can derived from the datetime values, and I prefer to use a single calculation to get at them all rather than many separate calculations.  This saves me from having to load many different .cal files and replace the date field used by each calculation to get a different date component.  The Calculation below uses a partIndex value ranging from 0-25 to get any of 26 components from any date field that can be parsed by the calculation in step 5.  (These calculations work for Python 2.7, but some may need to be changed to work for Python 3.4.x used by the latest version of ArcGIS Pro.)

 

Open the Field Calculator on a blank numeric field with a date component name like Year_ (right-click the field name in the output column in a table and choose Field Calculator from the context menu) and set the Field Calculator to the following settings:

 

Parser: Python

 

Show Codeblock: Checked

 

Pre-Logic Script Code:

def dateFieldPart( uniDate , partIndex = 8 , dateFormat = u'%m/%d/%Y' , timeFormat = u'%I:%M:%S %p' , offsetStartYear = 0 ):  

  # use the def defined in the previous field calculation which is included in this new calculation  
  theDate = unicodeToDate( uniDate, dateFormat=u'%m/%d/%Y', timeFormat=u'%I:%M:%S %p' )  

  # if the unicode could not be parsed to a datetime return None, otherwise use the partIndex specified by the user to return one of many different derived components of a datetime.  
  if theDate == None:  
    return None  
  try:  
    if partIndex == 0: # century (100-year periods-returns period start year-default is 00)
      return theDate.year - ((theDate.year + (100 - (offsetStartYear % 100) % 100)) % 100)
    elif partIndex == 1: # half-century (50-year periods-returns period start year-defaults end in 00 and 50)  
      return theDate.year - ((theDate.year - (50 - (100 - (offsetStartYear % 50)) % 50)) % 50)  
    elif partIndex == 2: # quarter-century (25 year periods-returns period start year-defaults end in 00, 25, 50, 75)  
      return theDate.year - ((theDate.year - (25 - (100 - (offsetStartYear % 25)) % 25)) % 25)  
    elif partIndex == 3: # decade (10 year periods-returns period start year-defaults end in 00, 10, 20, 30, ... 90)  
      return theDate.year - ((theDate.year - (10 - (100 - (offsetStartYear % 10)) % 10)) % 10)  
    elif partIndex == 4: # lustrum (5-year periods-returns period start year-defaults end in 00, 05, 10, 15, ... 95)
      return theDate.year - ((theDate.year - (5 - (100 - (offsetStartYear % 5)) % 5)) % 5)  
    elif partIndex == 5: # quadrennium  (4-year periods-returns period start year-defaults end in 00, 04, 08, 12, ... 96)  
      return theDate.year - ((theDate.year - (4 - (100 - (offsetStartYear % 4)) % 4)) % 4)  
    elif partIndex == 6: # triennium  (3-year periods-returns period start year-defaults end in 00, 03, 06, 09, ... 97 over a 300 year cycle)
      return theDate.year - ((theDate.year - (3 - (300 - (offsetStartYear % 3)) % 3)) % 3)  
    elif partIndex == 7: # biennium  (2-year periods-returns period start year-defaults end in 00, 02, 04, 06, ... 98)  
      return theDate.year - ((theDate.year - (2 - (100 - (offsetStartYear % 2)) % 2)) % 2)  
    elif partIndex == 8: # year  
      return theDate.year  
    elif partIndex == 9: # fiscal year (year-1 for January-June and year for July-December)  
      return theDate.year-(abs(theDate.month-13))//7  
    elif partIndex == 10: # semiannual  (six-month periods, with first 6 months = 1 and second 6 months = 2)  
      return (theDate.month - 1)//6 + 1  
    elif partIndex == 11: # calendar quarter  
      return (theDate.month - 1)//3 + 1  
    elif partIndex == 12: # fiscal quarter  
      return ((theDate.month + 5)%12)//3 + 1  
    elif partIndex == 13: # month of year  
      return theDate.month  
    elif partIndex == 14: # month of year percent (January=0% and 99%<December<100%)  
      return float(float(theDate.month - 1) / 12) * 100.0  
    elif partIndex == 15: # day of year number (1-365 for non-leap years and 1-366 for leap years)  
      return theDate.timetuple().tm_yday  
    elif partIndex == 16: # day of year percent (January 1 = 0% and 99% < December 31 < 100%)  
      return float(float(theDate.timetuple().tm_yday - 1) / datetime.datetime(theDate.year, 12, 31).timetuple().tm_yday) * 100.0  
    elif partIndex == 17: # day of month  
      return theDate.day  
    elif partIndex == 18: # ISO week of the year (1-52 or 1-53)  
      return theDate.isocalendar()[1]  
    elif partIndex == 19: # day is weekday/weekend (0=Weekday and 1=Weekend)  
      return theDate.weekday()//5  
    elif partIndex == 20: # day of week (0=Monday to 6=Sunday)  
      return theDate.weekday()  
    elif partIndex == 21: # day of week (0=Sunday to 6=Saturday)  
      return (theDate.weekday()+1) % 7  
    elif partIndex == 22: # season northern hemisphere  
      return get_season(theDate)  
    elif partIndex == 23: # season southern hemisphere  
      return (get_season(theDate) + 1) % 4 + 1  
    elif partIndex == 24: # moonlight percent approximate (0=no moonlight to 100=max. moonlight),   
      return moon_light(theDate, 0)  
    elif partIndex == 25: # moon phase approximate   
      return moon_light(theDate, 1)  
  except:  
    return None  

def unicodeToDate( uniDate, dateFormat=u'%m/%d/%Y', timeFormat=u'%I:%M:%S %p' ):  
  datetimeFormat = '{0} {1}'.format(dateFormat, timeFormat)  
  if uniDate == None:  
    return None  
  try:  
    return datetime.datetime.strptime(uniDate, dateFormat)  
  except:  
    try:  
      return datetime.datetime.strptime(uniDate, datetimeFormat)  
    except:  
      try:  
        # Python parses time only values as 1/1/1900 so subtract 2 days to match 12/30/1899, the date a file gdb uses. 
        return datetime.datetime.strptime(uniDate, timeFormat) + datetime.timedelta(days=-2)
      except:  
        return None  

# Approximate ranges defining seasons only.  Not precise. Use a site-package like PyEphem for precision.  
Y = 2000 # dummy leap year to allow input X-02-29 (leap day)  
seasons = [(1, (datetime.date(Y,  1,  1),  datetime.date(Y,  3, 20))),  
           (2, (datetime.date(Y,  3, 21),  datetime.date(Y,  6, 20))),  
           (3, (datetime.date(Y,  6, 21),  datetime.date(Y,  9, 22))),  
           (4, (datetime.date(Y,  9, 23),  datetime.date(Y, 12, 20))),  
           (1, (datetime.date(Y, 12, 21),  datetime.date(Y, 12, 31)))]  

def get_season(dDate):  
  dDate = datetime.date(Y, dDate.month, dDate.day)  
  return next(season for season, (start, end) in seasons  
                if start <= dDate <= end)  

# Approximate moon light and phase only.  Not precise. Use a site-package like PyEphem for precision.  
def moon_light(dDate, moon_type=0):  
  # moon_type is: 0=moonlight percent, 1=moon phase  
  month = dDate.month  
  day = dDate.day  
  year = dDate.year  
  ages = [18, 0, 11, 22, 3, 14, 25, 6, 17, 28, 9, 20, 1, 12, 23, 4, 15, 26, 7]  
  offsets = [-1, 1, 0, 1, 2, 3, 4, 5, 7, 7, 9, 9]  
  description = ['New (Totally dark)',
                 'Waxing Cresent (Increasing to full)',
                 'In its first quarter (Increasing to full)',
                 'Waxing Gibbous (Increasing to full)',
                 'Full (Full light)',
                 'Waning Gibbous (Decreasing from full)',
                 'In its last quarter (Decreasing from full)',
                 'Waning Cresent (Decreasing from full)']  
  months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",  
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]  
  if day == 31:  
    day = 1  
  days_into_phase = ((ages[(year + 1) % 19] +  
                      ((day + offsets[month-1]) % 30) +  
                      (year < 1900)) % 30)  
  index = int((days_into_phase + 2) * 16/59.0)  
  if index > 7:  
    index = 7  
  status = description[index]  
  # light should be 100% 15 days into phase  
  light = int(2 * days_into_phase * 100/29)  
  if light > 100:  
    light = abs(light - 200);  
  date = "%d%s%d" % (day, months[month-1], year)  
  if moon_type == 0:  
    return light  
  elif moon_type == 1:  
    return index
  elif moon_type == 2:
    return status

 

Expression:  dateFieldPart( !Your_Date_Field! , partIndex = 8 , dateFormat = u'%m/%d/%Y' , timeFormat = u'%I:%M:%S %p' , offsetStartYear = 0 )

 

Be sure to replace !Your_Date_Field! with a real Date/Date-Time field or Text field containing dates or date-times.  Although the calculation above will convert a real Time field or Text field containing time, only values derived from the date associated with the time value (i.e., 12/30/1899) will be returned.

 

The partindex parameter determines the kind of data that will be derived from the date and returned by the calculation.  Since the partIndex = 8 in this example, the Expression above will convert the field date to a Python datetime variable and return the Year of that value in each record.  If you change the partIndex to another number a different date component will be outputted.  For example, if you change the partIndex parameter to  partIndex = 0 the calculation will convert the field date to a Python datetime variable and return the Century of that value in each record.

 

The offsetStartYear parameter affects the year grouping values returned by partIndexes 0 through 7.  It adjusts the start year that begins the period interval to a custom starting year.  This can be used to satisfy those who need to mark the start of centuries from year 1, and regard the 21st century as having started on 2001, not 2000. By default the function returns centuries based on a year 0 starting year due to the mathematical simplicity of computing that value and its popular usage.  To make centuries aligned to the Year 1 as the century starting point instead, use partIndex = 0 and offsetStartYear = 1 as your parameters.  Other less traditional starting years can also be used.  For example, by setting partIndex = 0 and offsetStartYear = 50 you can define periods of 100 years (centuries) that extend from one mid-century to the next mid-century according to popular reckoning, rather than a traditional century.  Year 1 advocates can similarly use offsetStartYear = 51 to align their custom century groupings to the year that they regard as the mid-century start year.

 

Other calculations are possible, for example I used a very simple method of estimating seasons and moon phases, but those methods could be replaced with methods that use a site package like PyEphem to return very precise values.  Or I could add an option for a custom 2 week pay period cycle, primary seasonal year so that December to March seasons avoid being separated across different calendar years, or other custom groupings or divisions of years, months, day, week, hours, minutes, seconds, etc.

 

7. A sample time-based animation demo video showing some of what you can do using the calculation.

 

Here is a link to a YouTube video of a time based animation demo I created that shows the importance of being able to manipulating date fields to create alternative formats and date groupings.  This time based animation was only possible after I used my calculation to transform my original Applied_Date field into an Applied_Fiscal_Year Long field for some of my building permits data  The new field let me symbolize each Fiscal Year with a distinct color using categorized symbols.  The new field also let me Dissolve the permits to get a Count of the Permits in each Fiscal Year.  The Dissolve allowed me to create labels on the map showing the counts and a time based graph of the permit activity by Fiscal Year.  The Dissolve also dramatically improved the redraw performance over my original data.

 

Below are a three screenshots from the demo that shows different stages of building permit activity over time using both a time enabled map and a linked bar chart graph.

 

 

8. Please let me know any feedback you may have.

 

If you discover any errors in the formulas I have used in my calculations or can suggest an alternative formula that is more Pythonic or that makes the code more efficient/flexible, please let me know, so that I can incorporate your corrections/suggestions into the calculations of this Blog for the benefit of everyone.  Also if you identify any changes that need to be made to allow these calculations to be compatible with ArcGIS Pro/Python 3.4.x and above, please let me know.  I hope to eventually update this Blog with additional versions of the calculations that have been tested for compatibility and optimized for use with ArcGIS Pro/Python 3.4.x and above.

 

I hope this helps you uncork your Time in a Bottle.

 

 

Revisions:

Corrected formula for the case when the only the timeFormat parses the date on 7/2/2016 at 8:26 AM PST.

Corrected offsetStartYear formulas on 7/2/2016 8:26 AM PST.

Corrected descriptions of formulas that use offsetStartYear on 7/2/2016 12:50 PM PST and 7/3/2016 8:06 AM PST.

Outcomes