Python: Replacing strings in a list with values from a dictionary.

deleted-user-UWSzsDdSZSYb · ‎11-24-2017

I am writing code to edit the field names in a csv file to be imported into ArcMap. I have all the field names in a list of strings. I have a dictionary with the current field name as the key, and the new field name as the value. I am trying to iterate through the list of current field names and replace them with the values in the dictionary. I am trying to use the .replace() method and it is not working for me. Here is my current code. Any suggestions?

headersDict = {'Number; SEX AND AGE - Total Population':'TPop', 'Number; SEX AND AGE - Male Population':'MPop', 'Number; SEX AND AGE - Female Population':'FPop',
 'Under 5 years': '<5', '5 to 9 years': '5_9', '10 to 14 years': '10_14', '15 to 19 years': '15_19', '20 to 24 years': '20_24', '25 to 29 years': '25_29',
 '30 to 34 years': '30_34', '35 to 39 years': '35_39', '40 to 44 years': '40_44', '45 to 49 years': '45_49', '50 to 54 years': '50_54', '55 to 59 years': '55_59'
 ,'60 to 64 years': '60_64', '65 to 69 years': '65_69', '70 to 74 years': '70_74', '75 to 79 years': '75_79', '80 to 84 years': '80_84', '85 years and over': '85+',
 'Median age(years)': 'Medage', '16 years and over': '16+', '18 years and over': '18+', '21 years and over': '21+', '62 years and over': '62+',
 '65 years and over': '65+'}

 
 for header in headersList:
 header.replace('Number; SEX AND AGE - Total Population', headersDict['Number; SEX AND AGE - Total Population'])
 outfile.write(header)

RandyBurton · ‎11-24-2017

Try this:

headersDict = {'Number; SEX AND AGE - Total Population':'TPop',
               'Number; SEX AND AGE - Male Population':'MPop',
               'Number; SEX AND AGE - Female Population':'FPop',
               'Under 5 years': '<5',
               '5 to 9 years': '5_9',
               '10 to 14 years': '10_14',
               '15 to 19 years': '15_19',
               '20 to 24 years': '20_24',
               '25 to 29 years': '25_29',
               '30 to 34 years': '30_34',
               '35 to 39 years': '35_39',
               '40 to 44 years': '40_44',
               '45 to 49 years': '45_49',
               '50 to 54 years': '50_54',
               '55 to 59 years': '55_59',
               '60 to 64 years': '60_64',
               '65 to 69 years': '65_69',
               '70 to 74 years': '70_74',
               '75 to 79 years': '75_79',
               '80 to 84 years': '80_84',
               '85 years and over': '85+',
               'Median age(years)': 'Medage',
               '16 years and over': '16+',
               '18 years and over': '18+',
               '21 years and over': '21+',
               '62 years and over': '62+',
               '65 years and over': '65+'}

headersList = [ '10 to 14 years',
                '15 to 19 years'
                ]
# list comprehension
headersList = [ headersDict.get(item,item) for item in headersList ]

print headersList‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Several of the values in the dictionary may be problematic as field names. "Under 5", "16 and over", etc. use symbols that may cause issues. There may also be a problem in starting field names with a number. See: FAQ: What characters should not be used in ArcGIS for field names and table names?

deleted-user-UWSzsDdSZSYb · ‎11-25-2017

Thanks for the idea about the symbols that may cause issues. This code still did not work for what I am trying to do. For example, a current field name reads: 'NUMBER, SEX AND AGE - Total Population - 15 to 19 years'. This needs to be changed to 'TPop15_19'.

RandyBurton · ‎11-25-2017

In your first bit of code there is a semicolon after 'NUMBER', and in your latest response there is a comma. Replacing with this method will require an exact match. Extra spaces, long versus short dashes, and non-printable characters in your original csv header can cause problems.

Also, since you are working with a CSV file, you need to make sure each item in the header row is properly quoted when there is a comma used.

RandyBurton · ‎11-26-2017

As a follow-up to my suggestions previously, I am assuming you are reading the CSV file, correcting the header, and then writing a new CSV. Something like:

import csv
import os

inputFileName = "test.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"

headersDict = {'NUMBER, SEX AND AGE - Total Population - Under 5 years': 'TPop00_05',
               'NUMBER, SEX AND AGE - Total Population - 5 to 9 years': 'TPop05_09',
               'NUMBER, SEX AND AGE - Total Population - 10 to 14 years': 'TPop10_14',
               'NUMBER, SEX AND AGE - Total Population - 15 to 19 years': 'TPop15_19',
               'NUMBER, SEX AND AGE - Total Population - 20 to 24 years': 'TPop20_24',
               'NUMBER, SEX AND AGE - Total Population - 25 to 29 years': 'TPop25_29'
               }

with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.reader(inFile)
    w = csv.writer(outfile)

    # read the first row from the reader, the old header
    header = next(r)
    print header

    # replace items in header using dictionary, if item not in dictionary, it will not be replaced
    newHeader = [ headersDict.get(item,item) for item in header ]
    print newHeader
    
    # write new header
    w.writerow(newHeader)

    # copy the rest
    for row in r:
        w.writerow(row)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If not, can you describe your process.

deleted-user-UWSzsDdSZSYb · ‎11-27-2017

Yes that works thank you. How would I do this without using the csv module?

RandyBurton · ‎11-27-2017

I think it is best to use the CSV module. If you just import the CSV directly, ArcMap will create its own field names and use the actual field names as an alias. Fixing the CSV gives you control over the field names that will be used. You can always add longer aliases later if you want.