Anonymize License Plate Data for Parking Survey Using Python Code

01-24-2022 07:05 AM
MattCotterill
New Contributor III

We are hoping to anonymously track parking behavior in several two-hour parking areas by reading license plates at various times in a day and recording how often individual vehicles move. I believe we should be able to do this with Python code that reads each license plate and replaces it with a string of random-looking characters. Then, if it reads the same license plate later in the data set, it will "remember" it and change it to the same set of characters. We should do this for both the "State" field and the "Number" field. I think I should be able to do this in the field calculator.


4 Replies
BrianWilson7
Regular Contributor

You can use any one-way hashing function like SHA1.

https://www.mytecbits.com/internet/python/sha1-hash-code

import hashlib
a = 'XPF855'.encode()
h = hashlib.sha1(a)
hexa = h.hexdigest()
b = 'XPF856'.encode()
h = hashlib.sha1(b)
hexb = h.hexdigest()
print(hexa, hexb)

This produces two completely different hash strings even though the plates differ by only one character.

2c078090aa0febc3a5daacc243bf3d660f7eb41a

def426fc95087ed1c1ff481cbc887c9f78716de5

You should be able to make that into a calc field function in the code block with something like

import hashlib

def fn(a):
    # "a" is the plate as a string; encode() turns it into bytes for hashlib
    h = hashlib.sha1(a.encode())
    return h.hexdigest()

Be aware that there is no easy way to reverse this function; that's the whole point. So if you run this on the plate field itself, the original contents of the field are gone forever.
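One caveat on "no easy way to reverse": plates are short and follow known formats, so someone who knows which hash was used could hash candidate plates and compare the results. A minimal sketch of one way to make that harder, assuming you can keep a secret key outside the dataset, is a keyed hash (HMAC) from the Python standard library. The names fn_keyed and SECRET_KEY are made up for this illustration and are not part of the suggestion above.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for illustration).
# Keep it out of the dataset and out of anything you publish.
SECRET_KEY = b"replace-with-your-own-secret"

def fn_keyed(a):
    """Return a keyed SHA-256 digest (HMAC) of the plate string."""
    return hmac.new(SECRET_KEY, a.encode(), hashlib.sha256).hexdigest()

# The same plate still maps to the same string on every run,
# as long as the key does not change.
print(fn_keyed("XPF855") == fn_keyed("XPF855"))
print(fn_keyed("XPF855") == fn_keyed("XPF856"))
```

The trade-off is that you need the same key every time you process new readings, otherwise the same plate will no longer map to the same string.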

MattCotterill
New Contributor III

Hi Brian,

Thanks for this! I should have mentioned that I am a beginning Python user, so I'm not exactly sure where and how to enter this code. I've attached a screenshot of a sample data set, and I was hoping to enter the code in the field calculator; would it look like this?

import hashlib

def fn(a):

  h=hashlib.sha1(a.encode(LicensePlate))

  return h.hexdigest()

 

Also, my dataset would have several features with the same license plate but different time stamps. Would the code "remember" each license plate and change it to the same string as before? Is that embedded in the code that I would be importing when I "import hashlib" at the beginning?

Thanks, and apologies for being slow...

BrianWilson7
Regular Contributor

That would almost work in a CalculateField code block. The license plate comes into the function as "a"; that is, "a" is the string variable that will have the original license plate in it, so the field name does not go inside encode(). a.encode() converts the string to bytes so that it's acceptable to hashlib.

Here is my screenshot.

Yes, every time you run the plate through the hash function, the result will be the same.
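To illustrate the "same plate, same result" point, here is a small sketch you can run outside ArcGIS. The readings list is made-up sample data, not from the survey; fn() is the same function as in the code block above.

```python
import hashlib
from collections import Counter

def fn(a):
    h = hashlib.sha1(a.encode())
    return h.hexdigest()

# Hypothetical survey readings as (plate, time) pairs, invented for illustration
readings = [
    ("XPF855", "08:00"),
    ("ABC123", "08:00"),
    ("XPF855", "10:00"),
    ("XPF855", "12:00"),
]

# Replace each plate with its hash; repeated plates get identical hashes
hashed = [(fn(plate), time) for plate, time in readings]

# Count how many times each anonymized vehicle was seen
counts = Counter(h for h, _ in hashed)
for h, n in counts.items():
    print(h, n)
```

Nothing is "remembered" between rows; the function simply computes the same value whenever it gets the same input.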

If you want to keep the contents of the column so you can re-run it, instead of transforming once and losing the original plates forever, it would look like this. The table has two columns, "plate" and "encrypted"; I am doing the Calc on the "encrypted" column and feeding the "fn()" function with the contents of the "plate" column.

BrianWilson7_1-1643402359341.png

I almost always try to write expressions so that I can run and rerun them on the same dataset as I debug the code so I don't lose the original data.
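Since the screenshot may not come through, here is a rough plain-Python stand-in for that two-column setup. It assumes the fields are named "plate" and "encrypted" as described; the rows list and the calculate() helper are made up to mimic what the field calculator does. In the Field Calculator itself, the expression for the "encrypted" field would presumably be fn(!plate!), with fn defined in the code block.

```python
import hashlib

def fn(a):
    h = hashlib.sha1(a.encode())
    return h.hexdigest()

# Stand-in for the table: each dict is one row with the two columns
rows = [
    {"plate": "XPF855", "encrypted": None},
    {"plate": "XPF856", "encrypted": None},
]

def calculate(rows):
    """Mimic calculating the 'encrypted' field from the 'plate' field."""
    for row in rows:
        row["encrypted"] = fn(row["plate"])

calculate(rows)
calculate(rows)  # re-running is harmless because "plate" is never overwritten

for row in rows:
    print(row["plate"], row["encrypted"])
```

Once the results look right, the "plate" column can be deleted or left out of the export to get the anonymized table.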

 

TL;DR About the math: the function is one-way, which means the same input always produces the same output, but it can't go the other direction. Given only the hash, you can't compute the license plate from it. The same idea is used for storing passwords. When you log in, the password you type gets pushed through the hash function, and then the result is compared with the stored hash from a table or database. If they match, you typed it correctly, but anyone who steals the password table cannot see your password. This is why sysadmins say "I can reset your password for you but I can't tell you what the old one was."
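The login comparison can be sketched in a few lines. This is only an illustration of the compare-the-hashes idea, with a made-up password and helper names; real systems also add a per-user salt and use deliberately slow hash functions, which are left out here.

```python
import hashlib
import hmac

def hash_password(password):
    """Hash a password string (simplified: no salt, for illustration only)."""
    return hashlib.sha256(password.encode()).hexdigest()

# What the database stores: the hash, never the password itself
stored_hash = hash_password("correct horse")  # hypothetical password

def check_login(typed):
    """Hash what the user typed and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(typed), stored_hash)

print(check_login("correct horse"))
print(check_login("wrong guess"))
```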

 

MattCotterill
New Contributor III

It worked, thanks very much!
