Anonymize License Plate Data for Parking Survey Using Python Code

01-24-2022 07:05 AM
MattCotterill
Occasional Contributor

We are hoping to anonymously track parking behavior in several two-hour parking areas by reading license plates at various times of day and recording how often individual vehicles move. I believe we should be able to do this with Python code that reads each license plate and replaces it with a string of random-looking characters. Then, if it reads the same license plate later in the data set, it will "remember" it and change it to the same set of characters. We should do this for both the "State" field and the "Number" field. I think I should be able to do this in the Field Calculator.


6 Replies
Brian_Wilson
Occasional Contributor III

You can use any one-way hashing function like SHA1.

https://www.mytecbits.com/internet/python/sha1-hash-code

import hashlib
a = 'XPF855'.encode()
h = hashlib.sha1(a)
hexa = h.hexdigest()
b = 'XPF856'.encode()
h = hashlib.sha1(b)
hexb = h.hexdigest()
print(hexa, hexb)

This produces two completely different hashed strings even though the plates differ by only one character.

2c078090aa0febc3a5daacc243bf3d660f7eb41a

def426fc95087ed1c1ff481cbc887c9f78716de5

You should be able to make that into a calc field function in the code block with something like

import hashlib

def fn(a):
  # a is the plate string; encode() turns it into the bytes hashlib expects
  h = hashlib.sha1(a.encode())
  return h.hexdigest()

But be aware that there is no easy way to reverse this function; that's the whole point. So if you run this on the plate field itself, the original contents of the field are gone forever.
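Since the original question mentions both a "State" field and a "Number" field, here is a minimal sketch of one way to hash the two together. The function name anonymize, the "|" separator, and the normalization (trimming, uppercasing, dropping spaces) are assumptions for illustration, not something anyone in this thread specified.

```python
import hashlib

def anonymize(state, number):
    """Return a SHA-1 hex digest for a (state, number) pair, or None if either is missing."""
    # Null field values typically arrive as None; pass them through untouched.
    if state is None or number is None:
        return None
    # Assumed normalization so that 'xpf 855' and 'XPF855' produce the same digest.
    key = f"{state.strip().upper()}|{number.replace(' ', '').upper()}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

print(anonymize("OR", "XPF855"))
print(anonymize("WA", "XPF855"))  # same number, different state: different digest
```

Hashing the pair as one value means the same number issued by two different states is treated as two different vehicles.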

MattCotterill
Occasional Contributor

Hi Brian,

Thanks for this! I should have mentioned that I am a beginning Python user, so I'm not exactly sure where and how to enter this code. I've attached a screenshot of a sample data set, and I was hoping to enter the code in the Field Calculator. Would it look like this?

import hashlib

def fn(a):
  h=hashlib.sha1(a.encode(LicensePlate))
  return h.hexdigest()

 

Also, my dataset would have several features with the same license plate but different time stamps. Would the code "remember" each license plate and change it to the same string as before? Is that embedded in the code that I would be importing when I "import hashlib" at the beginning?

Thanks, and apologies for being slow...

Brian_Wilson
Occasional Contributor III

That would almost work in a CalculateField code block. The license plate comes into the function as "a", so you don't put the field name inside encode(). That is, "a" is the string variable that will have the original license plate in it, and a.encode() converts that string to bytes, which is the form hashlib accepts.

Here is my screenshot.

Yes, every time you run the plate through the hash function, the result will be the same.
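To illustrate the "remembering" in the question, here is a small sketch. Nothing is actually stored between calls; the same plate simply hashes to the same string every time, so repeat sightings can be counted on the hashed values alone. The observations list and the times in it are made-up sample data.

```python
import hashlib
from collections import Counter

def fn(a):
    return hashlib.sha1(a.encode()).hexdigest()

# Hypothetical survey reads: (plate, time of day)
observations = [
    ("XPF855", "08:00"),
    ("ABC123", "08:00"),
    ("XPF855", "10:00"),
    ("XPF855", "12:00"),
]

# Replace each plate with its digest, keeping the timestamp.
hashed = [(fn(plate), when) for plate, when in observations]

# Count how many times each anonymized vehicle was seen.
sightings = Counter(digest for digest, _ in hashed)
for digest, count in sightings.items():
    print(digest[:8], count)
```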

If you want to keep the contents of the column so you can re-run the calculation, instead of transforming once and losing the original plates forever, it would look like this. You can see the table has two columns, "plate" and "encrypted". I am doing the Calc on the "encrypted" column and feeding the "fn()" function with the contents of the "plate" column.

[Screenshot: BrianWilson7_1-1643402359341.png]

I almost always try to write expressions so that I can run and rerun them on the same dataset as I debug the code so I don't lose the original data.
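Because the screenshot is not reproduced here, the following is a rough stand-in for the two-column approach, using a plain Python list of dictionaries in place of the attribute table. The helper calculate_encrypted is hypothetical. In the Calculate Field tool with the Python parser, field values are referenced as !fieldname!, so the expression would presumably be fn(!plate!) with the fn definition pasted into the code block.

```python
import hashlib

def fn(a):
    h = hashlib.sha1(a.encode())
    return h.hexdigest()

# Stand-in for the table: "plate" is the source column, "encrypted" is the target.
rows = [
    {"plate": "XPF855", "encrypted": None},
    {"plate": "XPF856", "encrypted": None},
]

def calculate_encrypted(rows):
    """Fill the 'encrypted' column from 'plate' without touching 'plate'."""
    for row in rows:
        row["encrypted"] = fn(row["plate"])
    return rows

calculate_encrypted(rows)
calculate_encrypted(rows)  # safe to rerun: the source column is never overwritten
for row in rows:
    print(row["plate"], row["encrypted"])
```

Once the results look right, the "plate" column can be dropped from whatever copy of the data is shared.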

 

TL;DR about the math: the function is deterministic and one-way. The same input always produces the same output, but it can't go the other direction: given only the hash, you can't compute the license plate back from it. The same idea is used for storing passwords. When you log in, the password you type gets pushed through the hash function, and the result is compared with the stored hash from a table or database. If they match, you typed it correctly, but anyone who steals the password table cannot see your password. This is why sysadmins say "I can reset your password for you but I can't tell you what the old one was."
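One caveat to the one-way property: license plates are short, so someone holding the hashed table could hash every plausible plate and look for matches. A possible hardening is a keyed hash (HMAC) with a secret that is kept out of the dataset. This is a different technique from the plain SHA-1 call used in this thread, and the names SECRET_KEY and keyed_fn are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret: generate once per survey, store it privately, and reuse it
# for every run so that the same plate keeps mapping to the same string.
SECRET_KEY = secrets.token_bytes(32)

def keyed_fn(a, key=SECRET_KEY):
    """Return an HMAC-SHA256 hex digest of the plate string under the given key."""
    return hmac.new(key, a.encode(), hashlib.sha256).hexdigest()

print(keyed_fn("XPF855"))
print(keyed_fn("XPF855"))  # same key and plate: same digest
print(keyed_fn("XPF856"))
```

Without the key, guessing plates and hashing them no longer reproduces the stored values. The trade-off is that losing the key means later reads can no longer be matched to earlier ones.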

 

MattCotterill
Occasional Contributor

It worked, thanks very much!

cashbay_Fishbeck
New Contributor II

Hi Bryan,

Did you get this working? I am working on the same type of project and am a beginner Python user.

Any advice?

Thank you!

MattCotterill
Occasional Contributor

Hi cashbay,

It worked for me. Right-click on the field you want to populate with the encrypted string and select "Field Calculator". Then copy the content from Brian's screenshot above, substituting your own field names for mine ('encrypted' and 'plate').

Let me know how it works!

Matt
