I'd also recommend you take a look at the SDI Toolkit Extension from Search Discovery. They added functionality that might be of interest to you. It is a one-way hash, but the same input always produces the same hashed value.
Here is their description:
Converts a data element to a consistent obfuscated string.
One way hashing is commonly used to obfuscate sensitive information for analytics purposes. Hashing an email address or a customer id will return the same obfuscated value every time for tracking purposes.
firstname.lastname@example.org => 4da3a5736c67e46036f26146662a6e9491176a63de36b49b7ed2c5b47b37e96e
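To illustrate the "same obfuscated value every time" property, here is a minimal R sketch. It assumes the digest package is installed, and hash_id is a hypothetical helper name, not part of the SDI Toolkit. I have not verified that it reproduces the exact value shown above; it only demonstrates that normalized input hashes consistently.

```r
library(digest)  # assumes install.packages("digest") has been run

# hash_id is a hypothetical helper for illustration only
hash_id <- function(x) {
  normalized <- tolower(trimws(x))  # lowercase and trim whitespace first
  digest(normalized, algo = "sha256", serialize = FALSE)
}

a <- hash_id("firstname.lastname@example.org")
b <- hash_id("  Firstname.Lastname@Example.org ")

identical(a, b)  # TRUE: case and padding differences are normalized away
nchar(a)         # 64: a SHA-256 digest is 64 hex characters
```

Because the hash is one-way, the original address cannot be read back from the result; matching is only possible by hashing a known value the same way.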
read.csv is actually an R function, not a file. It reads the file /Users/Eric/Documents/R-Data-Files/custID.csv into a new data frame named sampleCRMdata.
The file, custID.csv, can have as many columns as you want as long as it has one column with a header of "custID".
Yes, the same hashing function (SHA-256) can be used on the keys within your CRM so that you can make the connection between the hashed values in your analytics data and your customers in your CRM.
Here's how I would do it in R. Take note that I'm applying the same data normalization (force to lowercase & trim whitespace) as I do within the SDI Toolkit's one-way-hash.
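#####THIS IS AN R Script
library(digest)  # provides digest(); install once with install.packages("digest")
sampleCRMdata <- read.csv(file="/Users/Eric/Documents/R-Data-Files/custID.csv", header=TRUE)
sampleCRMdata$custIDHashed <- sapply( tolower( trimws( sampleCRMdata$custID ) ), digest, algo="sha256", serialize=FALSE)
#####HERE ENDS THE R SCRIPT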
My advice to you would be to run a small sample of your CRM custIDs and make sure that you get the same hashed output from both ends (Launch client-side and CRM server-side) before you go great guns with data collection (or start using the hash as an integration key in the ECID service or AAM).
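A rough sketch of that sample check in R, assuming the digest package is installed. The helper names (sha256_hex, compare_hashes) and the sample IDs are hypothetical, and client_hashes is only a stand-in for values you would actually capture from the Launch side. The second stand-in value simulates a client that skipped the lowercase step, to show what a mismatch looks like.

```r
library(digest)  # assumes the digest package is installed

# Hypothetical helper: SHA-256 hex digest for each element of a character vector
sha256_hex <- function(x) {
  vapply(x, digest, character(1), algo = "sha256", serialize = FALSE,
         USE.NAMES = FALSE)
}

# Hypothetical helper: hash the CRM IDs with the same normalization and
# compare them with the hashes observed client-side
compare_hashes <- function(crm_ids, client_hashes) {
  crm_hashes <- sha256_hex(tolower(trimws(crm_ids)))
  data.frame(custID = crm_ids,
             match = crm_hashes == client_hashes,
             stringsAsFactors = FALSE)
}

crm_ids <- c("xyz789", "ABC123")

# Stand-in for captured client-side values: the first was normalized,
# the second was hashed as-is (uppercase), so it should not match
client_hashes <- sha256_hex(c("xyz789", "ABC123"))

result <- compare_hashes(crm_ids, client_hashes)
print(result)  # match is TRUE for xyz789 and FALSE for ABC123
```

Any FALSE in the match column means the two ends are not normalizing or hashing the same way, which is worth resolving before relying on the hash as a key.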
I think the main problem is that you have the value in the querystring. Even if you obfuscate the values sent in the analytics payload, the plaintext version will still be sent in the referrer. We had a similar issue with email addresses in plaintext in URLs. The only surefire solution is to stop putting PII in URLs.
I had not thought about using R. Thanks for the tip...
I have the following set up...
custID.csv contains 1 column, 2 rows, 1 ID in each row.
read.csv is a blank file.
When I run your code, I get the following. Error message in bold.
> sampleCRMdata<-read.csv(file="custID.csv", header=TRUE)
> sampleCRMdata$custIDHashed <- sapply( tolower( trimws( sampleCRMdata$custID ) ), digest, algo="sha256", serialize=FALSE)
Error in `$<-.data.frame`(`*tmp*`, custIDHashed, value = list()) :
replacement has 0 rows, data has 1
Do you have any recommendations?
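One possible reading of this error, offered as a guess rather than a confirmed diagnosis: `value = list()` suggests sapply received nothing to hash, which is what happens when sampleCRMdata$custID is NULL. That would be the case if custID.csv has no header row named "custID", so read.csv with header=TRUE took the first ID as the column name and left only one data row. The sketch below recreates that situation with a temporary file and made-up IDs.

```r
# Recreate a two-line file with no "custID" header (hypothetical IDs)
tmp <- tempfile(fileext = ".csv")
writeLines(c("A1001", "A1002"), tmp)

bad <- read.csv(file = tmp, header = TRUE)
names(bad)           # "A1001": the first ID was consumed as the header
nrow(bad)            # 1: only one data row remains
is.null(bad$custID)  # TRUE: there is no custID column to hash

# Option: read without a header and name the column explicitly
good <- read.csv(file = tmp, header = FALSE, col.names = "custID",
                 stringsAsFactors = FALSE)
nrow(good)           # 2
good$custID          # "A1001" "A1002"
```

Adding a first line containing custID to the CSV itself should have the same effect as the second read.csv call, after which the hashing line has a column to work on.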
We have customer user IDs. These customer user IDs are exposed as plain text in the URL string. It's our company policy not to store plain text customer user IDs in 3rd party tools such as Adobe Analytics. We are allowed to store these customer user IDs as obfuscated/encrypted values. I am looking for a way to use Adobe Launch to obfuscate or encrypt these values and push to Adobe Analytics.