I have repeated-measurement data on individuals. How can I mask the unique identifier variable before sharing the data, while preserving the repeated-record structure and any other logic tied to the ID? The uniq_id variable is numeric and very long (16 digits in the sample below). I am using SAS 9.4.
data temp;
input uniq_id;
datalines;
2007122345567889
2007122345567889
2007122345567889
2008235689875421
2008235689875421
2008235689875421
;
data temp; set temp;
format uniq_id 20.;
run;
1. Create a list of your IDs containing only the unique values.
2. Generate a random ID for each row of the data set from step 1, and store the seed value you used. You will want to keep track of the seeds over time, so I recommend keeping a master file of seeds.
3. Match each ID to its random ID, so that a person's ID is constant throughout the data set but no longer carries any significance.
Fully worked example here:
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
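The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked gist: the data set names `id_list`, `id_map`, and `temp_masked`, the variable `masked_id`, and the seed value 20240501 are all assumptions chosen for the example. It also assumes the random IDs can simply be sequential numbers assigned in random order.

```sas
/* Step 1: one row per distinct original ID */
proc sort data=temp(keep=uniq_id) out=id_list nodupkey;
    by uniq_id;
run;

/* Step 2: give each ID a random sort key.
   The seed (20240501) is an arbitrary example value; record it
   in your master file of seeds so the mapping can be reproduced. */
data id_list;
    set id_list;
    if _n_ = 1 then call streaminit(20240501);
    rand_key = rand('uniform');
run;

proc sort data=id_list;
    by rand_key;
run;

/* The masked ID is the row number after the random shuffle */
data id_map;
    set id_list;
    masked_id = _n_;
    drop rand_key;
run;

/* Step 3: look up the masked ID for every record.
   A hash lookup keeps the original row order of TEMP. */
data temp_masked;
    if _n_ = 1 then do;
        if 0 then set id_map;            /* define masked_id in the PDV */
        declare hash h(dataset: 'id_map');
        h.defineKey('uniq_id');
        h.defineData('masked_id');
        h.defineDone();
    end;
    set temp;
    if h.find() = 0 then output;         /* keep rows with a mapping */
    drop uniq_id;                        /* original ID is not shared */
run;
```

With the sample data, every record of the same person receives the same `masked_id`, so the repeated-measures structure is preserved. Only `temp_masked` would be shared; `id_map` links masked IDs back to the originals, so it should be stored securely alongside the seed.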