I have repeated-measurement data on individuals. How can I mask the unique identifier variable before sharing the data, while preserving the repeated-record structure and any other logic embedded in the data? The uniq_id variable is numeric and very long. I am using SAS 9.4.
data temp;
    input uniq_id;
    /* Show all 16 digits; the default BEST12. format would switch to scientific notation */
    format uniq_id 20.;
    datalines;
2007122345567889
2007122345567889
2007122345567889
2008235689875421
2008235689875421
2008235689875421
;
run;
1. Create a list of your IDs containing only unique values.
2. Generate a random ID for each row of the data set from step 1, and store the seed value. You'll want to keep track of the seeds over time, so I recommend keeping a master file of seeds.
3. Match ID to RandomID so that a person's ID stays constant throughout the data set but no longer carries any significance.
Fully worked example here:
https://gist.github.com/statgeek/fd94b0b6e78815430c1340e8c19f8644
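For orientation, the three steps above could be sketched roughly as follows. This is a minimal illustration and not necessarily what the linked gist does. The seed value and the names id_list, id_map, rand_key, masked_id, and temp_masked are assumptions made up for this example. It shuffles the unique IDs with a seeded random key and uses the shuffled row number as the masked ID, so every record for a person gets the same new value.

```sas
/* Step 1: one row per distinct ID */
proc sort data=temp(keep=uniq_id) out=id_list nodupkey;
    by uniq_id;
run;

/* Step 2: attach a seeded random key, shuffle, and number the rows */
%let seed = 20240501;   /* hypothetical seed; record it in your master file */

data id_list;
    set id_list;
    if _n_ = 1 then call streaminit(&seed);
    rand_key = rand('uniform');
run;

proc sort data=id_list;
    by rand_key;
run;

data id_map(keep=uniq_id masked_id);
    set id_list;
    masked_id = _n_;    /* position after the shuffle becomes the new ID */
run;

/* Step 3: look up the masked ID for every record, keeping the original row order */
data temp_masked(drop=uniq_id rc);
    if _n_ = 1 then do;
        if 0 then set id_map;            /* defines masked_id for the hash lookup */
        declare hash h(dataset: 'id_map');
        h.defineKey('uniq_id');
        h.defineData('masked_id');
        h.defineDone();
    end;
    set temp;
    rc = h.find();                       /* 0 when the ID is found in id_map */
run;
```

Only temp_masked would be shared. The id_map table links real IDs to masked ones, so it should be stored securely alongside the seed and never distributed with the data.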