I have 4 datasets - A,B,C,D. Each dataset has around 500 observations. The subjects need to be scrambled within the available subject ID's. For example Actual subjects are 101,102,103,104 , they need to be scrambled as 103,101,104,102 . I am instructed to merge 4 datasets and apply scrambling for those which have subjects in dataset A. For example if dataset A has 101 as subject id and any of the datasets B,C or D has 101 as Subject id it should be scrambled with same number. Unmatched subjects should remain as it is .
Eg 1. A-101 B-101 C-101 D-101 -Actual subjects.
A-103 B-103 C-103 D-103 - Scrambled subjects should be same across the datasets for a particular subject.
Eg 2. A-104 B-105 C-109 D-104 - Actual subjects
A-107 B-108 C-106 D-107 - Please observe scrambling for Dataset A and D.
Please help me with this requirement .
Sorry, I don't really understand the requirement.
I would never tweak identifiers and map to other exisrting ones. Why?
For the sake of anomonymization I can see a reason to transform id's, but that is usuallly to a new value domain.
Nevertheless, I guess if you create a mapping table with original id, and the corresponding scrambled id, you can use that to remap data fram any data set.
Hi . Could please provide me a sample code for mapping table and remapping . It would be helpful.
Sort your datasets by ID.
Then create a dataset with the IDs from A in random order:
proc sql;
create table newids as select
id as new_id from a order by ranuni(0);
quit;
And put that together with the original IDs from A:
data scramble;
set a(keep=id);
set newids;
run;
Then for each of your original datasets, do the following:
data X;
merge X(in=ok) scramble;
by id;
if ok;
id=coalesce(new_id,id);
drop new_id;
run;
The COALESCE function is not necessary for the A dataset, but it does not hurt much (a very slight performance penalty).
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.