BookmarkSubscribeRSS Feed
Prudhvi_007
Calcite | Level 5

I have 4 datasets - A,B,C,D. Each dataset has around 500 observations. The subjects need to be scrambled within the available subject ID's. For example Actual subjects are 101,102,103,104 , they need to be scrambled as 103,101,104,102 . I am instructed to merge 4 datasets and apply scrambling for those which have subjects in dataset A. For example if dataset A has 101 as subject id and any of the datasets B,C or D has 101 as Subject id it should be scrambled with same number. Unmatched subjects should remain as it is .

Eg 1. A-101 B-101 C-101 D-101 -Actual subjects.

         A-103  B-103 C-103 D-103 - Scrambled subjects should be same across the datasets for a  particular subject.

Eg 2. A-104 B-105 C-109 D-104 - Actual subjects

         A-107 B-108 C-106 D-107 - Please observe scrambling for Dataset A and D.

Please help me with this requirement .

3 REPLIES 3
LinusH
Tourmaline | Level 20

Sorry, I don't really understand the requirement.

I would never tweak identifiers and map to other exisrting ones. Why?

For the sake of anomonymization I can see a reason to transform id's, but that is usuallly to a new value domain.

Nevertheless, I guess if you create a mapping table with original id, and the corresponding scrambled id, you can use that to remap data fram any data set.

Data never sleeps
Prudhvi_007
Calcite | Level 5

Hi . Could please provide me a sample code for mapping table and remapping . It would be helpful.

s_lassen
Meteorite | Level 14

Sort your datasets by ID.

 

Then create a dataset with the IDs from A in random order:

proc sql;                                  
  create table newids as select            
    id as new_id from a order by ranuni(0);
quit;

And put that together with the original IDs from A:

data scramble;   
  set a(keep=id);
  set newids;    
run;

Then for each of your original datasets, do the following:

data X;                    
  merge X(in=ok) scramble; 
  by id;                   
  if ok;                   
  id=coalesce(new_id,id);  
  drop new_id;             
run;

The COALESCE function is not necessary for the A dataset, but it does not hurt much (a very slight performance penalty).

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 751 views
  • 1 like
  • 3 in conversation