BookmarkSubscribeRSS Feed
Prudhvi_007
Calcite | Level 5

I have 4 datasets - A,B,C,D. Each dataset has around 500 observations. The subjects need to be scrambled within the available subject ID's. For example Actual subjects are 101,102,103,104 , they need to be scrambled as 103,101,104,102 . I am instructed to merge 4 datasets and apply scrambling for those which have subjects in dataset A. For example if dataset A has 101 as subject id and any of the datasets B,C or D has 101 as Subject id it should be scrambled with same number. Unmatched subjects should remain as it is .

Eg 1. A-101 B-101 C-101 D-101 -Actual subjects.

         A-103  B-103 C-103 D-103 - Scrambled subjects should be same across the datasets for a  particular subject.

Eg 2. A-104 B-105 C-109 D-104 - Actual subjects

         A-107 B-108 C-106 D-107 - Please observe scrambling for Dataset A and D.

Please help me with this requirement .

3 REPLIES 3
LinusH
Tourmaline | Level 20

Sorry, I don't really understand the requirement.

I would never tweak identifiers and map to other exisrting ones. Why?

For the sake of anomonymization I can see a reason to transform id's, but that is usuallly to a new value domain.

Nevertheless, I guess if you create a mapping table with original id, and the corresponding scrambled id, you can use that to remap data fram any data set.

Data never sleeps
Prudhvi_007
Calcite | Level 5

Hi . Could please provide me a sample code for mapping table and remapping . It would be helpful.

s_lassen
Meteorite | Level 14

Sort your datasets by ID.

 

Then create a dataset with the IDs from A in random order:

proc sql;                                  
  create table newids as select            
    id as new_id from a order by ranuni(0);
quit;

And put that together with the original IDs from A:

data scramble;   
  set a(keep=id);
  set newids;    
run;

Then for each of your original datasets, do the following:

data X;                    
  merge X(in=ok) scramble; 
  by id;                   
  if ok;                   
  id=coalesce(new_id,id);  
  drop new_id;             
run;

The COALESCE function is not necessary for the A dataset, but it does not hurt much (a very slight performance penalty).

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 737 views
  • 1 like
  • 3 in conversation