BookmarkSubscribeRSS Feed
Prudhvi_007
Calcite | Level 5

I have 4 datasets - A,B,C,D. Each dataset has around 500 observations. The subjects need to be scrambled within the available subject ID's. For example Actual subjects are 101,102,103,104 , they need to be scrambled as 103,101,104,102 . I am instructed to merge 4 datasets and apply scrambling for those which have subjects in dataset A. For example if dataset A has 101 as subject id and any of the datasets B,C or D has 101 as Subject id it should be scrambled with same number. Unmatched subjects should remain as it is .

Eg 1. A-101 B-101 C-101 D-101 -Actual subjects.

         A-103  B-103 C-103 D-103 - Scrambled subjects should be same across the datasets for a  particular subject.

Eg 2. A-104 B-105 C-109 D-104 - Actual subjects

         A-107 B-108 C-106 D-107 - Please observe scrambling for Dataset A and D.

Please help me with this requirement .

3 REPLIES 3
LinusH
Tourmaline | Level 20

Sorry, I don't really understand the requirement.

I would never tweak identifiers and map to other exisrting ones. Why?

For the sake of anomonymization I can see a reason to transform id's, but that is usuallly to a new value domain.

Nevertheless, I guess if you create a mapping table with original id, and the corresponding scrambled id, you can use that to remap data fram any data set.

Data never sleeps
Prudhvi_007
Calcite | Level 5

Hi . Could please provide me a sample code for mapping table and remapping . It would be helpful.

s_lassen
Meteorite | Level 14

Sort your datasets by ID.

 

Then create a dataset with the IDs from A in random order:

proc sql;                                  
  create table newids as select            
    id as new_id from a order by ranuni(0);
quit;

And put that together with the original IDs from A:

data scramble;   
  set a(keep=id);
  set newids;    
run;

Then for each of your original datasets, do the following:

data X;                    
  merge X(in=ok) scramble; 
  by id;                   
  if ok;                   
  id=coalesce(new_id,id);  
  drop new_id;             
run;

The COALESCE function is not necessary for the A dataset, but it does not hurt much (a very slight performance penalty).

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 558 views
  • 1 like
  • 3 in conversation