Ravindra_
Quartz | Level 8

I have a dataset with more than 1000 subjects whose subject IDs have the format 303-101-120, 303-100-130, 303-103-140, ... I need to scramble the subject IDs, i.e. assign the subject ID from one record to another record and so on, without modifying the ID values themselves. For example, if 1,2,3,4,5 are the subject IDs, they should be reassigned as 3,5,1,4,2: no value is changed, the IDs are just shuffled among the records. I tried to use RANUNI but was not able to achieve the result. Is there a way to do this in a few steps? Changing the IDs manually would be very tedious. I have provided sample code as an example; my real data has 1000+ USUBJID values. Any help, please.

data ndsn;
infile datalines;
input subjid  study  site  usubjid $15.;
datalines;
120 303 100 303-100-120
121 303 100 303-100-121
122 303 101 303-101-122
123 303 101 303-101-123
124 303 102 303-102-124
125 303 102 303-102-125
126 303 103 303-103-126
127 303 103 303-103-127
128 303 104 303-104-128
129 303 104 303-104-129
;
run;
10 REPLIES
PaigeMiller
Diamond | Level 26

Define "scramble". Do you mean you want to change the records to some random order? Show us a desired output.

--
Paige Miller
Ksharp
Super User
data ndsn;
infile datalines;
input subjid  study  site  usubjid $15.;
datalines;
120 303 100 303-100-120
121 303 100 303-100-121
122 303 101 303-101-122
123 303 101 303-101-123
124 303 102 303-102-124
125 303 102 303-102-125
126 303 103 303-103-126
127 303 103 303-103-127
128 303 104 303-104-128
129 303 104 303-104-129
;
run;

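/* Attach a random sort key to each USUBJID */
data id;
 set ndsn(keep=usubjid);
 call streaminit(123456789);  /* fixed seed makes the shuffle reproducible */
 id=rand('uniform');
run;
/* Sorting by the random key puts the IDs in random order */
proc sort data=id;by id;run;
/* Pair each original USUBJID with a shuffled one, side by side (no BY statement) */
data temp;
 merge ndsn(keep=usubjid) id(keep=usubjid rename=(usubjid=new_usubjid));
run;

/* Add NEW_USUBJID to the full data; NDSN must be sorted by USUBJID for this BY merge */
data want;
 merge ndsn temp;
 by usubjid;
run;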
FreelanceReinh
Jade | Level 19

Hi @Ravindra_,

 

Do you want to randomly permute the values in variable USUBJID and leave the other three variables unchanged?

If so, try this:

proc surveyselect data=ndsn(keep=usubjid) seed=3141 
                   out=scrm(keep=usubjid) samprate=1 outrandom;
run;

data want;
merge ndsn(drop=usubjid) scrm;
run;

 

Edit: Needless to say, you can apply the same technique to variable SUBJID. By using the STRATA statement of PROC SURVEYSELECT (requires a sorted input dataset) you can scramble SUBJID within STUDY or (as shown below) within STUDY SITE:

proc surveyselect data=ndsn seed=3141 
                   out=scrm(keep=subjid) samprate=1 outrandom;
strata study site;
run;

data want;
merge scrm ndsn(drop=subjid);
run;
Ravindra_
Quartz | Level 8
Thanks for the reply, I think this is working for me. Can you tell me how to apply this to the site and sitename variables (not included in the sample program) as well, please?
FreelanceReinh
Jade | Level 19

@Ravindra_ wrote:
Thanks for the reply, I think this is working for me. Can you tell me how to apply this to the site and sitename variables (not included in the sample program) as well, please?

You're welcome.

 

Let's create a sitename variable for demonstration (you don't need this step, of course):

data have;
set ndsn;
length sitename $30;
sitename=put(site,words30.);
run;

The code below scrambles variable USUBJID and -- independently -- the combination of SITE and SITENAME.

proc surveyselect data=have(keep=usubjid) seed=3141 noprint
                   out=scrm_id(keep=usubjid) samprate=1 outrandom;
run;

proc surveyselect data=have(keep=site sitename) seed=2718 noprint
                   out=scrm_site(keep=site sitename) samprate=1 outrandom;
run;

data want;
merge have(drop=usubjid site sitename) scrm_id scrm_site;
run;

So, the principle is the same as before: PROC SURVEYSELECT is applied to one or more selected variables (specified in the KEEP= options). Different (positive) seed values ensure independence. The final MERGE step (without BY statement) combines the scrambled output datasets from PROC SURVEYSELECT "side by side" with the remaining variables of the original observations. (The NOPRINT option in the PROC steps is optional.)
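
As a rough sanity check (a sketch, not part of the original answer), one could confirm that the scrambled IDs are a pure permutation of the original ones, i.e. that no value was changed, added, or lost. The dataset names ids_before and ids_after are made up for this illustration; HAVE and WANT are the datasets from the code above.

```sas
/* Sort the original and the scrambled IDs into the same order */
proc sort data=have(keep=usubjid) out=ids_before; by usubjid; run;
proc sort data=want(keep=usubjid) out=ids_after;  by usubjid; run;

/* Compare the two sorted lists observation by observation */
proc compare base=ids_before compare=ids_after;
run;
```

If the scrambling only reordered the values, PROC COMPARE should report no unequal values for USUBJID.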

 

Ravindra_
Quartz | Level 8
Thanks a lot for this help. I have accepted this as the solution; my report was validated and accepted by the client. Thanks a lot for your time.
Ravindra_
Quartz | Level 8
Hello, I have a follow-up question. I have 3 datasets in which I need to blind only the subjects that meet a specific condition. For example, the CM dataset has a variable called CMANG, and only subjects with CMANG = Yes should be scrambled; the remaining subjects should stay as they are, without any scrambling. The client has this requirement for only 3 datasets; the rest of the datasets will be scrambled as per the earlier solution you provided. Can you please help me with this? I tried to use a WHERE condition but am not sure whether it is working.
FreelanceReinh
Jade | Level 19

Hello @Ravindra_,

 

I think the easiest way to proceed is to split the dataset to be partially blinded into two temporary datasets: one (say A) with the subjects satisfying the condition, and one (say B) with the remaining subjects. Then you can apply the same scrambling technique to dataset A alone (result: A1) and finally interleave A1 and B.
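
The steps just described could be sketched roughly as follows. This is only an illustration under assumptions: a dataset named CM with one observation per subject, a character variable CMANG whose flagged value is exactly 'Yes', and made-up intermediate names A, B, SCRM_A and A1.

```sas
/* Split: A = subjects meeting the condition, B = all remaining subjects */
data a b;
  set cm;
  if cmang='Yes' then output a;   /* assumed coding of the condition */
  else output b;
run;

/* Scramble USUBJID within A only, using the same technique as before */
proc surveyselect data=a(keep=usubjid) seed=3141 noprint
                  out=scrm_a(keep=usubjid) samprate=1 outrandom;
run;

data a1;
  merge a(drop=usubjid) scrm_a;   /* side-by-side merge, no BY statement */
run;

/* Interleaving requires both inputs to be sorted by the BY variable */
proc sort data=a1; by usubjid; run;
proc sort data=b;  by usubjid; run;

data want;
  set a1 b;
  by usubjid;
run;
```

The records in B pass through unchanged, and the result is ordered by USUBJID after the interleave.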

 

This assumes that the structure of the original dataset is similar to that shown in your initial post, in particular: one observation per subject. If this is not the case, you may want to develop a more general blinding strategy (involving a blinding list applicable to both types of datasets, those with unique and those with multiple subject IDs). If you need help with that, please open a new thread.
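
For orientation only, here is one possible shape of such a blinding list. It is a sketch under assumptions, not the strategy the reply had in mind: the datasets CM and DM, the variable NEW_USUBJID, and the names IDS, SCRM, BLINDLIST and CM_BLINDED are all hypothetical.

```sas
/* One row per distinct subject ID found in the datasets to be blinded */
proc sql;
  create table ids as
  select usubjid from cm
  union
  select usubjid from dm;
quit;

/* Randomly permute the distinct IDs */
proc surveyselect data=ids seed=3141 noprint
                  out=scrm(keep=usubjid rename=(usubjid=new_usubjid))
                  samprate=1 outrandom;
run;

/* Blinding list: each original ID paired with its replacement */
data blindlist;
  merge ids scrm;                 /* side by side, no BY statement */
run;

/* Apply the list to a dataset that may have several records per subject */
data cm_blinded;
  if _n_=1 then do;
    if 0 then set blindlist;      /* defines NEW_USUBJID in the data step */
    declare hash h(dataset:'blindlist');
    h.defineKey('usubjid');
    h.defineData('new_usubjid');
    h.defineDone();
  end;
  set cm;
  if h.find()=0 then usubjid=new_usubjid;
  drop new_usubjid;
run;
```

Because the same list is applied to every dataset, all records of a given subject receive the same replacement ID. Note that a random permutation can map some IDs to themselves.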

Sajid01
Meteorite | Level 14

Hello @Ravindra_ 
From your question, the following statements are a bit confusing:
(1) I need to scramble the subject id, i.e. I need to assign a subject id from one record to other record and so on
(2) but I should not make any modification to the subject id, for example: 1,2,3,4,5 are subject id's and

(3) these need to be assigned as 3,5,1,4,2.

(4) here we are not making any modification but just scrambling subject id

Can you please clarify?

Ravindra_
Quartz | Level 8

Thank you all for the responses. Although I did not try every solution you suggested, one of them helped me. Thank you all for taking the time to help me.
