BookmarkSubscribeRSS Feed
stellapersis7
Obsidian | Level 7

Hi all,

I need some help with coding. I want the IDs in a single column and flag the study ids as 1 and control ids as 0.

Please see below

data I have

study id  control id  study id duration  controlid duration
x a 10 25
x b 10 45
x c  10 103
x d 10 150
y m 45 91
y n 45 123
y o 45 321
y p 45 147

Data I want

ID flag duration
x 1 10
a 0 25
b 0 45
c 0 103
d 0 150
y 1 45
m 0 91
n 0 123
o 0 321
p 0 147

The dataset is as follows:

data WORK.CC;
infile datalines dsd truncover;
input study_id:32. control_id:32. study_duration:32. control_duration:32.;
label study_id="Enrollee ID" control_id="Enrollee ID";
datalines;
1096503 737477001 141 1038
1096503 1350295201 141 1069
1096503 2661039302 141 640
1096503 33596554301 141 415
15101702 2266821602 107 949
15101702 2844664901 107 1078
15101702 4272672001 107 810
15101702 33155637601 107 712
27264303 1324920304 36 40
27264303 3409397803 274 448
27264303 3955249504 274 340
27264303 4859531601 36 123
;;;;

Please help.

thank you in advance.

4 REPLIES 4
ballardw
Super User

You don't provide an explicit rule for what the Flag means.

 

Where did all of the X and Y go? If you have 4 input X values then you need to explicitly state a rule for why there is only one.

 

The general "transpose" would go like this:

data work.want;
   set work.cc;
   array A (*) study_id        control_id;
   array d (*) study_duration  control_duration;
   do i=1 to dim(A);
      id= A[i];
      Duration = d[i];
      flag=(i=1);
      output;
   end;
   keep id duration flag;
run;

      

If you want a specific order in the output that should also be stated because that example needs work if there is an expected order.

stellapersis7
Obsidian | Level 7

Hi Ballardw,

Thanks for the response.

I matched cases and controls in my study (1:4 matching) and each case and control has a drug utilization duration. So now I need all the ids in one column, however I need all the cases (study_id) to be coded as 1 and all the controls (control_id) to be coded as 0 for differentiation. 

I used the code you gave, but it gives me repetitive study ids for each control id, like in 1:1 matching.

Screen Shot 2024-06-12 at 14.59.25.png

Can you help me fix it.

Thank you.

ballardw
Super User

 

I guessed at the rule for the flag because your data set does not have any of the values that matched your output.

I still do not see a clear definition of the rule for assigning the value of the flag.

You have at least one other requirement (i suspect actually more) that you have not provided a programming rule for. Namely, WHY is there only 1 X in the output? Or what are the rules for removing the duplicates?

I can make a guess as to a possible rule for removing duplicates but you really don't what to rely on guessing. Provide the rule(s).

I may be going on a bit on this topic but the clearer you state requirements the better a solution will be.

 

I know and easy way to do that but it involves changing the order of the data. You did not address the question is there an expected order requirement.

 

Now lets ask another question: After you use this combined data will you need to rematch the same "controls" to the "same cases"?

 

Tom
Super User Tom
Super User

Does NOT look like a TRANSPOSE at all.

data have;
  infile datalines truncover;
  input study_id control_id study_duration control_duration;
datalines;
1096503 737477001 141 1038
1096503 1350295201 141 1069
1096503 2661039302 141 640
1096503 33596554301 141 415
15101702 2266821602 107 949
15101702 2844664901 107 1078
15101702 4272672001 107 810
15101702 33155637601 107 712
27264303 1324920304 36 40
27264303 3409397803 274 448
27264303 3955249504 274 340
27264303 4859531601 36 123
;;;;

data want;
  set have(in=in1) have(in=in2);
  by study_id ;
  if in1 then do;
    if first.study_id;
    id=study_id;
    flag=1;
    duration=study_duration;
  end;
  else do;
    id=control_id;
    flag=0;
    duration=control_duration;
  end;
  keep id flag duration;
run;

Or you can do the something reading the dataset only once by adding some OUTPUT statements.

data want;
  set have;
  by study_id;
  flag = first.study_id;
  if flag then output;
  flag=0;
  study_id=control_id;
  study_duration=control_duration;
  output;
  keep study_id flag study_duration;
  rename study_id=id study_duration=duration;
run;

Result

Obs             id    flag    duration

  1        1096503      1        141
  2      737477001      0       1038
  3     1350295201      0       1069
  4     2661039302      0        640
  5    33596554301      0        415
  6       15101702      1        107
  7     2266821602      0        949
  8     2844664901      0       1078
  9     4272672001      0        810
 10    33155637601      0        712
 11       27264303      1         36
 12     1324920304      0         40
 13     3409397803      0        448
 14     3955249504      0        340
 15     4859531601      0        123

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 794 views
  • 0 likes
  • 3 in conversation