BookmarkSubscribeRSS Feed
queso
Calcite | Level 5

Here are two data sets:

data1:                               

         

data2:

As you can see, data2 has a new observation for each time a drug was given to each patient (including duplicates). i.e. Patient 1 received 2 unique drugs (a3, s1).

What I need as an end result is a merged data set by patient_id with indicator variables of whether or not the patient received any of the available drugs.  It should look like this:

data3:

What I first did in attempt to solve the problem was to add these indicator variables onto data2 which looks like this:

data4:

Now I need to consolidate all indicators into one line per patient, but can't figure out how to do so.  Does anyone know what my next step could be?  Or another approach starting with the initial two data sets?  Thanks.

10 REPLIES 10
data_null__
Jade | Level 19

Why not just find MAX of all the indicator variables for each patient_id

proc means noprint;

   by patient_id;

   output out=unique max=;

   run;

Haikuo
Onyx | Level 15

Ok, this gives you the look, but if you want those new chars to be numeric, then you need to convert them:

data have;

     input id drug :$8.;

     cards;

1 a3

1 a3

1 s1

2 a2

2 s2

2 s2

2 s3

2 s1

;

proc transpose data=have out=h1 (drop=_name_) let;

     by id;

     var drug;

     id drug;

run;

data want;

     set h1;

     array char _character_;

     do over char;

           char=ifc(missing(char),'0','1');

     end;

run;

kaade
SAS Employee

I would stay away from do over;. it is a very old technique and is not recommended. I was surprised it still worked.

Also, you should not use array names that are the same as function names such as char.

I suggest adding a proc sort step to eliminate duplicate ids with the same drug, as proc transpose issues warnings when I ran your code .The warnings were about the same  id values occurring in the same group.

Your final code would be:

data have;
     input id drug :$8.;
     cards;
1 a3
1 a3
1 s1
2 a2
2 s2
2 s2
2 s3
2 s1
;

proc sort data=haveout=uniquehave nodups;
    by id drug;
run;

proc transpose data=uniquehave out=h1(drop=_name_);
     by id;

    var drug;
    id drug;
run;

data want;
    set h1;
     array mychar _character_;
     do i=1 to dim(mychar);
           mychar{i}=ifc(missing(mychar{i}),'0','1');
     end;
run;

Haikuo
Onyx | Level 15

Are you implicating I am a senior citizen? Smiley Wink

Kidding aside, if the only reason for not using 'do over' is being OLD, it sounds a little shaky, doesn't? Besides, similar 'do over' has be resurrected in the 9.4 Hash objects?

About Char array, I admit I dropped the ball there, it was not a good practice to use char, a SAS reserved function name as an array name.

I am aware of the warning in Transpose, and the 'let' option is invented for that purpose, so we can skip one dedup step, depending on the size of the data, sometimes it worths the warning for the gaining on efficiency.

my2cents,

Haikuo

art297
Opal | Level 21

As a senior citizen, I resemble that remark! I agree that do over isn't in the documentation other than the fact that they state that it isn't supported in SCL.

However, more interestingly, it is used in a number of the 9.4 examples. e.g.:

http://support.sas.com/documentation/onlinedoc/stat/ex_code/121/modstyex.html
http://support.sas.com/...


BurntDirt
Calcite | Level 5

Since you've already created Data4, just do what John says.

Another option would be to do something like

data data2;
    set data2;
    flg=1;
run;

/* something to collapse the data */

proc summary nway missing data=data2;
    class _all_;
    output out=data2a(drop=_:);

proc transpose data=data2a
               out=data3(drop=_name_);
    by patient_id;
    var flg;
    id drug;
run;

This will give you missing and 1, then you'll have to set the missing to 0 if you really need them.

queso
Calcite | Level 5

Thank you all for the help.  Problem has been solved and moving forward.

Cynthia_sas
SAS Super FREQ

Hi:

  For future folks reading this post, an alternate solution would have been to use PROC REPORT or PROC TABULATE on the DRUG2 dataset directly, without needing a PROC SUMMARY or a PROC TRANSPOSE.

Cynthia

data data2;
  infile datalines;
  input ptid drug $;
return;
datalines;
1 a3
1 a3
1 s1
2 a2
2 s2
2 s2
2 s3
2 s1
3 s2
3 b1
3 s2
3 s2
3 s3
3 s3
3 s1
;
run;

    

options missing=0;
ods listing close;
ods html file='c:\temp\drug_report.html';

proc report data=data2 split='*';
  title 'Use PROC REPORT';
  column ptid drug n;
  define ptid / group style(column)=Header;
  define drug / across;
  define n / 'Patient*Total';
  rbreak after / summarize style=Header;
run;

     

proc tabulate data=data2 f=6.;
  title 'Use PROC TABULATE';
  class ptid drug;
  table ptid all*{s=Header},
        drug*n all;
  keylabel all='Total'
            n=' ';
run;
ods html close;


compare_report_tabulate.png
BurntDirt
Calcite | Level 5

Hi Cynthia,

I liked your example, but he didn't want to total them up.

He just wanted indicator variables for them (1=exist; 0=not exist) and he was asking for a dataset.

Could be a design matrix or some other reason for needing these in a dataset.

art297
Opal | Level 21

Queso: Since you still haven't marked this question as "answered", here is an alternative solution.

Of course, since you have already created your data4 file, data_null_'s proposal is all that you need.

However, if not, the following wouldn't require much more than what you did to create your data4 file.

data data1;

  input patient_id;

  cards;

1

2

3

4

;

data data2;

  input patient_id drug $;

  cards;

1 a3

1 a3

1 s1

2 a2

2 s2

2 s2

2 s3

2 s1

3 s2

3 b1

3 s2

3 s2

3 s3

3 s3

3 s1

;

proc transpose data=data2 out=data3 (drop=_name_) let prefix=_;

     by patient_id;

     var drug;

     id drug;

run;

data want (drop=_:);

  merge data1 data3;

  array _drugs(*) _:;

  array drugs(*) a1 a2 a3 b1 s1 s2 s3;

  by patient_id;

  do _i=1 to dim(drugs);

    drugs(_i)=0;

  end;

  do _i=1 to dim(_drugs);

    if      _drugs(_i)='a1' then drugs(1)=1;

    else if _drugs(_i)='a2' then drugs(2)=1;

    else if _drugs(_i)='a3' then drugs(3)=1;

    else if _drugs(_i)='b1' then drugs(4)=1;

    else if _drugs(_i)='s1' then drugs(5)=1;

    else if _drugs(_i)='s2' then drugs(6)=1;

    else if _drugs(_i)='s3' then drugs(7)=1;

  end;

run;

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2049 views
  • 3 likes
  • 7 in conversation