DATA Step, Macro, Functions and more

Need help changing data structure

Reply
New Contributor
Posts: 2

Need help changing data structure

Here are two data sets:

data1:                               

         

data2:

As you can see, data2 has a new observation for each time a drug was given to each patient (including duplicates). i.e. Patient 1 received 2 unique drugs (a3, s1).

What I need as an end result is a merged data set by patient_id with indicator variables of whether or not the patient received any of the available drugs.  It should look like this:

data3:

What I first did in attempt to solve the problem was to add these indicator variables onto data2 which looks like this:

data4:

Now I need to consolidate all indicators into one line per patient, but can't figure out how to do so.  Does anyone know what my next step could be?  Or another approach starting with the initial two data sets?  Thanks.

Respected Advisor
Posts: 3,777

Re: Need help changing data structure

Why not just find MAX of all the indicator variables for each patient_id

proc means noprint;

   by patient_id;

   output out=unique max=;

   run;

Respected Advisor
Posts: 3,124

Re: Need help changing data structure

Ok, this gives you the look, but if you want those new chars to be numeric, then you need to convert them:

data have;

     input id drug :$8.;

     cards;

1 a3

1 a3

1 s1

2 a2

2 s2

2 s2

2 s3

2 s1

;

proc transpose data=have out=h1 (drop=_name_) let;

     by id;

     var drug;

     id drug;

run;

data want;

     set h1;

     array char _character_;

     do over char;

           char=ifc(missing(char),'0','1');

     end;

run;

SAS Employee
Posts: 7

Re: Need help changing data structure

I would stay away from do over;. it is a very old technique and is not recommended. I was surprised it still worked.

Also, you should not use array names that are the same as function names such as char.

I suggest adding a proc sort step to eliminate duplicate ids with the same drug, as proc transpose issues warnings when I ran your code .The warnings were about the same  id values occurring in the same group.

Your final code would be:

data have;
     input id drug :$8.;
     cards;
1 a3
1 a3
1 s1
2 a2
2 s2
2 s2
2 s3
2 s1
;

proc sort data=haveout=uniquehave nodups;
    by id drug;
run;

proc transpose data=uniquehave out=h1(drop=_name_);
     by id;

    var drug;
    id drug;
run;

data want;
    set h1;
     array mychar _character_;
     do i=1 to dim(mychar);
           mychar{i}=ifc(missing(mychar{i}),'0','1');
     end;
run;

Respected Advisor
Posts: 3,124

Re: Need help changing data structure

Are you implicating I am a senior citizen? Smiley Wink

Kidding aside, if the only reason for not using 'do over' is being OLD, it sounds a little shaky, doesn't? Besides, similar 'do over' has be resurrected in the 9.4 Hash objects?

About Char array, I admit I dropped the ball there, it was not a good practice to use char, a SAS reserved function name as an array name.

I am aware of the warning in Transpose, and the 'let' option is invented for that purpose, so we can skip one dedup step, depending on the size of the data, sometimes it worths the warning for the gaining on efficiency.

my2cents,

Haikuo

PROC Star
Posts: 7,364

Re: Need help changing data structure

As a senior citizen, I resemble that remark! I agree that do over isn't in the documentation other than the fact that they state that it isn't supported in SCL.

However, more interestingly, it is used in a number of the 9.4 examples. e.g.:

http://support.sas.com/documentation/onlinedoc/stat/ex_code/121/modstyex.html
http://support.sas.com/...


Occasional Contributor
Posts: 15

Re: Need help changing data structure

Since you've already created Data4, just do what John says.

Another option would be to do something like

data data2;
    set data2;
    flg=1;
run;

/* something to collapse the data */

proc summary nway missing data=data2;
    class _all_;
    output out=data2a(drop=_Smiley Happy;

proc transpose data=data2a
               out=data3(drop=_name_);
    by patient_id;
    var flg;
    id drug;
run;

This will give you missing and 1, then you'll have to set the missing to 0 if you really need them.

New Contributor
Posts: 2

Re: Need help changing data structure

Thank you all for the help.  Problem has been solved and moving forward.

SAS Super FREQ
Posts: 8,743

Re: Need help changing data structure

Hi:

  For future folks reading this post, an alternate solution would have been to use PROC REPORT or PROC TABULATE on the DRUG2 dataset directly, without needing a PROC SUMMARY or a PROC TRANSPOSE.

Cynthia

data data2;
  infile datalines;
  input ptid drug $;
return;
datalines;
1 a3
1 a3
1 s1
2 a2
2 s2
2 s2
2 s3
2 s1
3 s2
3 b1
3 s2
3 s2
3 s3
3 s3
3 s1
;
run;

    

options missing=0;
ods listing close;
ods html file='c:\temp\drug_report.html';

proc report data=data2 split='*';
  title 'Use PROC REPORT';
  column ptid drug n;
  define ptid / group style(column)=Header;
  define drug / across;
  define n / 'Patient*Total';
  rbreak after / summarize style=Header;
run;

     

proc tabulate data=data2 f=6.;
  title 'Use PROC TABULATE';
  class ptid drug;
  table ptid all*{s=Header},
        drug*n all;
  keylabel all='Total'
            n=' ';
run;
ods html close;


compare_report_tabulate.png
Occasional Contributor
Posts: 15

Re: Need help changing data structure

Hi Cynthia,

I liked your example, but he didn't want to total them up.

He just wanted indicator variables for them (1=exist; 0=not exist) and he was asking for a dataset.

Could be a design matrix or some other reason for needing these in a dataset.

PROC Star
Posts: 7,364

Re: Need help changing data structure

Queso: Since you still haven't marked this question as "answered", here is an alternative solution.

Of course, since you have already created your data4 file, data_null_'s proposal is all that you need.

However, if not, the following wouldn't require much more than what you did to create your data4 file.

data data1;

  input patient_id;

  cards;

1

2

3

4

;

data data2;

  input patient_id drug $;

  cards;

1 a3

1 a3

1 s1

2 a2

2 s2

2 s2

2 s3

2 s1

3 s2

3 b1

3 s2

3 s2

3 s3

3 s3

3 s1

;

proc transpose data=data2 out=data3 (drop=_name_) let prefix=_;

     by patient_id;

     var drug;

     id drug;

run;

data want (drop=_Smiley Happy;

  merge data1 data3;

  array _drugs(*) _:;

  array drugs(*) a1 a2 a3 b1 s1 s2 s3;

  by patient_id;

  do _i=1 to dim(drugs);

    drugs(_i)=0;

  end;

  do _i=1 to dim(_drugs);

    if      _drugs(_i)='a1' then drugs(1)=1;

    else if _drugs(_i)='a2' then drugs(2)=1;

    else if _drugs(_i)='a3' then drugs(3)=1;

    else if _drugs(_i)='b1' then drugs(4)=1;

    else if _drugs(_i)='s1' then drugs(5)=1;

    else if _drugs(_i)='s2' then drugs(6)=1;

    else if _drugs(_i)='s3' then drugs(7)=1;

  end;

run;

Ask a Question
Discussion stats
  • 10 replies
  • 432 views
  • 3 likes
  • 7 in conversation