Solved: Re: Transpose character data wide

stancemcgraw · Posted 08-06-2020 05:22 PM

Hi all,

I'm trying to transpose my long data to wide data, but both variables are character variables. I'd like to count the number of drugs identified per study-ID. Some of the study-IDs also have more than 1 drug found. I also want to create a variable that is the total number of drugs per study-ID after its been transposed:

Data have:

Study-ID Drugs_found

ID-1-RMC Opioid

ID-2-RMC None

ID-3-RMC Benzo

ID-4-RMC Cocaine

ID-4-RMC Opioid

ID-5-RMC Meth

Data want:

Study-ID OPIOIDS METH NONE COCAINE BENZOS Total_drugs

ID-1-RMC 1 0 0 0 0 1

ID-2-RMC 0 0 0 0 0 0

ID-3-RMC 0 0 0 0 1 1

ID-4-RMC 1 0 0 1 0 2

ID-5-RMC 0 1 0 0 0 1

Reeza · Posted 08-06-2020 06:02 PM

data long ;
set have;

count = 1;
run;

proc transpose data=long out=wide prefix=DRUG_;
by studyID;
ID Drugs_Found;
IDLABEL drugs_found;
var Count;
run;

data want;
set wide;
array _drugs(*) drugs_:;
do i=1 to dim(_drugs);
    _drugs (i) = coalesce(_drugs(i), 0);
end;

#calculate total drugs;
total_drugs = sum(of drugs_:);

run;

PROC TRANSPOSE works well. If all you need is the output though, PROC TABULATE or PROC REPORT maybe a better option.

View solution in original post

Reeza · Posted 08-06-2020 06:02 PM

data long ;
set have;

count = 1;
run;

proc transpose data=long out=wide prefix=DRUG_;
by studyID;
ID Drugs_Found;
IDLABEL drugs_found;
var Count;
run;

data want;
set wide;
array _drugs(*) drugs_:;
do i=1 to dim(_drugs);
    _drugs (i) = coalesce(_drugs(i), 0);
end;

#calculate total drugs;
total_drugs = sum(of drugs_:);

run;

PROC TRANSPOSE works well. If all you need is the output though, PROC TABULATE or PROC REPORT maybe a better option.

stancemcgraw · Posted 08-06-2020 06:16 PM

Thanks Reeza,

The first part worked, but the array came back in my log as the following error:

74 data want;

475 set wide;

476 array _drugs(*) drugs_:;

WARNING: Defining an array with zero elements.

477 do i=1 to dim(_drugs);

478 _drugs (i) = coalesce(_drugs(i), 0);

479 end;

480

481 **#calculate total drugs;

482 total_drugs = sum(of drugs_:);

---

71

ERROR 71-185: The SUM function call does not have enough arguments.

483

484 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0

observations and 15 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.04 seconds

Reeza · Posted 08-06-2020 06:38 PM

Show the full code. The issues isn't with this step it's with a preceding step.

ballardw · Posted 08-07-2020 05:04 AM

@stancemcgraw wrote:

Thanks Reeza,

The first part worked, but the array came back in my log as the following error:

74 data want;

475 set wide;

476 array _drugs(*) drugs_:;

WARNING: Defining an array with zero elements.

477 do i=1 to dim(_drugs);

478 _drugs (i) = coalesce(_drugs(i), 0);

479 end;

480

481 **#calculate total drugs;

482 total_drugs = sum(of drugs_:);

   ---

   71

ERROR 71-185: The SUM function call does not have enough arguments.

483

484 run;

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0

   observations and 15 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.04 seconds

This bit of your code:

476  array _drugs(*) drugs_:;
WARNING: Defining an array with zero elements.

almost certainly means that your data set Wide has no variables whose names start with "drugs_".

And no existing variables would be the cause of the error using the Sum function to sum the nonexistent.

stancemcgraw · Posted 08-09-2020 03:17 PM

So I am trying to this today with another character variable, comorbidities, and it's giving me the following error when I run the proc transpose step
(proc sort data=long; by mitchid; run;
proc transpose data=long out=wide prefix=comorbid_;
by mitchID;
ID comorbidities;
IDLABEL comorbidities;
var Count;
run;)
"ERROR: The ID value "comorbid_CVA" occurs twice in the same BY group.
ERROR: The ID value "comorbid_DEMENTIA" occurs twice in the same BY group.
ERROR: The ID value "comorbid_DEPENDENT_HLTH" occurs twice in the same BY group.
ERROR: The ID value "comorbid_HYPERTENSION" occurs twice in the same BY group.
ERROR: The ID value "comorbid_CHEMOTHERAPY" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
MITCHID=
WARNING: 1647 observations omitted due to missing ID values.
WARNING: 1 BY groups omitted due to earlier errors.

ballardw · Posted 08-06-2020 06:04 PM

It is best to provide data in form of a data step.

Try this:

data have;
  input StudyID $    Drugs_found $;
datalines;
ID-1-RMC        Opioid
ID-2-RMC         None
ID-3-RMC         Benzo
ID-4-RMC        Cocaine
ID-4-RMC        Opioid
ID-5-RMC         Meth
;

proc summary data=have nway completetypes;
   class StudyID drugs_found ;
   output out=work.summary (drop=_type_ rename=(_freq_=count))
   ;
run;

proc transpose data=work.summary
   out=work.trans (drop=_name_);
   by StudyID;
   var count;
   id drugs_found;
run;

The summary step COMPLETETYPES makes sure that all of the combinations of the class variables that appear are in the output data. The NWAY option only gives us the summaries of both class variables, other wise there would also be summaries overall and for each of the variables only, which is useful sometimes.

Without any VAR variable all we get is a _freq_ variable that I renamed count just for fun.

The default behavior of Proc Summary will sort by the values of the class variables so the output is sorted by StudyId and ready for Proc transpose.

stancemcgraw · Posted 08-07-2020 11:34 AM

This worked too, thank you!

SAS Innovate 2025: Register Today!

SAS Training: Just a Click Away