BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
stancemcgraw
Obsidian | Level 7

Hi all,

 I'm trying to transpose my long data to wide data, but both variables are character variables. I'd like to count the number of drugs identified per study-ID. Some of the study-IDs also have more than 1 drug found. I also want to create a variable that is the total number of drugs per study-ID after its been transposed:

 

Data have:

Study-ID     Drugs_found

ID-1-RMC        Opioid

ID-2-RMC         None

ID-3-RMC         Benzo

ID-4-RMC        Cocaine

ID-4-RMC        Opioid

ID-5-RMC         Meth

 

Data want:

Study-ID          OPIOIDS    METH    NONE    COCAINE    BENZOS      Total_drugs

ID-1-RMC            1                 0            0               0                  0                  1

ID-2-RMC            0                 0            0                0                 0                   0

ID-3-RMC            0                 0            0                0                 1                   1

ID-4-RMC            1                 0            0                 1                0                  2

ID-5-RMC             0                1            0                 0                0                   1

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
data long ;
set have;

count = 1;
run;

proc transpose data=long out=wide prefix=DRUG_;
by studyID;
ID Drugs_Found;
IDLABEL drugs_found;
var Count;
run;

data want;
set wide;
array _drugs(*) drugs_:;
do i=1 to dim(_drugs);
    _drugs (i) = coalesce(_drugs(i), 0);
end;

#calculate total drugs;
total_drugs = sum(of drugs_:);

run;

PROC TRANSPOSE works well. If all you need is the output though, PROC TABULATE or PROC REPORT maybe a better option.

 

View solution in original post

7 REPLIES 7
Reeza
Super User
data long ;
set have;

count = 1;
run;

proc transpose data=long out=wide prefix=DRUG_;
by studyID;
ID Drugs_Found;
IDLABEL drugs_found;
var Count;
run;

data want;
set wide;
array _drugs(*) drugs_:;
do i=1 to dim(_drugs);
    _drugs (i) = coalesce(_drugs(i), 0);
end;

#calculate total drugs;
total_drugs = sum(of drugs_:);

run;

PROC TRANSPOSE works well. If all you need is the output though, PROC TABULATE or PROC REPORT maybe a better option.

 

stancemcgraw
Obsidian | Level 7

Thanks Reeza,

 

  The first part worked, but the array came back in my log as the following error:

 

 

74  data want;

475  set wide;

476  array _drugs(*) drugs_:;

WARNING: Defining an array with zero elements.

477  do i=1 to dim(_drugs);

478      _drugs (i) = coalesce(_drugs(i), 0);

479  end;

480

481  **#calculate total drugs;

482  total_drugs = sum(of drugs_:);

                   ---

                   71

ERROR 71-185: The SUM function call does not have enough arguments.

 

483

484  run;

 

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 15 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      cpu time            0.04 seconds

Reeza
Super User
Show the full code. The issues isn't with this step it's with a preceding step.
ballardw
Super User

@stancemcgraw wrote:

Thanks Reeza,

 

  The first part worked, but the array came back in my log as the following error:

 

 

74  data want;

475  set wide;

476  array _drugs(*) drugs_:;

WARNING: Defining an array with zero elements.

477  do i=1 to dim(_drugs);

478      _drugs (i) = coalesce(_drugs(i), 0);

479  end;

480

481  **#calculate total drugs;

482  total_drugs = sum(of drugs_:);

                   ---

                   71

ERROR 71-185: The SUM function call does not have enough arguments.

 

483

484  run;

 

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.WANT may be incomplete.  When this step was stopped there were 0

         observations and 15 variables.

WARNING: Data set WORK.WANT was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      cpu time            0.04 seconds


This bit of your code:

476  array _drugs(*) drugs_:;
WARNING: Defining an array with zero elements.

almost certainly means that your data set Wide has no variables whose names start with "drugs_".

And no existing variables would be the cause of the error using the Sum function to sum the nonexistent.

stancemcgraw
Obsidian | Level 7
So I am trying to this today with another character variable, comorbidities, and it's giving me the following error when I run the proc transpose step
(proc sort data=long; by mitchid; run;
proc transpose data=long out=wide prefix=comorbid_;
by mitchID;
ID comorbidities;
IDLABEL comorbidities;
var Count;
run;)
"ERROR: The ID value "comorbid_CVA" occurs twice in the same BY group.
ERROR: The ID value "comorbid_DEMENTIA" occurs twice in the same BY group.
ERROR: The ID value "comorbid_DEPENDENT_HLTH" occurs twice in the same BY group.
ERROR: The ID value "comorbid_HYPERTENSION" occurs twice in the same BY group.
ERROR: The ID value "comorbid_CHEMOTHERAPY" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
MITCHID=
WARNING: 1647 observations omitted due to missing ID values.
WARNING: 1 BY groups omitted due to earlier errors.
ballardw
Super User

It is best to provide data in form of a data step.

Try this:

data have;
  input StudyID $    Drugs_found $;
datalines;
ID-1-RMC        Opioid
ID-2-RMC         None
ID-3-RMC         Benzo
ID-4-RMC        Cocaine
ID-4-RMC        Opioid
ID-5-RMC         Meth
;

proc summary data=have nway completetypes;
   class StudyID drugs_found ;
   output out=work.summary (drop=_type_ rename=(_freq_=count))
   ;
run;

proc transpose data=work.summary
   out=work.trans (drop=_name_);
   by StudyID;
   var count;
   id drugs_found;
run;

 

The summary step COMPLETETYPES makes sure that all of the combinations of the class variables that appear are in the output data. The NWAY option only gives us the summaries of both class variables, other wise there would also be summaries overall and for each of the variables only, which is useful sometimes.

Without any VAR variable all we get is a _freq_ variable that I renamed count just for fun.

The default behavior of Proc Summary will sort by the values of the class variables so the output is sorted by StudyId and ready for Proc transpose.

stancemcgraw
Obsidian | Level 7
This worked too, thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 997 views
  • 2 likes
  • 3 in conversation