Help using Base SAS procedures

Combining multiple rows & counting

Contributor
Posts: 25

Hi, I'm new to SAS, and apologize if this question has been asked elsewhere.  I have been searching previous messages for this type of work and couldn't find what I wanted.  I have a dataset that contains multiple rows per patient.  I'd like to combine all the rows for each patient, and also create an additional column with a count of how many rows were combined. 

 

So, for example, I'm starting with a dataset that looks like this:

patient    fruit     color
patientA   apple     red
patientA   banana    red
patientA   mango     red
patientB   apple     blue
patientC   cherry    green
patientC   grape     green

 

And I want to reorganize into a dataset that looks something like this:

patient    count   fruit_1   fruit_2   fruit_3   color
patientA   3       apple     banana    mango     red
patientB   1       apple     (blank)   (blank)   blue
patientC   2       cherry    grape     (blank)   green

 

Thank you in advance for your help.  I have been struggling with this for days, trying combinations of different lines I've been piecing together from searches.


Accepted Solutions
Solution
‎01-22-2016 05:15 PM
Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

Try this:

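/* Test data; with an existing dataset, skip this step */
data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA banana red
patientA mango  red
patientB apple  blue
patientC cherry green
patientC grape  green
;

/* One row per patient: FRUIT values become fruit_1, fruit_2, ...
   (the input must be sorted by PATIENT and COLOR) */
proc transpose data=have prefix=fruit_ out=trans(drop=_name_);
  by patient color;
  var fruit;
run;

/* Number of rows per patient, written to variable COUNT */
proc freq data=have noprint;
  tables patient / out=cnt(drop=percent);
run;

/* Combine the transposed fruits with the counts */
data want;
  merge trans cnt;
  by patient;
run;

/* Print in the requested column order */
proc print data=want;
  var patient count fruit_: color;
run;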



All Replies
Super User
Posts: 17,819

Re: Combining multiple rows & counting

You're looking to transpose your data set. You can use proc transpose or a data step and then you can count the number of missing/non missing to get your number of rows.

 

Here are two links with examples of transposing your data from a long to a wide format.

 

http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm

 

http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm
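
For the data step route mentioned above, here is a minimal sketch rather than a definitive implementation. It assumes the input is sorted by patient and that no patient has more than three rows (the array size 3 and the length 8 are assumptions taken from the example data); a fourth row for one patient would cause an array subscript error. COLOR simply keeps the value from the patient's last row.

```sas
data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA banana red
patientA mango  red
patientB apple  blue
patientC cherry green
patientC grape  green
;

data want;
  set have;
  by patient;
  /* Assumed maximum of 3 fruits per patient */
  array fr(3) $ 8 fruit_1-fruit_3;
  retain fruit_1-fruit_3;
  if first.patient then do;
    count = 0;                 /* restart the row counter */
    call missing(of fr(*));    /* clear fruits left over from the previous patient */
  end;
  count + 1;                   /* sum statement: COUNT is retained across rows */
  fr(count) = fruit;
  if last.patient then output; /* write one row per patient */
  drop fruit;
run;
```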

Contributor
Posts: 25

Re: Combining multiple rows & counting

Thanks so much for your fast reply, Reeza.  Sorry if this is a naive question, but how do I do this if my data is already in SAS as a SAS spreadsheet?  In the example you linked for the transposition, I would be typing in data as part of the command(s), right?  My dataset is really big (tens of columns), so I don't want to re-enter the data.

Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

I've included the first data step just to create a test dataset for demonstration. With your existing dataset you would start at the PROC TRANSPOSE step.

 

The work dataset TRANS created by PROC TRANSPOSE already contains everything you want, except for variable COUNT. The frequency counts are created by PROC FREQ and written to the output dataset of this procedure, which I named CNT (and which originally contained an additional variable PERCENT, but I dropped this one, because you didn't request percentages).

 

Variable COUNT and the variables from dataset TRANS are merged in the data step creating dataset WANT, using the common variable PATIENT (i.e. common to TRANS and CNT) as a key to identify matching observations.

 

Finally, I used PROC PRINT just to show the final result. The purpose of the VAR statement was to obtain the same column order in the output as in your example table. The internal order of the variables in the dataset is left unchanged (and irrelevant in most cases).
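
To illustrate starting from data that already exists, here is a sketch in which WORK.PATIENTS is a hypothetical stand-in for your own dataset (replace it with your own libref.dataset name; the first step is only there so the example runs on its own). The PROC SORT step is an addition: PROC TRANSPOSE with a BY statement expects the input in BY-variable order, and an existing dataset may not be sorted yet.

```sas
/* Stand-in for an existing dataset (hypothetical name), deliberately unsorted */
data work.patients;
  input patient $ fruit $ color $;
  cards;
patientC cherry green
patientA apple  red
patientB apple  blue
patientA banana red
patientC grape  green
patientA mango  red
;

/* Sort a copy so the original dataset is left untouched */
proc sort data=work.patients out=sorted;
  by patient color;
run;

proc transpose data=sorted prefix=fruit_ out=trans(drop=_name_);
  by patient color;
  var fruit;
run;

proc freq data=sorted noprint;
  tables patient / out=cnt(drop=percent);
run;

data want;
  merge trans cnt;
  by patient;
run;
```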

Super User
Posts: 17,819

Re: Combining multiple rows & counting

Working off @FreelanceReinhard's solution: his last three steps can be simplified into one step.

 

data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA banana red
patientA mango  red
patientB apple  blue
patientC cherry green
patientC grape  green
;

proc transpose data=have prefix=fruit_ out=trans(drop=_name_);
  by patient color;
  var fruit;
run;

data want;
  set trans;
  /* All transposed fruit_ variables */
  array fr(*) $ fruit_:;

  /* Count = number of fruit_ variables minus the missing ones */
  count = dim(fr) - cmiss(of fr(*));
run;

Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

@Reeza: Would your simplification produce the same result as my code, if there were missing values of FRUIT in the HAVE dataset?
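
To make the question concrete, here is a small sketch with one missing FRUIT value (the test data and the output dataset names COUNT_NONMISSING and COUNT_ROWS are made up for illustration). With list input, a single period is read as a missing character value. The array-based count only sees nonmissing fruit_ variables, while PROC FREQ counts rows per patient, so I would expect the two counts to differ for patientA.

```sas
data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA .      red
patientA mango  red
patientB apple  blue
;

proc transpose data=have prefix=fruit_ out=trans(drop=_name_);
  by patient color;
  var fruit;
run;

/* Array-based count: nonmissing fruit_ variables only */
data count_nonmissing;
  set trans;
  array fr(*) $ fruit_:;
  count = dim(fr) - cmiss(of fr(*));
run;

/* PROC FREQ count: rows per patient, whether or not FRUIT is missing */
proc freq data=have noprint;
  tables patient / out=count_rows(drop=percent);
run;

proc print data=count_nonmissing; run;
proc print data=count_rows; run;
```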

Contributor
Posts: 25

Re: Combining multiple rows & counting

Thank you, FreelanceReinhard.  How should I change the syntax, if I decide to remove the "color" variable from my dataset?  (E.g., will the : followed by a ; cause problems?)

Super User
Posts: 17,819

Re: Combining multiple rows & counting

Drop color from proc transpose and everything should be fine. Unless you have multiples of your fruits...then you'll have an issue.

Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

The ":;" does not cause any problems. So, please feel free to delete "color" from the PROC TRANSPOSE and the PROC PRINT step.
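
For reference, a sketch of the same steps with "color" removed, assuming the dataset now contains only PATIENT and FRUIT (the test data below is adapted accordingly):

```sas
data have;
  input patient $ fruit $;
  cards;
patientA apple
patientA banana
patientA mango
patientB apple
patientC cherry
patientC grape
;

proc transpose data=have prefix=fruit_ out=trans(drop=_name_);
  by patient;        /* COLOR removed from the BY statement */
  var fruit;
run;

proc freq data=have noprint;
  tables patient / out=cnt(drop=percent);
run;

data want;
  merge trans cnt;
  by patient;
run;

proc print data=want;
  var patient count fruit_:;   /* "fruit_:" directly before ";" is fine */
run;
```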

 

@Reeza: What do you mean by "multiples of your fruits"? Duplicate values?

Super User
Posts: 17,819

Re: Combining multiple rows & counting

@FreelanceReinhard Yes to duplicates.


Missing at the data level may cause issues but I'm assuming they could easily be filtered out with a where clause. 
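
A sketch of that filtering idea, using a WHERE= dataset option on the input (the test data with one missing FRUIT is made up for illustration). Note the side effect: a patient whose FRUIT values are all missing would drop out of the result entirely.

```sas
data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA .      red
patientA mango  red
patientB apple  blue
;

/* Rows with a missing FRUIT are excluded before transposing */
proc transpose data=have(where=(not missing(fruit)))
               prefix=fruit_ out=trans(drop=_name_);
  by patient color;
  var fruit;
run;

data want;
  set trans;
  array fr(*) $ fruit_:;
  count = dim(fr) - cmiss(of fr(*));
run;
```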

Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

Maybe I'm too tired to see, but how would duplicate FRUIT values (or any other content of this variable, for that matter) interfere with the transposing?

Super User
Posts: 17,819

Re: Combining multiple rows & counting

@FreelanceReinhard Nope, I'm too tired... I was thinking of the case when the fruit became the header, which isn't occurring in this case. So there's no issue and your code is correct :)

Contributor
Posts: 25

Re: Combining multiple rows & counting

Hi, FreelanceReinhard (or others).  I think I might be encountering the problem you & Reeza discuss here.  If I have different values of color for any 1 patient, how would you recommend handling the situation?  In other words, is there a way to transpose multiple variables simultaneously (i.e., create new fruit_n AND create color_n columns)?

 

Trusted Advisor
Posts: 1,115

Re: Combining multiple rows & counting

Hi @beginner,

 

Yes, there are several different approaches to transpose more than one variable. I learned (what I found to be) the most elegant one -- using PROC SUMMARY's IDGROUP option -- only ten days ago from @Ksharp's posting here:

https://communities.sas.com/t5/General-SAS-Programming/getting-multiple-rows-of-dates-for-a-subject-....

I can write up the code if you like.

 

Edit:

Here it is (borrowing the PROC SQL idea again from @Ksharp, now from the current thread):

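data have;
  input patient $ fruit $ color $;
  cards;
patientA apple  red
patientA banana pink
patientA mango  magenta
patientB apple  blue
patientC cherry green
patientC grape  lime
;

/* Macro variable N = largest number of rows for any one patient */
proc sql noprint;
  select max(n) into :n
  from (select count(*) as n from have group by patient);
quit;

/* One row per patient with fruit_1-fruit_&n and color_1-color_&n;
   _FREQ_ (renamed to COUNT) is the number of rows per patient.
   The input must be sorted by PATIENT. */
proc summary data=have;
  by patient;
  output out=want(drop=_type_ rename=(_freq_=count)) idgroup(out[&n] (fruit color)=);
run;

proc print data=want;
run;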

As an additional benefit, PROC SUMMARY computes the frequencies by default (in variable _FREQ_). So, we need neither PROC TRANSPOSE for the transposing nor PROC FREQ for the counting.

Contributor
Posts: 25

Re: Combining multiple rows & counting

Hi @FreelanceReinhard.  Thanks for yet another fast & super helpful reply.  Just tried the @Ksharp script you pointed me to, and wow, this is great!!  Thank you so much!  Sorry for what is likely a dense question, but for this new scripting, where would I input the names of columns that I do not want to split into n columns (columns that have the same value for all n entries, for any 1 patient)?
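
The thread does not record an answer to this, so the following is only one possible approach, not the original posters' method: list such columns in the BY statement, since BY variables are carried into the output dataset as single columns. In this sketch CLINIC is a hypothetical column that is constant within each patient. If it were not actually constant, the patient would be split into several output rows, and the data must be sorted by all BY variables.

```sas
/* Hypothetical test data: CLINIC has the same value on every row of a patient */
data have;
  input patient $ clinic $ fruit $ color $;
  cards;
patientA north apple  red
patientA north banana pink
patientA north mango  magenta
patientB south apple  blue
patientC north cherry green
patientC north grape  lime
;

proc sql noprint;
  select max(n) into :n
  from (select count(*) as n from have group by patient);
quit;

proc summary data=have;
  by patient clinic;   /* CLINIC stays a single column */
  output out=want(drop=_type_ rename=(_freq_=count))
         idgroup(out[&n] (fruit color)=);   /* only these are split into _1.._n */
run;

proc print data=want;
run;
```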
