christinagting0
Quartz | Level 8

Hi everyone, 

 

I am new to SAS and just learning from scratch.

 

I need help understanding how to collapse my data. I have a data set in which each observation has an ID number. However, the same ID may be entered more than once because each entry records a different variable of interest. For example, ID #1234 might be listed 5 times because they have the colour blue, short hair, height x, black hair and eat meat. What I want to do is collapse my data so that I have only one observation for #1234 and can see all of their variables on one line instead of in 5 separate entries.

 

How can I do this?

 

thanks!

4 REPLIES
Astounding
PROC Star

Note that once you do this, you may lose some data.  It would be possible, for example, that HAIR would be different on different original observations.  But you will end up with only one value after the collapse.
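One way to see whether this would affect you is to look for IDs that carry more than one distinct nonmissing value of a variable before collapsing. The following is only a sketch: it assumes the input data set is named HAVE, the key is ID, and HAIR is the variable being checked (all names are illustrative).

```sas
/* Hypothetical names: HAVE (input data set), ID (key), HAIR (variable to check) */
proc sql;
  /* COUNT(DISTINCT ...) counts only nonmissing values, so IDs listed here
     have two or more conflicting HAIR values */
  select id, count(distinct hair) as n_hair_values
  from have
  group by id
  having count(distinct hair) > 1;
quit;
```

Any ID that this query lists has conflicting values, and only one of them will survive the collapse.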

 

That being said, here's a way to approach this.  Assuming you have a variable named ID:

 

proc sort data=have;
  by id;
run;

data want;
  /* master: zero observations from HAVE; transactions: all of HAVE */
  update have (obs=0) have;
  by id;
run;

 

The output data set should be just what you are looking for.

 

Good luck.

christinagting0
Quartz | Level 8

Hi Astounding, 

 

thanks for your reply. Do you think you could explain your code a bit further so I can understand what is happening?

 

I understand the proc sort, but I'm a little confused as to what the obs=0 part is doing. When I googled it, I read that obs=0 creates an empty data set that has the structure, but not the attributes, of another data set.

 

I'm a little confused about how your code collapses the data.

 

thank you!!!

Astounding
PROC Star

When creating an output data set, OBS=0 creates an empty data set.  But when reading an incoming data set, OBS=0 reads zero observations from the data set.  That's important here, because the UPDATE statement requires two datasets:  a master data set, and a set of transactions to apply to that master data set.  In effect, the UPDATE statement in this program is saying that the master data set contains zero observations, and all the data is coming from the transaction data set.

 

UPDATE is not necessarily where you would start learning about SAS.  It just happens to be a good tool for this particular job.  It automatically ignores missing values, and automatically outputs one observation for each value of the BY variable (after all the transactions have been applied).
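To make the behaviour concrete, here is a small worked sketch of the UPDATE approach on made-up data. The variable names (COLOUR, HAIR, DIET) and the values are invented for illustration and are not from the original poster's data set.

```sas
/* Hypothetical sample data: each row carries one attribute for an ID,
   and the other attributes are missing on that row */
data have;
  infile datalines dsd truncover;
  input id colour :$10. hair :$10. diet :$10.;
  datalines;
1234,blue,,
1234,,short,
1234,,,meat
5678,red,,
5678,,long,
;
run;

proc sort data=have;
  by id;
run;

data want;
  /* master contributes no rows; every row of HAVE is applied as a transaction */
  update have (obs=0) have;
  by id;
run;

proc print data=want;
run;
```

WANT should contain one observation per ID: 1234 with blue, short and meat, and 5678 with red and long and a missing DIET. Because missing values in the transactions do not overwrite values already filled in, each row only adds what it knows. If two rows for the same ID both carry a nonmissing value for the same variable, the later one wins.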
