06-20-2016 09:30 AM
I am new to SAS and just learning from scratch.
I need help understanding how to collapse my data. I have a data set with a number of observations with a unique ID number. However, these observations may be entered more than once in the dataset becuase they have a variable of interest. So for example, observation #1234 might be listed 5 times in the data set because they have the colour blue, short hair, height x, black hair and eat meat. What I want to do is collapse my data so that I only have one observation for #1234 but be able to see all the variables that they have in one line instead of 5 separate entries.
How can I do this?
06-20-2016 09:38 AM
Look at proc transpose. It is the tool for exactly this kind of operation.
06-20-2016 09:43 AM
Note that once you do this, you may lose some data. It would be possible, for example, that HAIR would be different on different original observations. But you will end up with only one value after the collapse.
That being said, here's a way to approach this. Assuming you have a variable named ID:
proc sort data=have;
update have (obs=0) have;
The output data set should be just what you are looking for.
06-20-2016 09:53 AM
thanks for your reply. Do you think you could explain your code a bit further so I can understand what is happening?
I understand the proc sort, but I"m a little confused as to what the obs=0 line is saying. When I googled it it says that obs=0 creates an empty data set that has the structure, but not the attributes, of another data set.
i'm a little confused how your code collapses data?
06-20-2016 10:27 AM
When creating an output data set, OBS=0 creates an empty data set. But when reading an incoming data set, OBS=0 reads zero observations from the data set. That's important here, because the UPDATE statement requires two datasets: a master data set, and a set of transactions to apply to that master data set. In effect, the UPDATE statement in this program is saying that the master data set contains zero observations, and all the data is coming from the transaction data set.
UPDATE is not necessarily where you would start learning about SAS. It just happens to be a good tool for this particular job. It automatically ignores missing values, and automatically outputs one observation for each value of the BY variable (after all the transactions have been applied).