I'm trying to un-merge a dataset. More specifically, I have a dataset with variables:
PersonID
EventID (there can be multiple events per person)
a number of person-level variables that are constant within personID
a number of event-level variables that can vary within personID
I would like to end up with a person-level file, with one record per PersonID, including all the person-level variables. I could do this manually by selecting on first.PersonID and keeping the person-level variables. But there are a large number of variables, and it is not obvious which are person-level and which are event-level. Is there an easy way to identify which variables are fixed within PersonID and keep only these in the output file? (They are a mix of numeric and character.)
Do you have the code which merged them in the first place? What if there were variables common to both? If you don't have the original merge code then intelligent guesswork is pretty much your only option.
In that case I would go with that assumption to split the data. Something like this might help:
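/* Assumes PersonEvent is already sorted by PersonID (required for BY processing) */
data person;
  set PersonEvent;
  by PersonID;
  if first.PersonID;  /* keep only the first record for each person */
run;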
I'll use that for now. Would be nice if there was a way of flagging which variables are constant within personID so I could just keep them - but for now I'll get by with just using the first value of each variable.
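A first pass is to use the NLEVELS option on PROC FREQ.
/* NLEVELS reports the number of distinct values of each variable; NOPRINT suppresses the frequency tables */
proc freq data=HAVE nlevels;
  tables _all_ / noprint;
run;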
You could then check in more detail any variable that has the same number of levels as the PERSONID variable.
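To go one step further and flag the variables that are constant within PersonID, one option is to run the same NLEVELS check separately for each person and keep only the variables that never show more than one level. The following is only a rough sketch, not tested against your data. It assumes the input dataset is HAVE with key PersonID; the names have_sorted, levels_by_person and person_vars are made up for illustration. Note that NLEVELS counts a missing value as a level, so a variable that is missing on some of a person's records and filled in on others is treated as varying.

```sas
/* BY-group processing needs the data sorted by PersonID */
proc sort data=have out=have_sorted;
  by PersonID;
run;

/* Count the levels of every variable within each person.          */
/* ODS EXCLUDE ALL suppresses the printed output; ODS OUTPUT still */
/* captures the NLevels table in a dataset.                        */
ods exclude all;
ods output nlevels=levels_by_person;
proc freq data=have_sorted nlevels;
  by PersonID;
  tables _all_ / noprint;
run;
ods exclude none;

/* Initialise the list so it exists even if no variable qualifies */
%let person_vars=;

/* A variable is treated as person-level if it never has more than */
/* one level within any PersonID                                   */
proc sql noprint;
  select TableVar into :person_vars separated by ' '
  from levels_by_person
  where upcase(TableVar) ne 'PERSONID'
  group by TableVar
  having max(NLevels) = 1;
quit;

%put NOTE: Variables constant within PersonID: &person_vars;

/* One record per person, keeping only the constant variables */
data person;
  set have_sorted (keep=PersonID &person_vars);
  by PersonID;
  if first.PersonID;
run;
```

With a very large number of persons the BY-group PROC FREQ may be slow, so it could be worth trying this on a sample of PersonIDs first and checking the %PUT list in the log before building the final file.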