Re: un-merging a dataset

BruceBrad · Posted 02-27-2023 08:37 PM

I'm trying to un-merge a dataset. More specifically, I have a dataset with variables:

PersonID

EventID (there can be multiple events per person)

a number of person-level variables that are constant within personID

a number of event-level variables that can vary within personID

I would like to end up with a person-level file, with one record per PersonID, containing all the person-level variables. I could do this manually by selecting on first.PersonID and keeping only the person-level variables. But there are a large number of variables, and it is not obvious which are person-level and which are event-level. Is there an easy way to identify which variables are fixed within PersonID and keep only those in the output file? (They are a mix of numeric and character variables.)

SASKiwi · Posted 02-27-2023 08:57 PM

Do you have the code which merged them in the first place? What if there were variables common to both? If you don't have the original merge code then intelligent guesswork is pretty much your only option.

BruceBrad · Posted 02-27-2023 09:00 PM

No. Don't have the code or original datasets. I'm prepared to assume that any variable that is constant within personid comes from the person file. (Even if not, the fact that it is constant within person is useful enough).

SASKiwi · Posted 02-27-2023 09:22 PM

In that case I would go with that assumption to split the data. Something like this might help:

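/* Keep the first record for each PersonID.                */
/* BY-group processing requires PersonEvent to be sorted   */
/* (or indexed) by PersonID.                               */
data person;
  set PersonEvent;
  by PersonID;
  if first.PersonID;
run;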

BruceBrad · Posted 02-27-2023 11:38 PM

I'll use that for now. Would be nice if there was a way of flagging which variables are constant within personID so I could just keep them - but for now I'll get by with just using the first value of each variable.

Tom · Posted 02-27-2023 11:29 PM

A first pass is to use the NLEVELS option on PROC FREQ.

proc freq data=HAVE nlevels;
tables _all_ / noprint;
run;

You could then check in more detail any variable that has the same number of levels as the PERSONID variable.
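
NLEVELS on its own can only rule variables out: a variable with more levels than PersonID cannot be constant within person, but one with as many or fewer levels might still vary. A more direct check, sketched below, counts the distinct values of each variable within each PersonID and keeps the variables where that count never exceeds one. This is a rough sketch and not a tested solution. The macro name keep_constant, its parameters, and the work dataset _vars are made up for illustration; the dataset and variable names follow the earlier posts. It assumes a reasonably recent SAS release (for the open-ended `:v1-` range and the TRIMMED option), ordinary V7-style variable names, and that the input is sorted by the ID for the final step. A missing value is counted as a value in its own right, so a variable that is sometimes missing and sometimes filled in within a person is treated as varying.

```sas
/* Hypothetical macro: the name and parameters are illustrative only. */
%macro keep_constant(data=PersonEvent, id=PersonID, out=person);
  %local i nvars maxn keep;

  /* List every variable in the input except the ID itself. */
  proc contents data=&data out=_vars(keep=name) noprint;
  run;

  proc sql noprint;
    select name into :v1-
      from _vars
      where upcase(name) ne upcase("&id");
  quit;
  %let nvars = &sqlobs;

  /* For each variable, find the largest number of distinct values   */
  /* seen within any single ID. COUNT(DISTINCT) ignores missing      */
  /* values, so MAX(MISSING()) adds 1 when the group has any missing. */
  %do i = 1 %to &nvars;
    %let maxn = 0;
    proc sql noprint;
      select max(n) into :maxn trimmed
        from (select count(distinct &&v&i) + max(missing(&&v&i)) as n
                from &data
                group by &id);
    quit;
    %if &maxn <= 1 %then %let keep = &keep &&v&i;
  %end;

  %put NOTE: Variables constant within &id: &keep;

  /* One record per ID with only the constant variables.  */
  /* Assumes &data is sorted by &id.                      */
  data &out;
    set &data(keep=&id &keep);
    by &id;
    if first.&id;
  run;
%mend keep_constant;

%keep_constant(data=PersonEvent, id=PersonID, out=person)
```

This makes one pass over the data per variable, which is simple but could be slow on a large file with many variables; running PROC FREQ with NLEVELS first and skipping variables that have more levels than PersonID would cut that down.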
