04-28-2016 09:42 AM
I have a dataset that is supposed to be at the person level, but some people actually appear on multiple rows. I know this because I added an umbrella ID, and some people have two rows with the same umbrella ID but different values of my former ID variable, ID2. The dataset contains ID1 (the umbrella ID), ID2, an indicator for each month (=1 if the person had an event in that month), and a number of categorical variables.
ID1  ID2  mth_200901  mth_200902  ...  mth_201311  mth_201312  categ_1  categ_2
1    1    1           1           ...  .           .           abc      pqr
1    2    .           .           ...  1           1           def      xyz
I want two things. The first is simple: for each value of ID1, each mth_ indicator should be 1 if at least one of that person's sub-IDs (ID2) had a 1 for that month. The second is that the values of categ_1 and categ_2 (and the several other categorical variables) should come from the row whose most recent month with a 1 is the latest. So my final dataset would have this summary row for ID1=1:
ID1  mth_200901  mth_200902  ...  mth_201311  mth_201312  categ_1  categ_2
1    1           1           ...  1           1           def      xyz
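To pin down the rule, here is a rough Python sketch of the collapse logic, for illustration only. It rests on several assumptions that are not part of the data description: missing values (the "." above) are represented as None, the month columns are all named mth_YYYYMM so that sorting the names gives calendar order, ties between rows go to the first row encountered, and a person with no month equal to 1 takes the categorical values from their first row. The function name collapse_by_umbrella_id is hypothetical, and only the four month columns shown above are used.

```python
from collections import defaultdict


def collapse_by_umbrella_id(rows, month_cols, categ_cols):
    """Collapse rows (a list of dicts) to one summary row per ID1.

    Assumptions: missing values are None, and month column names
    (mth_YYYYMM) sort into calendar order.
    """
    month_cols = sorted(month_cols)

    # Group the rows by the umbrella ID, keeping first-seen order.
    groups = defaultdict(list)
    for row in rows:
        groups[row["ID1"]].append(row)

    def latest_month(row):
        # Name of the most recent month flagged 1, or "" if there is none.
        active = [m for m in month_cols if row.get(m) == 1]
        return active[-1] if active else ""

    result = []
    for id1, members in groups.items():
        out = {"ID1": id1}
        # A month is 1 if any sub-ID row has a 1 for it, else missing.
        for m in month_cols:
            out[m] = 1 if any(r.get(m) == 1 for r in members) else None
        # Categoricals come from the row with the latest flagged month.
        # max() keeps the first row on ties (an assumption, see above).
        source = max(members, key=latest_month)
        for c in categ_cols:
            out[c] = source.get(c)
        result.append(out)
    return result


months = ["mth_200901", "mth_200902", "mth_201311", "mth_201312"]
categs = ["categ_1", "categ_2"]
rows = [
    {"ID1": 1, "ID2": 1, "mth_200901": 1, "mth_200902": 1,
     "mth_201311": None, "mth_201312": None,
     "categ_1": "abc", "categ_2": "pqr"},
    {"ID1": 1, "ID2": 2, "mth_200901": None, "mth_200902": None,
     "mth_201311": 1, "mth_201312": 1,
     "categ_1": "def", "categ_2": "xyz"},
]
summary = collapse_by_umbrella_id(rows, months, categs)
print(summary)
```

For the two sample rows, this gives a single row for ID1=1 with all four month indicators set to 1 and the categorical values def and xyz, matching the summary row above.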
Any help is much appreciated.