Hi, I have a dataset at the person-month level with an ID variable that is not clean, another ID variable, and a few other categorical variables. Dataset1: ID1 ID2 categ vars..... 1 123 1.1 123 1.2 234 1.2 234 In ID1, the 1 is the valid value, while 1.1 and 1.2 are invalid values. ID 2 is the way I can distinguish one person from another, but I need to retain as many valid values of ID1 as possible. I need to do two things: One is that for every person (ie, value of ID2) that has at least one valid value of ID1, I want to fill the rest of their rows with that valid value of ID1. The other is that for every person that never has a valid value of ID1, I want a new variable, ID3, to give them a value. Dataset 2 ID1 ID2 ID3 categ vars..... 1 123 1 123 . 234 1 . 234 2 This is what I have coded so far to accomplish this: proc sql; create table want1 as select *, case when ID1 like '%.% then 1 else 0 end as invalid_ID1 from have; quit; proc sql; create table want2 as select *, count(ID2) as num_rows, count(invalid_ID1) as rows_w_invalid_ID1 from want1 group by ID2 order by ID2, invalid_ID1; quit; data want3; set want2; by ID2 invalid_ID1; retain ID3 1; if num_rows>rows_w_invalid_ID1 then ID3=.; else if first.ID2 and num_rows=rows_w_invalid_ID1 then do; ID3+1; ID1=' ';end; retain ID1_keep; if first.ID2 then ID1_keep=ID1; if rows_w_invalid_id=1 and num_rows>rows_w_invalid_id then ID1=ID1_keep; run; This code correctly completes my first objective of overwriting invalid values of ID1 with valid values for each person using ID2 as the person identifier. However, when assigning ID3, it resets to 1 every time it encounters a missing value, ie someone who would not be assigned a value of ID3. So it looks like this: Dataset 2 ID1 ID2 ID3 categ vars..... 1 123 1 123 . 234 1 . 234 2 2 345 2 345 . 456 1 What I would want it to do is continue counting rather than resetting upon encountering the missing variable. Any help is much appreciated.
... View more