Hello SAS community,
I have a dataset with 3 rows: ID, Date, Event. ID and Date are pretty self-explanatory but Event contains different education phases for each ID: Kindergarten, Preschool, Primary, High School and University. Now, the data is full of errors - sometimes the dates are wrong in which it is out of sequence or duplicated. Sometimes, it may show that the customer participated in say for example, high school before primary school etc. (Gaps are completely fine - ie. Fine for a customer to have a date where he attended Kindergarten then very next line can be the date he went to University). My logic is to assume these are entries are completely erroneous.
How can I logically sort this so that any data out-of-sequence which does not make sense is deleted? (If customer had a 'University' stint in between 'Primary' and 'High School' then delete this row)
Data I have:
data have;
input id date event;
1 01012000 Kindergarten
1 01012001 High school
1 01012003 Primary
1 01012010 High School
2 01012000 University
2 01012010 High School
3 01012009 Kindergarten
3 01012010 Preschool
3 01012012 Kindergarten
3 01012019 High School
3 01012022 Kindergarten
3 01012024 University
;
run;
Data I want:
data want;
input id date event;
1 01012000 Kindergarten
1 01012003 Primary
1 01012010 High School
2 01012010 High School
3 01012009 Kindergarten
3 01012010 Preschool
3 01012019 High School
3 01012024 University
;
run;
Many thanks for your kind help SAS community.