Hello SAS community,
I have a dataset with 3 rows: ID, Date, Event. ID and Date are pretty self-explanatory but Event contains different education phases for each ID: Kindergarten, Preschool, Primary, High School and University. Now, the data is full of errors - sometimes the dates are wrong in which it is out of sequence or duplicated. Sometimes, it may show that the customer participated in say for example, high school before primary school etc. (Gaps are completely fine - ie. Fine for a customer to have a date where he attended Kindergarten then very next line can be the date he went to University). My logic is to assume these are entries are completely erroneous.
How can I logically sort this so that any data out-of-sequence which does not make sense is deleted? (If customer had a 'University' stint in between 'Primary' and 'High School' then delete this row)
Data I have:
data have;
input id date event;
1 01012000 Kindergarten
1 01012001 High school
1 01012003 Primary
1 01012010 High School
2 01012000 University
2 01012010 High School
3 01012009 Kindergarten
3 01012010 Preschool
3 01012012 Kindergarten
3 01012019 High School
3 01012022 Kindergarten
3 01012024 University
;
run;
Data I want:
data want;
input id date event;
1 01012000 Kindergarten
1 01012003 Primary
1 01012010 High School
2 01012010 High School
3 01012009 Kindergarten
3 01012010 Preschool
3 01012019 High School
3 01012024 University
;
run;
Many thanks for your kind help SAS community.
@Bounce wrote:
Hello SAS community,
...
How can I logically sort this so that any data out-of-sequence which does not make sense is deleted? (If customer had a 'University' stint in between 'Primary' and 'High School' then delete this row)
Data I have:
data have;
input id date event;
2 01012000 University
2 01012010 High School
For ID 2 you have University followed by High School. You could delete either one to meet your objective. What rule are you using to choose which one to delete?
First thing to do: convert the dates to sas-dates, so that sorting by date yields the expected results - also necessary to compare dates.
Please try the below code
data have;
input id date:ddmmyy8. event&:$20.;
format date date9.;
cards;
1 01012000 Kindergarten
1 01012001 High school
1 01012003 Primary
1 01012010 High School
2 01012000 University
2 01012010 High School
3 01012009 Kindergarten
3 01012010 Preschool
3 01012012 Kindergarten
3 01012019 High School
3 01012022 Kindergarten
3 01012024 University
;
proc format;
invalue sch
'Kindergarten'=1
'High school'=3
'High School'=3
'Primary'=2
'University'=4
'Preschool'=1
;
proc sort data=have;
by id date;
run;
data have;
set have;
schnum=input(event,sch.);
by id date;
run;
proc sort data=have;
by id schnum;
run;
data want;
set have;
by id schnum;
retain schnum2 date2;
if first.id then schnum2=schnum;
if first.id then date2=date;
format date2 date9.;
if schnum>schnum2 and date<date2 then delete;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.