Hi....I am trying to fill in missing entries and obtain a set of unique records by UID, EnrollmentStatus and PartialCode. The rules for keeping and/or correcting the missing entries are:
1) if more than one record exist based on the UID, EnrollmentStatus and PartialCode where RegisterStatus is missing and the other not missing, keep the record where RegisterStatus is not missing.
2) if the record is the first record in the group where the RegisterStatus is missing, then fill the missing entry with the next entry in the RegisterStatus.
3) if the record is not the first record in the group where the RegisterStatus is missing, then fill the missing entry with the previous entry in the RegisterStatus.
Any suggestions on how to solve this.Thanks.
Have:
UID | EnrollmentStatus | RegisterStatus | PartialCode |
104 | Post Secondary | CC1 PTSD | 00-01 |
106 | Post Secondary | CC4 Post Sec | 00-01 |
107 | Secondary Access | CC1 PTSD | 00-01 |
108 | Secondary Access | CC2 PTSD Grad | 00-01 |
108 | Secondary Access | CC2 PTSD Grad | 01-02 |
109 | Secondary Access | CC2 PTSD Grad | 00-01 |
109 | Secondary Access | CC2 PTSD Grad | 01-02 |
161 | Post Secondary | CC3 LRSD | 00-01 |
161 | Post Secondary | CC4 Post Sec | 07-08 |
161 | Post Secondary | CC4 Post Sec | 08-09 |
161 | Post Secondary | 08-09 | |
167 | Secondary Access | CC3 LRSD | 00-01 |
167 | Secondary Access | CC3 LRSD | 01-02 |
167 | Secondary Access | 02-03 | |
192 | Secondary Access | CC3 LRSD | 00-01 |
192 | Secondary Access | 01-02 | |
192 | Secondary Access | CC4 Post Sec | 02-03 |
193 | Secondary Access | 00-01 | |
193 | Secondary Access | CC5 PstSec Spon | 02-03 |
193 | Secondary Access | CC5 PstSec Spon | 04-05 |
Want:
UID | EnrollmentStatus | RegisterStatus | PartialCode |
104 | Post Secondary | CC1 PTSD | 00-01 |
106 | Post Secondary | CC4 Post Sec | 00-01 |
107 | Secondary Access | CC1 PTSD | 00-01 |
108 | Secondary Access | CC2 PTSD Grad | 00-01 |
108 | Secondary Access | CC2 PTSD Grad | 01-02 |
109 | Secondary Access | CC2 PTSD Grad | 00-01 |
109 | Secondary Access | CC2 PTSD Grad | 01-02 |
161 | Post Secondary | CC3 LRSD | 00-01 |
161 | Post Secondary | CC4 Post Sec | 07-08 |
161 | Post Secondary | CC4 Post Sec | 08-09 |
167 | Secondary Access | CC3 LRSD | 00-01 |
167 | Secondary Access | CC3 LRSD | 01-02 |
167 | Secondary Access | CC3 LRSD | 02-03 |
192 | Secondary Access | CC3 LRSD | 00-01 |
192 | Secondary Access | CC3 LRSD | 01-02 |
192 | Secondary Access | CC4 Post Sec | 02-03 |
193 | Secondary Access | CC5 PstSec Spon | 00-01 |
193 | Secondary Access | CC5 PstSec Spon | 02-03 |
193 | Secondary Access | CC5 PstSec Spon | 04-05 |
flip it over twice.
data have;
infile cards expandtabs truncover;
input UID EnrollmentStatus & $20. RegisterStatus & $20. PartialCode $;
cards;
192 Secondary Access CC3 LRSD 00-01
192 Secondary Access . 01-02
192 Secondary Access CC4 Post Sec 02-03
193 Secondary Access . 00-01
193 Secondary Access CC5 PstSec Spon 02-03
193 Secondary Access CC5 PstSec Spon 04-05
;
run;
data temp;
set have;
by uid;
if first.uid then n=0;
n+1;
run;
data temp1;
set temp;
by uid;
length temp $ 40;
retain temp;
if first.uid then call missing(temp);
if not missing(RegisterStatus) then temp=RegisterStatus;
drop RegisterStatus;
run;
proc sort data=temp1;
by uid descending n;
run;
data want;
set temp1;
by uid;
length RegisterStatus $ 40;
retain RegisterStatus;
if first.uid then call missing(RegisterStatus);
if not missing(temp) then RegisterStatus=temp;
drop temp;
run;
proc sort data=want;
by uid n;
run;
Sort your data by the group variables and by desecnding of the other one:
proc sort data=have;
by UID EnrollmentStatus PartialCode
descending RegisterStatus ;
run;
data want;
set have;
by UID EnrollmentStatus PartialCode;
retain lag_rs;
lag_rs = lag( RegisterStatus);
if not first.PartialCode and missing(RegisterStatus)
then RegisterStatus = lag_rs;
run;
flip it over twice.
data have;
infile cards expandtabs truncover;
input UID EnrollmentStatus & $20. RegisterStatus & $20. PartialCode $;
cards;
192 Secondary Access CC3 LRSD 00-01
192 Secondary Access . 01-02
192 Secondary Access CC4 Post Sec 02-03
193 Secondary Access . 00-01
193 Secondary Access CC5 PstSec Spon 02-03
193 Secondary Access CC5 PstSec Spon 04-05
;
run;
data temp;
set have;
by uid;
if first.uid then n=0;
n+1;
run;
data temp1;
set temp;
by uid;
length temp $ 40;
retain temp;
if first.uid then call missing(temp);
if not missing(RegisterStatus) then temp=RegisterStatus;
drop RegisterStatus;
run;
proc sort data=temp1;
by uid descending n;
run;
data want;
set temp1;
by uid;
length RegisterStatus $ 40;
retain RegisterStatus;
if first.uid then call missing(RegisterStatus);
if not missing(temp) then RegisterStatus=temp;
drop temp;
run;
proc sort data=want;
by uid n;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.