If performance is an issue, consider a double do loop:
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
data want;
do until (last.id);
set have;
by id;
if date1 < date2
then do;
_date1 = date1;
_date2 = date2;
end;
end;
do until (last.id);
set have;
by id;
if date1 = _date1 and date2 = _date2
then flag = 'Y';
else flag = ' ';
output;
end;
drop _date1 _date2;
run;
proc print data=want noobs;
run;
Result:
Id visits val date1 date2 flag 101 1 2.4 02JAN2019 05JAN2019 Y 101 2 4.0 05JAN2019 05JAN2019 101 3 5.0 10JAN2019 05JAN2019 102 1 6.0 05JAN2019 15JAN2019 102 2 7.0 12JAN2019 15JAN2019 Y 102 3 8.0 15JAN2019 15JAN2019 103 1 2.3 17JAN2019 20JAN2019 103 2 6.0 18JAN2019 20JAN2019 103 3 8.0 19JAN2019 20JAN2019 Y
Please post data in usable form: as data-step using datalines-statement.
"The flag should be populated ONLY for last non missing 'val' variable for each subjects" -- how are "subjects" identified?
As @andreas_lds has said, post test data in a datastep using he code window - its the {i} above post area.
At a guess I would say reverse sort the data, so the date is descending. Then something like:
data want; set have; retain flag; if first.id then flag=""; if lag(id)=id and lag(flag) ne "Y" and date1 < date2 then flag="Y"; run;
Then sort the data back again. You could also do something in sql like:
proc sql; create table want as select a.*, b.flag from have a left join (select *,"Y" as flag from have group by id having date1 <= date2) b on a.id=b.id and a.date1=b.date1; quit;
Something like this, perhaps?
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;run;
data flagged;
set have;
where date1<date2;
by id;
if last.id;
retain flag 'Y';
run;
data want;
merge have flagged;
by id date1 date2;
run;
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;run;
data temp;
set have;
by id;
if first.id or missing(val) or date1 ge date2 then group+1;
run;
data want;
set temp;
by group;
if last.group and not missing(val) and date1 lt date2 then flag='Y';
run;
If performance is an issue, consider a double do loop:
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
data want;
do until (last.id);
set have;
by id;
if date1 < date2
then do;
_date1 = date1;
_date2 = date2;
end;
end;
do until (last.id);
set have;
by id;
if date1 = _date1 and date2 = _date2
then flag = 'Y';
else flag = ' ';
output;
end;
drop _date1 _date2;
run;
proc print data=want noobs;
run;
Result:
Id visits val date1 date2 flag 101 1 2.4 02JAN2019 05JAN2019 Y 101 2 4.0 05JAN2019 05JAN2019 101 3 5.0 10JAN2019 05JAN2019 102 1 6.0 05JAN2019 15JAN2019 102 2 7.0 12JAN2019 15JAN2019 Y 102 3 8.0 15JAN2019 15JAN2019 103 1 2.3 17JAN2019 20JAN2019 103 2 6.0 18JAN2019 20JAN2019 103 3 8.0 19JAN2019 20JAN2019 Y
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
proc sql;
create table want(drop=t) as
select *,date1<date2 as t,ifc(calculated t and max(date1)=date1,'Y',' ') as Flag
from have
group by id,t
order by id,date1,date2;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.