If performance is an issue, consider a double do loop:
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
data want;
do until (last.id);
set have;
by id;
if date1 < date2
then do;
_date1 = date1;
_date2 = date2;
end;
end;
do until (last.id);
set have;
by id;
if date1 = _date1 and date2 = _date2
then flag = 'Y';
else flag = ' ';
output;
end;
drop _date1 _date2;
run;
proc print data=want noobs;
run;
Result:
Id visits val date1 date2 flag 101 1 2.4 02JAN2019 05JAN2019 Y 101 2 4.0 05JAN2019 05JAN2019 101 3 5.0 10JAN2019 05JAN2019 102 1 6.0 05JAN2019 15JAN2019 102 2 7.0 12JAN2019 15JAN2019 Y 102 3 8.0 15JAN2019 15JAN2019 103 1 2.3 17JAN2019 20JAN2019 103 2 6.0 18JAN2019 20JAN2019 103 3 8.0 19JAN2019 20JAN2019 Y
Please post data in usable form: as data-step using datalines-statement.
"The flag should be populated ONLY for last non missing 'val' variable for each subjects" -- how are "subjects" identified?
As @andreas_lds has said, post test data in a datastep using he code window - its the {i} above post area.
At a guess I would say reverse sort the data, so the date is descending. Then something like:
data want; set have; retain flag; if first.id then flag=""; if lag(id)=id and lag(flag) ne "Y" and date1 < date2 then flag="Y"; run;
Then sort the data back again. You could also do something in sql like:
proc sql; create table want as select a.*, b.flag from have a left join (select *,"Y" as flag from have group by id having date1 <= date2) b on a.id=b.id and a.date1=b.date1; quit;
Something like this, perhaps?
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;run;
data flagged;
set have;
where date1<date2;
by id;
if last.id;
retain flag 'Y';
run;
data want;
merge have flagged;
by id date1 date2;
run;
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;run;
data temp;
set have;
by id;
if first.id or missing(val) or date1 ge date2 then group+1;
run;
data want;
set temp;
by group;
if last.group and not missing(val) and date1 lt date2 then flag='Y';
run;
If performance is an issue, consider a double do loop:
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
data want;
do until (last.id);
set have;
by id;
if date1 < date2
then do;
_date1 = date1;
_date2 = date2;
end;
end;
do until (last.id);
set have;
by id;
if date1 = _date1 and date2 = _date2
then flag = 'Y';
else flag = ' ';
output;
end;
drop _date1 _date2;
run;
proc print data=want noobs;
run;
Result:
Id visits val date1 date2 flag 101 1 2.4 02JAN2019 05JAN2019 Y 101 2 4.0 05JAN2019 05JAN2019 101 3 5.0 10JAN2019 05JAN2019 102 1 6.0 05JAN2019 15JAN2019 102 2 7.0 12JAN2019 15JAN2019 Y 102 3 8.0 15JAN2019 15JAN2019 103 1 2.3 17JAN2019 20JAN2019 103 2 6.0 18JAN2019 20JAN2019 103 3 8.0 19JAN2019 20JAN2019 Y
data have;
input Id visits val date1: date9. date2 date9.;
format date1 date2 date9.;
cards;
101 1 2.4 02jan2019 05jan2019
101 2 4 05jan2019 05jan2019
101 3 5 10jan2019 05jan2019
102 1 6 05jan2019 15jan2019
102 2 7 12jan2019 15jan2019
102 3 8 15jan2019 15jan2019
103 1 2.3 17jan2019 20jan2019
103 2 6 18jan2019 20jan2019
103 3 8 19jan2019 20jan2019
;
run;
proc sql;
create table want(drop=t) as
select *,date1<date2 as t,ifc(calculated t and max(date1)=date1,'Y',' ') as Flag
from have
group by id,t
order by id,date1,date2;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.