Since lag function does not go with proc sql, is there an equivalent one for sum function. I appreciate if any one who can help to sum the doses between each unique category dates (N.B. for each unique id, the sum of doses (of date_exposed var) between each unique catgeory_dates)?
Here is the data have and want -
data have;
input ID category date_category date_exposed dose ;
format date_category date_exposed date9.;
cards;
1 0 10JUL2017 10Jan2016 15
1 0 10JUL2017 21Jan2016 7.5
1 0 10JUL2017 22Jan2016 10
1 0 10JUL2017 27Jan2016 100
1 2 10JUL2018 5AUG2017 100
1 2 10JUL2018 15AUG2017 80
1 2 10JUL2018 25AUG2017 10
1 2 10JUL2018 25JUN2018 100
1 3 10JUL2019 15AUG2018 10
1 3 10JUL2019 15AUG2018 140
1 4 10JUL2020 15AUG2020 140
1 4 10JUL2019 15AUG2020 50
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 1 10JUL2018 10AUG2017 15
2 1 10JUL2018 20AUG2017 5
2 1 10JUL2018 20Jan2018 15
2 2 10JUL2019 20Jan2019 5
2 2 10JUL2019 20Jan2019 12
2 2 10JUL2019 20FEB2019 12
;
data want;
input ID category date_category sum ;
format date_category date9.;
cards;
1 0 10JUL2017 .
1 2 10JUL2018 290
1 3 10JUL2019 150
1 4 10JUL2020 0
2 0 10JUL2017 -
2 1 10JUL2018 35
2 2 10JUL2019 29
;
Thanks,
Testing the code that you post is NOT a crime, it's a basic courtesy with regards to those that are supposed to help you.
When reading dates, do not forget to use date informats:
data have;
input ID category date_category :date9. date_exposed :date9. dose;
format date_category date_exposed date9.;
cards;
1 0 10JUL2017 10Jan2016 15
1 0 10JUL2017 21Jan2016 7.5
1 0 10JUL2017 22Jan2016 10
1 0 10JUL2017 27Jan2016 100
1 2 10JUL2018 5AUG2017 100
1 2 10JUL2018 15AUG2017 80
1 2 10JUL2018 25AUG2017 10
1 2 10JUL2018 25JUN2018 100
1 3 10JUL2019 15AUG2018 10
1 3 10JUL2019 15AUG2018 140
1 4 10JUL2020 15AUG2020 140
1 4 10JUL2019 15AUG2020 50
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 1 10JUL2018 10AUG2017 15
2 1 10JUL2018 20AUG2017 5
2 1 10JUL2018 20Jan2018 15
2 2 10JUL2019 20Jan2019 5
2 2 10JUL2019 20Jan2019 12
2 2 10JUL2019 20FEB2019 12
;
Based on this data, this data step creates your intended result:
data want;
set have;
by id category;
retain last_date sum;
if first.id
then do;
last_date = .;
sum = .;
end;
if last_date ne . and last_date le date_exposed le date_category then sum + dose;
if last.category;
last_date = date_category;
output;
sum = 0;
keep id category date_category sum;
run;
proc print data=want noobs;
run;
Result:
ID category date_category sum 1 0 10JUL2017 . 1 2 10JUL2018 290 1 3 10JUL2019 150 1 4 10JUL2019 0 2 0 10JUL2017 . 2 1 10JUL2018 35 2 2 10JUL2019 29
The LAG function is based on the concept of FIFO queues - a concept PROC SQL will not support.
Yes, there is the undocumented MONOTONIC function in PROC SQL (see MONOTONIC function in PROC-SQL), roughly analogous to _N_ in a data step. So you might be able, using intermediate views or tables, to code something like this:
proc sql noprint;
create view a as select date, snydjcm, monotonic() as a_sequence from sashelp.citiday;
create view b as select date, snydjcm, monotonic() as b_sequence from sashelp.citiday;
create table want as
select a.date, a.snydjcm as dj_current, b.date as prior_date, b.snydjcm as dj_prior
from a left join b on a_sequence=b_sequence+1;
;
quit;
But even then you are relying on PROC SQL reading in the data in the same order that a data step would read it in. That is NOT guaranteed, so monotonic() should not be used for production code.
Or, if the data has some ordering variable, with no repeated values (say DATE), you could do a join of A and B, with a group by a.date, where b.date<a.date, and having b.date=max(b.date). Extremely expensive. It's so anti-scalable a concept I won't write the code.
I think that you need describe in considerable detail why this is the desired output:
1 4 10JUL2020 0
for Id=1 and category = 4. You also show as input:
1 4 10JUL2019 15AUG2020 50
and there is no output for that date. So I suspect the 10Jul2019 was intended to be 10Jul2020 but that still raises the question of why the result for 10Jul2020 would be 0.
Assuming the 10Jul2019 is a typo with an incorrect 0 for sum in the Want set I might suggest:
Summarizing and then setting the first value to missing for the Id.
data have; input ID category date_category :date9. date_exposed :date9. dose ; format date_category date_exposed date9.; cards; 1 0 10JUL2017 10Jan2016 15 1 0 10JUL2017 21Jan2016 7.5 1 0 10JUL2017 22Jan2016 10 1 0 10JUL2017 27Jan2016 100 1 2 10JUL2018 5AUG2017 100 1 2 10JUL2018 15AUG2017 80 1 2 10JUL2018 25AUG2017 10 1 2 10JUL2018 25JUN2018 100 1 3 10JUL2019 15AUG2018 10 1 3 10JUL2019 15AUG2018 140 1 4 10JUL2020 15AUG2020 140 1 4 10JUL2019 15AUG2020 50 2 0 10JUL2017 10Jan2016 15 2 0 10JUL2017 10Jan2016 15 2 0 10JUL2017 10Jan2016 15 2 1 10JUL2018 10AUG2017 15 2 1 10JUL2018 20AUG2017 5 2 1 10JUL2018 20Jan2018 15 2 2 10JUL2019 20Jan2019 5 2 2 10JUL2019 20Jan2019 12 2 2 10JUL2019 20FEB2019 12 ; proc summary data=have nway; class ID category date_category; var dose; output out=summary (drop=_type_ _freq_) sum=sum; run; data want; set summary; by id; if first.id then call missing(sum); run;
Testing the code that you post is NOT a crime, it's a basic courtesy with regards to those that are supposed to help you.
When reading dates, do not forget to use date informats:
data have;
input ID category date_category :date9. date_exposed :date9. dose;
format date_category date_exposed date9.;
cards;
1 0 10JUL2017 10Jan2016 15
1 0 10JUL2017 21Jan2016 7.5
1 0 10JUL2017 22Jan2016 10
1 0 10JUL2017 27Jan2016 100
1 2 10JUL2018 5AUG2017 100
1 2 10JUL2018 15AUG2017 80
1 2 10JUL2018 25AUG2017 10
1 2 10JUL2018 25JUN2018 100
1 3 10JUL2019 15AUG2018 10
1 3 10JUL2019 15AUG2018 140
1 4 10JUL2020 15AUG2020 140
1 4 10JUL2019 15AUG2020 50
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 0 10JUL2017 10Jan2016 15
2 1 10JUL2018 10AUG2017 15
2 1 10JUL2018 20AUG2017 5
2 1 10JUL2018 20Jan2018 15
2 2 10JUL2019 20Jan2019 5
2 2 10JUL2019 20Jan2019 12
2 2 10JUL2019 20FEB2019 12
;
Based on this data, this data step creates your intended result:
data want;
set have;
by id category;
retain last_date sum;
if first.id
then do;
last_date = .;
sum = .;
end;
if last_date ne . and last_date le date_exposed le date_category then sum + dose;
if last.category;
last_date = date_category;
output;
sum = 0;
keep id category date_category sum;
run;
proc print data=want noobs;
run;
Result:
ID category date_category sum 1 0 10JUL2017 . 1 2 10JUL2018 290 1 3 10JUL2019 150 1 4 10JUL2019 0 2 0 10JUL2017 . 2 1 10JUL2018 35 2 2 10JUL2019 29
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.