BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Abimal_Zippi
Fluorite | Level 6

Since lag function does not go with proc sql, is there an equivalent one for sum function. I appreciate if any one who can help to sum the doses between each unique  category dates (N.B. for each unique id, the sum of doses (of date_exposed var) between each unique catgeory_dates)?

Here is the data have and want -

data have;
input ID category date_category date_exposed dose ;
format date_category date_exposed date9.;
cards;
1   0 10JUL2017 10Jan2016 15 
1   0 10JUL2017 21Jan2016 7.5
1   0 10JUL2017 22Jan2016  10
1   0 10JUL2017 27Jan2016  100
1   2 10JUL2018 5AUG2017  100
1   2 10JUL2018 15AUG2017  80
1   2 10JUL2018 25AUG2017  10
1   2 10JUL2018 25JUN2018  100
1   3 10JUL2019 15AUG2018  10
1   3 10JUL2019 15AUG2018  140
1   4 10JUL2020 15AUG2020  140
1   4 10JUL2019 15AUG2020  50
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   1 10JUL2018 10AUG2017  15
2   1 10JUL2018 20AUG2017  5
2   1 10JUL2018 20Jan2018  15
2   2 10JUL2019 20Jan2019  5
2   2 10JUL2019 20Jan2019  12
2   2 10JUL2019 20FEB2019  12
;
data want;
input ID category date_category sum ;
format date_category date9.;
cards;
1   0 10JUL2017  . 
1   2 10JUL2018  290
1   3 10JUL2019 150
1   4 10JUL2020  0
2   0 10JUL2017  -
2   1 10JUL2018  35
2   2 10JUL2019  29
;

Thanks, 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Testing the code that you post is NOT a crime, it's a basic courtesy with regards to those that are supposed to help you. 

When reading dates, do not forget to use date informats:

data have;
input ID category date_category :date9. date_exposed :date9. dose;
format date_category date_exposed date9.;
cards;
1   0 10JUL2017 10Jan2016 15 
1   0 10JUL2017 21Jan2016 7.5
1   0 10JUL2017 22Jan2016  10
1   0 10JUL2017 27Jan2016  100
1   2 10JUL2018 5AUG2017  100
1   2 10JUL2018 15AUG2017  80
1   2 10JUL2018 25AUG2017  10
1   2 10JUL2018 25JUN2018  100
1   3 10JUL2019 15AUG2018  10
1   3 10JUL2019 15AUG2018  140
1   4 10JUL2020 15AUG2020  140
1   4 10JUL2019 15AUG2020  50
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   1 10JUL2018 10AUG2017  15
2   1 10JUL2018 20AUG2017  5
2   1 10JUL2018 20Jan2018  15
2   2 10JUL2019 20Jan2019  5
2   2 10JUL2019 20Jan2019  12
2   2 10JUL2019 20FEB2019  12
;

Based on this data, this data step creates your intended result:

data want;
set have;
by id category;
retain last_date sum;
if first.id
then do;
  last_date = .;
  sum = .;
end;
if last_date ne . and last_date le date_exposed le date_category then sum + dose;
if last.category;
last_date = date_category;
output;
sum = 0;
keep id category date_category sum;
run;

proc print data=want noobs;
run;

Result:

ID	category	date_category	sum
1	0	10JUL2017	.
1	2	10JUL2018	290
1	3	10JUL2019	150
1	4	10JUL2019	0
2	0	10JUL2017	.
2	1	10JUL2018	35
2	2	10JUL2019	29

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

The LAG function is based on the concept of FIFO queues - a concept PROC SQL will not support.

 

Yes, there is the undocumented MONOTONIC function in PROC SQL (see MONOTONIC function in PROC-SQL), roughly analogous to _N_ in a data step.  So you might be able, using intermediate views or tables, to code something like this:

 

proc sql noprint;
  create view a as select date, snydjcm, monotonic() as a_sequence from sashelp.citiday;
  create view b as select date, snydjcm, monotonic() as b_sequence from sashelp.citiday;

  create table want as 
    select a.date, a.snydjcm as dj_current, b.date as prior_date, b.snydjcm as dj_prior
	from a left join b on a_sequence=b_sequence+1;
   ;
quit;

But even then you are relying on PROC SQL reading in the data in the same order that a data step would read it in.  That is NOT guaranteed, so monotonic() should not be used for production code.

 

Or, if the data has some ordering variable, with no repeated values (say DATE), you could do a join of A and B, with a group by a.date, where b.date<a.date, and having b.date=max(b.date).   Extremely expensive.  It's so anti-scalable a concept I won't write the code.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Abimal_Zippi
Fluorite | Level 6
Thanks for your help with the coding!
ballardw
Super User

I think that you need describe in considerable detail why this is the desired output:

 

1   4 10JUL2020  0

 

for Id=1 and category = 4. You also show as input:

1   4 10JUL2019 15AUG2020  50

and there is no output for that date. So I suspect the 10Jul2019 was intended to be 10Jul2020 but that still raises the question of why the result for 10Jul2020 would be 0.

Assuming the 10Jul2019 is a typo with an incorrect 0 for sum in the Want set I might suggest:

Summarizing and then setting the first value to missing for the Id.

data have;
input ID category date_category :date9. date_exposed :date9. dose ;
format date_category date_exposed date9.;
cards;
1   0 10JUL2017 10Jan2016 15 
1   0 10JUL2017 21Jan2016 7.5
1   0 10JUL2017 22Jan2016  10
1   0 10JUL2017 27Jan2016  100
1   2 10JUL2018 5AUG2017  100
1   2 10JUL2018 15AUG2017  80
1   2 10JUL2018 25AUG2017  10
1   2 10JUL2018 25JUN2018  100
1   3 10JUL2019 15AUG2018  10
1   3 10JUL2019 15AUG2018  140
1   4 10JUL2020 15AUG2020  140
1   4 10JUL2019 15AUG2020  50
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   1 10JUL2018 10AUG2017  15
2   1 10JUL2018 20AUG2017  5
2   1 10JUL2018 20Jan2018  15
2   2 10JUL2019 20Jan2019  5
2   2 10JUL2019 20Jan2019  12
2   2 10JUL2019 20FEB2019  12
;

proc summary data=have nway;
   class ID category date_category;
   var dose;
   output out=summary (drop=_type_ _freq_)
       sum=sum;
run;
data want; 
   set summary;
   by  id;
   if first.id then call missing(sum);
run;

 

Abimal_Zippi
Fluorite | Level 6
Thanks for the note, it should be 10Jul2020 .
Kurt_Bremser
Super User

Testing the code that you post is NOT a crime, it's a basic courtesy with regards to those that are supposed to help you. 

When reading dates, do not forget to use date informats:

data have;
input ID category date_category :date9. date_exposed :date9. dose;
format date_category date_exposed date9.;
cards;
1   0 10JUL2017 10Jan2016 15 
1   0 10JUL2017 21Jan2016 7.5
1   0 10JUL2017 22Jan2016  10
1   0 10JUL2017 27Jan2016  100
1   2 10JUL2018 5AUG2017  100
1   2 10JUL2018 15AUG2017  80
1   2 10JUL2018 25AUG2017  10
1   2 10JUL2018 25JUN2018  100
1   3 10JUL2019 15AUG2018  10
1   3 10JUL2019 15AUG2018  140
1   4 10JUL2020 15AUG2020  140
1   4 10JUL2019 15AUG2020  50
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   0 10JUL2017 10Jan2016  15
2   1 10JUL2018 10AUG2017  15
2   1 10JUL2018 20AUG2017  5
2   1 10JUL2018 20Jan2018  15
2   2 10JUL2019 20Jan2019  5
2   2 10JUL2019 20Jan2019  12
2   2 10JUL2019 20FEB2019  12
;

Based on this data, this data step creates your intended result:

data want;
set have;
by id category;
retain last_date sum;
if first.id
then do;
  last_date = .;
  sum = .;
end;
if last_date ne . and last_date le date_exposed le date_category then sum + dose;
if last.category;
last_date = date_category;
output;
sum = 0;
keep id category date_category sum;
run;

proc print data=want noobs;
run;

Result:

ID	category	date_category	sum
1	0	10JUL2017	.
1	2	10JUL2018	290
1	3	10JUL2019	150
1	4	10JUL2019	0
2	0	10JUL2017	.
2	1	10JUL2018	35
2	2	10JUL2019	29
Abimal_Zippi
Fluorite | Level 6
Thanks for your note and your help with the coding.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1181 views
  • 3 likes
  • 4 in conversation