The LAG function is not working for this dataset for the variable dsp_display_prev and dsp_display_cmli_id.
Please see the data set attached and here is the code:
I've reject the .csv you've shared and it doesn't contain any variables names starting with total_* Looks like I've got "confused" when testing my code. Just remove this portion of the code as you've already done.
I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable?
That's because there are TWO variables. The one you use for aggregation is called dsp_total_display_imps, the one you initialize with zero and use in your retain and keep list is called total_dsp_display_imps.
Help us help you:
Tell us exactly what you see that makes you think that the LAG function is not working.
Your example data only shows one value of LUID, so anything related to multiple values of that variable can't be seen.
Since you say the output is wrong, then show, for that given example what should be correct.
When I use this data set created using your CSV values (and this is how you should share data) I do not see anything that the LAG function is not working. I see suspect accumulators but the LAG looks correct.
data UTP_REDBULL_DF_2; infile datalines dlm=','; input luid :$10. display_cmli_id :$10. dsp_display_cmli_id :$10. sp_cmli_id :$10.; datalines; 1027774874,6261781089,3g6gzrq,1744739 1027774874,6261781089,3g6gzrq,1744739 1027774874,6261781089,3g6gzrq,1744739 1027774874,6261781089,3g6gzrq,1744739 1027774874,6261781089,3g6gzrq,1744739 1027774874,6261781089,56vcsen,1744739 1027774874,6261781089,56vcsen,1744739 1027774874,6261781089,56vcsen,1744739 1027774874,6261781089,56vcsen,1744739 1027774874,6261781089,56vcsen,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,b9rq6w4,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dkha69c,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,dryvhem,1744739 1027774874,6261781089,im7u5uh,1744739 1027774874,6261781089,im7u5uh,1744739 1027774874,6261781089,im7u5uh,1744739 1027774874,6261781089,im7u5uh,1744739 1027774874,6261781089,im7u5uh,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,q7c5l1c,1744739 1027774874,6261781089,v2libkt,1744739 1027774874,6261781089,v2libkt,1744739 1027774874,6261781089,v2libkt,1744739 1027774874,6261781089,v2libkt,1744739 1027774874,6261781089,v2libkt,1744739 ;
Your example data also only has one value for dsp_display_cmli_id and one value for sp_cmli_id so it is hard to see a likely problem with LAG associated with those variables.
It may also help to describe a bit more about what this is supposed to do.
The lag() function is working as designed.
I believe what caused an unexpected result for you was due to the total_* variables already existing in source. Because these variables exist already in source the retain statement on these variables doesn't have the effect you thought it would.
Does below code return what you're after?
data UTP_REDBULL_DF_3;
set UTP_REDBULL_DF_2(rename=(
total_dsp_display_imps=_src_total_dsp_display_imps
total_display_imps =_src_total_display_imps
total_sp_clicks =_src_total_sp_clicks
));
by luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
;
if first.luid then
do;
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
end;
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id);
if first.luid or upcase(dsp_display_cmli_id) ^=upcase(dsp_display_prev) then
total_dsp_display_imps+1;
display_prev =lag(display_cmli_id);
if first.luid or display_cmli_id ^=display_prev then
total_display_imps+1;
sp_prev =lag(sp_cmli_id);
if first.luid or sp_cmli_id ^=sp_prev then
total_sp_clicks+1;
keep luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
run;
I also assume that you intended in below code section to call your variable total_dsp_display_imps and not dsp_total_...
Below a variant for above code that should return the same result.
data UTP_REDBULL_DF_3;
set UTP_REDBULL_DF_2(drop=total_:);
by luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
;
if first.luid then
do;
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
end;
*-- Counting total impressions and clicks by luid --*;
_check= upcase(dsp_display_cmli_id) ^=upcase(lag(dsp_display_cmli_id));
if first.luid or _check then
total_dsp_display_imps+1;
_check= display_cmli_id ^=lag(display_cmli_id);
if first.luid or _check then
total_display_imps+1;
_check= sp_cmli_id ^=lag(sp_cmli_id);
if first.luid or _check then
total_sp_clicks+1;
keep luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
run;
Thank you, the latest code worked (below). I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable? Also what is
(drop=total_:);
Meant to do? I took it out since it was giving me an error.
@Kaushansky wrote:
Also what is
(drop=total_:);
Meant to do? I took it out since it was giving me an error.
Patrick had guessed that maybe your dataset UTP_REDBULL_DF_2
already has variables named total_dsp_display_imps etc. Is that correct? If so, you don't want to read them in the DATA step, because they will collide with the variables you are creating in the step as accumulators. Therefore drop= will not read in any variables that start with total_ . But if this was throwing an error when you ran it, that suggests these variables don't actually exist in UTP_REDBULL_DF_2. So Patrick's guess as to the cause of the problem may have been wrong, but sounds like the code Patrick wrote is right. : )
data UTP_REDBULL_DF_3;
set UTP_REDBULL_DF_2(drop=total_:);
I've reject the .csv you've shared and it doesn't contain any variables names starting with total_* Looks like I've got "confused" when testing my code. Just remove this portion of the code as you've already done.
I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable?
That's because there are TWO variables. The one you use for aggregation is called dsp_total_display_imps, the one you initialize with zero and use in your retain and keep list is called total_dsp_display_imps.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.