BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Kaushansky
Obsidian | Level 7

The LAG function is not working for this dataset for the variable dsp_display_prev and dsp_display_cmli_id.

Please see the data set attached and here is the code:

 

proc sort data=UTP_REDBULL_DF_2; 
by luid
dsp_display_cmli_id  
display_cmli_id  
sp_cmli_id  
;
 
data UTP_REDBULL_DF_3; set UTP_REDBULL_DF_2;
by luid
dsp_display_cmli_id  
display_cmli_id  
sp_cmli_id  
;
 
if first.luid then do;
 
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
 
end;
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
 
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id); if upcase(dsp_display_cmli_id) ^=upcase(dsp_display_prev) then dsp_total_display_imps+1;
    display_prev =lag(display_cmli_id); if display_cmli_id ^=display_prev then total_display_imps+1;
    sp_prev =lag(sp_cmli_id); if sp_cmli_id ^=sp_prev then total_sp_clicks+1;
 
keep  luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
 
    dsp_display_prev
    display_prev
    sp_prev
 
total_display_imps  
total_dsp_display_imps  
total_sp_clicks  
;  
run;
 
proc print data=UTP_REDBULL_DF_3;
var luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
 
    dsp_display_prev
    display_prev
    sp_prev
 
total_display_imps  
total_dsp_display_imps  
total_sp_clicks  
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Kaushansky 

I've reject the .csv you've shared and it doesn't contain any variables names starting with total_* Looks like I've got "confused" when testing my code. Just remove this portion of the code as you've already done.

 

I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable?

That's because there are TWO variables. The one you use for aggregation is called dsp_total_display_imps, the one you initialize with zero and use in your retain and keep list is called total_dsp_display_imps.

 

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Help us help you:

 

  1. Please provide sample data in the form of a working DATA step, not a csv file.  This will  increase the number of respondents who will develop code to help resolve your problem. 
  2. Please show what you got from your program.
  3. Please show what you expected to get, and describe how it is different from what you got.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

Tell us exactly what you see that makes you think that the LAG function is not working.

 

Your example data only shows one value of LUID, so anything related to multiple values of that variable can't be seen.

Since you say the output is wrong, then show, for that given example what should be correct.

When I use this data set created using your CSV values (and this is how you should share data) I do not see anything that the LAG function is not working. I see suspect accumulators but the LAG looks correct.

data UTP_REDBULL_DF_2;
   infile datalines dlm=',';
   input luid :$10. display_cmli_id :$10. dsp_display_cmli_id :$10. sp_cmli_id :$10.;
datalines;
1027774874,6261781089,3g6gzrq,1744739
1027774874,6261781089,3g6gzrq,1744739
1027774874,6261781089,3g6gzrq,1744739
1027774874,6261781089,3g6gzrq,1744739
1027774874,6261781089,3g6gzrq,1744739
1027774874,6261781089,56vcsen,1744739
1027774874,6261781089,56vcsen,1744739
1027774874,6261781089,56vcsen,1744739
1027774874,6261781089,56vcsen,1744739
1027774874,6261781089,56vcsen,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,b9rq6w4,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dkha69c,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,dryvhem,1744739
1027774874,6261781089,im7u5uh,1744739
1027774874,6261781089,im7u5uh,1744739
1027774874,6261781089,im7u5uh,1744739
1027774874,6261781089,im7u5uh,1744739
1027774874,6261781089,im7u5uh,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,q7c5l1c,1744739
1027774874,6261781089,v2libkt,1744739
1027774874,6261781089,v2libkt,1744739
1027774874,6261781089,v2libkt,1744739
1027774874,6261781089,v2libkt,1744739
1027774874,6261781089,v2libkt,1744739
;

Your example data also only has one value for dsp_display_cmli_id and one value for sp_cmli_id so it is hard to see a likely problem with LAG associated with those variables.

 

 

 

It may also help to describe a bit more about what this is supposed to do.

Patrick
Opal | Level 21

The lag() function is working as designed.

I believe what caused an unexpected result for you was due to the total_* variables already existing in source. Because these variables exist already in source the retain statement on these variables doesn't have the effect you thought it would.

 

Does below code return what you're after?

data UTP_REDBULL_DF_3;
  set UTP_REDBULL_DF_2(rename=(
                          total_dsp_display_imps=_src_total_dsp_display_imps
                          total_display_imps    =_src_total_display_imps
                          total_sp_clicks       =_src_total_sp_clicks
                          ));
  by luid
    dsp_display_cmli_id  
    display_cmli_id  
    sp_cmli_id  
  ;

  if first.luid then
    do;
      total_dsp_display_imps=0;
      total_display_imps=0;
      total_sp_clicks=0;
    end;

  retain
    total_dsp_display_imps
    total_display_imps
    total_sp_clicks
  ;

  *-- Counting total impressions and clicks by luid --*;
  dsp_display_prev=lag(dsp_display_cmli_id);
  if first.luid or upcase(dsp_display_cmli_id) ^=upcase(dsp_display_prev) then
    total_dsp_display_imps+1;

  display_prev =lag(display_cmli_id);
  if first.luid or display_cmli_id ^=display_prev then
    total_display_imps+1;

  sp_prev =lag(sp_cmli_id);
  if first.luid or sp_cmli_id ^=sp_prev then
    total_sp_clicks+1;

  keep  luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  
  ;
run;

 

I also assume that you intended in below code section to call your variable total_dsp_display_imps and not dsp_total_...

Patrick_0-1696038051735.png

Below a variant for above code that should return the same result.

data UTP_REDBULL_DF_3;
  set UTP_REDBULL_DF_2(drop=total_:);
  by luid
    dsp_display_cmli_id  
    display_cmli_id  
    sp_cmli_id  
  ;

  if first.luid then
    do;
      total_dsp_display_imps=0;
      total_display_imps=0;
      total_sp_clicks=0;
    end;

  *-- Counting total impressions and clicks by luid --*;
  _check= upcase(dsp_display_cmli_id) ^=upcase(lag(dsp_display_cmli_id));
  if first.luid or _check  then
    total_dsp_display_imps+1;

  _check= display_cmli_id ^=lag(display_cmli_id);
  if first.luid or _check then
    total_display_imps+1;

  _check= sp_cmli_id ^=lag(sp_cmli_id);
  if first.luid or _check then
    total_sp_clicks+1;

  keep  luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  
  ;
run;

 

Kaushansky
Obsidian | Level 7

Thank you, the latest code worked (below). I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable? Also what is 

(drop=total_:);

Meant to do? I took it out since it was giving me an error.

 

Quentin
Super User

@Kaushansky wrote:

 Also what is 

(drop=total_:);

Meant to do? I took it out since it was giving me an error.

 


Patrick had guessed that maybe your dataset UTP_REDBULL_DF_2 already has variables named total_dsp_display_imps etc.  Is that correct? If so, you don't want to read them in the DATA step, because they will collide with the variables you are creating in the step as accumulators.  Therefore drop= will not read in any variables that start with total_  . But if this was throwing an error when you ran it, that suggests these variables don't actually exist in UTP_REDBULL_DF_2. So Patrick's guess as to the cause of the problem may have been wrong, but sounds like the code Patrick wrote is right. : )

 

 

data UTP_REDBULL_DF_3;
  set UTP_REDBULL_DF_2(drop=total_:);

 

 

BASUG is hosting free webinars Next up: Mike Raithel presenting on validating data files on Wednesday July 17. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Patrick
Opal | Level 21

@Kaushansky 

I've reject the .csv you've shared and it doesn't contain any variables names starting with total_* Looks like I've got "confused" when testing my code. Just remove this portion of the code as you've already done.

 

I still do not understand why my original code could not match the dsp_display_cmli_id to the prev variable?

That's because there are TWO variables. The one you use for aggregation is called dsp_total_display_imps, the one you initialize with zero and use in your retain and keep list is called total_dsp_display_imps.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 958 views
  • 5 likes
  • 5 in conversation