BookmarkSubscribeRSS Feed
Kaushansky
Obsidian | Level 7

I am trying to use the following 4 columns to create and count new variables, using First. and Last. but I see that First. and Last. are somehow the same for the sorted variables as you can see in the temp variables and so I cannot use them to differentiate a calculation. Take a look at the PROC PRINT :

 

Kaushansky_0-1694800244826.png

 

17 REPLIES 17
PaigeMiller
Diamond | Level 26

Your problem statement explains what doesn't work the way you want it to work. We also need to know what you are trying to do. Please explain.

--
Paige Miller
Kaushansky
Obsidian | Level 7

For example: 

I would like to only count 1 each time each one of the IDs is a first. E.g. First_search should only count as 1 since all of the ID values are the same, but the first. does not work in this case.

PaigeMiller
Diamond | Level 26

@Kaushansky wrote:

 

I would like to only count 1 each time each one of the IDs is a first. E.g. First_search should only count as 1 since all of the ID values are the same, but the first. does not work in this case.


FIRST. does not seem to be the right tool in this case. Perhaps you want something like this:

 

data want;
    set have;
    prev=lag(fp_clmi_id);
    if fp_clmi_id ^= prev then first_search=1;
    else first_search=0;
    drop prev;
run;
--
Paige Miller
Kaushansky
Obsidian | Level 7

Using this logic some of it works but not the dsp_total_display_imps counter?

 

See the code snippet and result:

dsp_display_prev=lag(dsp_display_cmli_id); if dsp_display_cmli_id ^= dsp_display_prev then dsp_total_display_imps+1;
    display_prev =lag(display_cmli_id); if display_cmli_id ^= display_prev then total_display_imps+1;
    sp_prev =lag(sp_cmli_id); if sp_cmli_id ^= sp_prev then total_sp_clicks+1;

 

Kaushansky_0-1694802972263.png

 

 

PaigeMiller
Diamond | Level 26

Please show us the entire SAS data step, not a portion of it. Please format the data step for readability with at most one SAS command on a line, and indented properly. Please paste the SAS code into the window that appears when you click on the "little running man" icon.

 

Please provide data as SAS data step code (instructions and examples) and not as screen captures.

--
Paige Miller
Kaushansky
Obsidian | Level 7
data &dataframe._1; set &dataframe(where=(luid=&qa_luid));
by luid 
order_id 
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_imps_event_ts
display_imps_event_ts 
sp_clicks_event_ts
trans_event_ts 
;
if first.luid then do;
 
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
 
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
end;
 
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id); if dsp_display_cmli_id ^=dsp_display_prev then dsp_total_display_imps+1;
    display_prev =lag(display_cmli_id); if display_cmli_id ^=display_prev then total_display_imps+1;
    sp_prev =lag(sp_cmli_id); if sp_cmli_id ^=sp_prev then total_sp_clicks+1;
 
 
keep  luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
 
    dsp_display_prev
    display_prev
    sp_prev
 
total_display_imps  
total_dsp_display_imps  
total_sp_clicks  
;  
run;
 
proc print data=&dataframe._1;
var luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
 
    dsp_display_prev
    display_prev
    sp_prev
 
total_display_imps  
total_dsp_display_imps  
total_sp_clicks  
 
;
where luid=&qa_luid;
run;
 
Kaushansky_0-1694808533846.png

 

PaigeMiller
Diamond | Level 26

I'm not going to work with your data as a screen capture. I need the data provided as I explained.

 

Furthermore, you still have multiple SAS commands on a single line.

--
Paige Miller
Patrick
Opal | Level 21

@Kaushansky Below your code in a better readable manner using menu Patrick_0-1694837769532.png

To "fix" your code and provide some tested code requires sample data to work with. Please provide such sample data that works with your code (best via a SAS data step that creates the data) so we can spend the time solving your problem. Don't expect that anyone is going to create such data for you based on a screenshot.

data &dataframe._1;
  set &dataframe(where=(luid=&qa_luid));
  by luid 
    order_id 
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id
    dsp_display_imps_event_ts
    display_imps_event_ts 
    sp_clicks_event_ts
    trans_event_ts 
  ;

  if first.luid then
    do;
      total_dsp_display_imps=0;
      total_display_imps=0;
      total_sp_clicks=0;
      retain
        total_dsp_display_imps
        total_display_imps
        total_sp_clicks
      ;
    end;

  *-- Counting total impressions and clicks by luid --*;
  dsp_display_prev=lag(dsp_display_cmli_id);

  if dsp_display_cmli_id ^=dsp_display_prev then
    dsp_total_display_imps+1;
  display_prev =lag(display_cmli_id);

  if display_cmli_id ^=display_prev then
    total_display_imps+1;
  sp_prev =lag(sp_cmli_id);

  if sp_cmli_id ^=sp_prev then
    total_sp_clicks+1;
  keep  luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  
  ;
run;

proc print data=&dataframe._1;
  var luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  

  ;
  where luid=&qa_luid;
run;
Kaushansky
Obsidian | Level 7

Attached is the data used to process the code above that does not work.

 

data UTP_REDBULL_DF_1;
  set UTP_REDBULL_DF_1;
  by luid 
    order_id 
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id
    dsp_display_imps_event_ts
    display_imps_event_ts 
    sp_clicks_event_ts
    trans_event_ts 
  ;

  if first.luid then
    do;
      total_dsp_display_imps=0;
      total_display_imps=0;
      total_sp_clicks=0;
      retain
        total_dsp_display_imps
        total_display_imps
        total_sp_clicks
      ;
    end;

  *-- Counting total impressions and clicks by luid --*;
  dsp_display_prev=lag(dsp_display_cmli_id);

  if dsp_display_cmli_id ^=dsp_display_prev then
    dsp_total_display_imps+1;
  display_prev =lag(display_cmli_id);

  if display_cmli_id ^=display_prev then
    total_display_imps+1;
  sp_prev =lag(sp_cmli_id);

  if sp_cmli_id ^=sp_prev then
    total_sp_clicks+1;
  keep  luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  
  ;
run;

proc print data=&dataframe._1;
  var luid
    dsp_display_cmli_id
    display_cmli_id
    sp_cmli_id

    dsp_display_prev
    display_prev
    sp_prev

    total_display_imps  
    total_dsp_display_imps  
    total_sp_clicks  

  ;
  where luid=&qa_luid;
run;
Patrick
Opal | Level 21

Please provide data that works with your code.

Patrick_0-1694914913916.png

Patrick_1-1694914938887.png

 

Also be aware that many people won't download Excels. A simple alternative way to provide such data is: Use the EG import wizard and select to embed the data. Then copy the generated code with the data in a .sas file and share it as an attachment.

Patrick_2-1694915067928.png

 

Just based on what you've already shared see if below can take you a step closer to a resolution.

What data volumes are you dealing with? How many rows do you have in your source data?

And... is the result you're after a table or a report?

proc sort data=UTP_REDBULL_DF_1 out=UTP_REDBULL_DF_2;
  by luid 
    display_cmli_id
    dsp_display_cmli_id
    sp_cmli_id
    ;
run;
 
data UTP_REDBULL_DF_2;
  set UTP_REDBULL_DF_2;
  by luid 
    display_cmli_id
    dsp_display_cmli_id
    sp_cmli_id
    ;

  *-- Counting total impressions and clicks by luid --*;
  if first.display_cmli_id      then dist_cnt_display_cmli_id+1;
  if first.dsp_display_cmli_id  then dist_cnt_dsp_display_cmli_id+1;
  if first.sp_cmli_id           then dist_cnt_sp_cmli_id+1;

  if last.luid then 
    do;
      output;
      call missing(of dist_cnt_:);
    end;
run;

 

Kaushansky
Obsidian | Level 7

Okay, I will be more explicit.

The variable dsp_total_display_imps should count every time the PREVIOUS record is not the same? However, that does not work in this code. Please tell me what I am doing wrong. Attached is the dataset and below is the code.

 

data UTP_REDBULL_DF_2; set UTP_REDBULL_DF_1
(where=(luid=&qa_luid)
keep=luid dsp_display_cmli_id display_cmli_id sp_cmli_id);
by luid;
if first.luid then do;
 
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
 
end;
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
 
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id); if dsp_display_cmli_id ^=dsp_display_prev then dsp_total_display_imps+1;
    display_prev =lag(display_cmli_id); if display_cmli_id ^=display_prev then total_display_imps+1;
    sp_prev =lag(sp_cmli_id); if sp_cmli_id ^=sp_prev then total_sp_clicks+1;
 
 
keep  luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
 
    dsp_display_prev
    display_prev
    sp_prev
 
total_display_imps  
total_dsp_display_imps  
total_sp_clicks  
;  
run;
 
 
Tom
Super User Tom
Super User

Simplify, simplify, simplify.    Please make some SIMPLE data that CLEARLY shows the problem you are trying to solve and the result you expect.  Do not include more observations than you need to demonstrate the issue.  Do not include more variables than you need.  Use short simple names for the variables.  Post the data as a data step so others can have your example to play with.

 

I do not know what 

The variable dsp_total_display_imps should count every time the PREVIOUS record is not the same?

means.

 

If you want to count every time the WHOLE RECORD changes then include EVERY VARIABLE in the BY statement and just check the FIRST. flag for the last one.

 

So this example data has three distinct records (when processed in this current order).

data have ;
  input luid :$10. var1 :$10. var2 :$8. var3 :$8. ;
cards;
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 b9rq6w4 1744739
;

And this code can find them.

data want ;
  set have ;
  by luid var1 var2 var3 var4 notsorted;
  distinct_records + first.var4;
run;

Result:

                                                         distinct_
Obs       luid          var1        var2       var3       records

 1     1027774874    6261781089    3g6gzrq    1744739        1
 2     1027774874    6261781089    3g6gzrq    1744739        1
 3     1027774874    6261781089    56vcsen    1744739        2
 4     1027774874    6261781089    56vcsen    1744739        2
 5     1027774874    6261781089    b9rq6w4    1744739        3

Is that what you are trying to do?  Find out that this value of LUID has 3 distinct records?  

If so then perhaps you want this data step?

data want ;
  set have ;
  by luid var1 var2 var3  notsorted;
  distinct_records + first.var3;
  if last.luid;
  keep luid distinct_records;
run;

Results

                     distinct_
Obs       luid        records

 1     1027774874        3

 

 

 

Kaushansky
Obsidian | Level 7
Per my earlier example. My question is why doesn’t the LAG function work properly. As you can see in the data step and the output. The dsp_display_prev=lag(dsp_display_cmli_id); if dsp_display_cmli_id ^=dsp_display_prev then dsp_total_display_imps+1;

dsp_total_display_imps Should be adding +1 when the previous record is different BUT it is not and all the values are zero? The solution you provided below does not solve it. My ask is as follows:

I would like each of the new variables (e.g. dsp_total_display_imps ) to count the unique records for each of the CLMI variables in the file.

Tom
Super User Tom
Super User

I would like each of the new variables (e.g. dsp_total_display_imps ) to count the unique records for each of the CLMI variables in the file.

So that is a different question and cannot be handled by either BY group processing nor by just checking if the current value is different than the previous value.

 

You can do this with SQL by using the DISTINCT keyword and the COUNT() function (although that will not count the missing values).

proc sql;
create table want as
select count(distinct var1) as n_var1
     , count(distinct var2) as n_var2
     , count(distinct_var3) as n_var3
;
quit;

Or if you want to do it with a data step will require a hash object (or perhaps a series of hash objects).

 

If the variables are all of the same type you might be able to use arrays and a single hash object to make the coding easier/shorter.

For example if the variables that you want to count the distinct values are all character with lengths less than or equal to 20 the code might look like this:

data want;
  set have end=eof;
  array orig [3] $20 var1 var2 var3;
  array new [3] n_var1 n_var2 n_var3;
  retain n_var1 0 n_var2 0 n_var3 0;
  length value $20;
  if _n_=1 then do;
     declare hash h();
     h.definekey('index','value');
     h.definedata('index');
     h.definedone();
  end;
  do index=1 to dim(orig);
    value=orig[index];
    if h.add() then new[index]+1;
  end;
  keep n_var1 n_var2 n_var3;
  if eof;
run;

So using the example data from before:

data have ;
  input luid :$10. var1 :$10. var2 :$8. var3 :$8. ;
cards;
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 b9rq6w4 1744739
;

The result is 

Obs    n_var1    n_var2    n_var3

 1        4         2         4

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2657 views
  • 4 likes
  • 4 in conversation