I am trying to use the following 4 columns to create and count new variables, using First. and Last. but I see that First. and Last. are somehow the same for the sorted variables as you can see in the temp variables and so I cannot use them to differentiate a calculation. Take a look at the PROC PRINT :
Your problem statement explains what doesn't work the way you want it to work. We also need to know what you are trying to do. Please explain.
For example:
I would like to only count 1 each time each one of the IDs is a first. E.g. First_search should only count as 1 since all of the ID values are the same, but the first. does not work in this case.
@Kaushansky wrote:
I would like to only count 1 each time each one of the IDs is a first. E.g. First_search should only count as 1 since all of the ID values are the same, but the first. does not work in this case.
FIRST. does not seem to be the right tool in this case. Perhaps you want something like this:
data want;
set have;
prev=lag(fp_clmi_id);
if fp_clmi_id ^= prev then first_search=1;
else first_search=0;
drop prev;
run;
Using this logic some of it works but not the dsp_total_display_imps counter?
See the code snippet and result:
Please show us the entire SAS data step, not a portion of it. Please format the data step for readability with at most one SAS command on a line, and indented properly. Please paste the SAS code into the window that appears when you click on the "little running man" icon.
Please provide data as SAS data step code (instructions and examples) and not as screen captures.
I'm not going to work with your data as a screen capture. I need the data provided as I explained.
Furthermore, you still have multiple SAS commands on a single line.
@Kaushansky Below your code in a better readable manner using menu
To "fix" your code and provide some tested code requires sample data to work with. Please provide such sample data that works with your code (best via a SAS data step that creates the data) so we can spend the time solving your problem. Don't expect that anyone is going to create such data for you based on a screenshot.
data &dataframe._1;
set &dataframe(where=(luid=&qa_luid));
by luid
order_id
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_imps_event_ts
display_imps_event_ts
sp_clicks_event_ts
trans_event_ts
;
if first.luid then
do;
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
end;
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id);
if dsp_display_cmli_id ^=dsp_display_prev then
dsp_total_display_imps+1;
display_prev =lag(display_cmli_id);
if display_cmli_id ^=display_prev then
total_display_imps+1;
sp_prev =lag(sp_cmli_id);
if sp_cmli_id ^=sp_prev then
total_sp_clicks+1;
keep luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
run;
proc print data=&dataframe._1;
var luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
where luid=&qa_luid;
run;
Attached is the data used to process the code above that does not work.
data UTP_REDBULL_DF_1;
set UTP_REDBULL_DF_1;
by luid
order_id
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_imps_event_ts
display_imps_event_ts
sp_clicks_event_ts
trans_event_ts
;
if first.luid then
do;
total_dsp_display_imps=0;
total_display_imps=0;
total_sp_clicks=0;
retain
total_dsp_display_imps
total_display_imps
total_sp_clicks
;
end;
*-- Counting total impressions and clicks by luid --*;
dsp_display_prev=lag(dsp_display_cmli_id);
if dsp_display_cmli_id ^=dsp_display_prev then
dsp_total_display_imps+1;
display_prev =lag(display_cmli_id);
if display_cmli_id ^=display_prev then
total_display_imps+1;
sp_prev =lag(sp_cmli_id);
if sp_cmli_id ^=sp_prev then
total_sp_clicks+1;
keep luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
run;
proc print data=&dataframe._1;
var luid
dsp_display_cmli_id
display_cmli_id
sp_cmli_id
dsp_display_prev
display_prev
sp_prev
total_display_imps
total_dsp_display_imps
total_sp_clicks
;
where luid=&qa_luid;
run;
Please provide data that works with your code.
Also be aware that many people won't download Excels. A simple alternative way to provide such data is: Use the EG import wizard and select to embed the data. Then copy the generated code with the data in a .sas file and share it as an attachment.
Just based on what you've already shared see if below can take you a step closer to a resolution.
What data volumes are you dealing with? How many rows do you have in your source data?
And... is the result you're after a table or a report?
proc sort data=UTP_REDBULL_DF_1 out=UTP_REDBULL_DF_2;
by luid
display_cmli_id
dsp_display_cmli_id
sp_cmli_id
;
run;
data UTP_REDBULL_DF_2;
set UTP_REDBULL_DF_2;
by luid
display_cmli_id
dsp_display_cmli_id
sp_cmli_id
;
*-- Counting total impressions and clicks by luid --*;
if first.display_cmli_id then dist_cnt_display_cmli_id+1;
if first.dsp_display_cmli_id then dist_cnt_dsp_display_cmli_id+1;
if first.sp_cmli_id then dist_cnt_sp_cmli_id+1;
if last.luid then
do;
output;
call missing(of dist_cnt_:);
end;
run;
Okay, I will be more explicit.
The variable dsp_total_display_imps should count every time the PREVIOUS record is not the same? However, that does not work in this code. Please tell me what I am doing wrong. Attached is the dataset and below is the code.
Simplify, simplify, simplify. Please make some SIMPLE data that CLEARLY shows the problem you are trying to solve and the result you expect. Do not include more observations than you need to demonstrate the issue. Do not include more variables than you need. Use short simple names for the variables. Post the data as a data step so others can have your example to play with.
I do not know what
The variable dsp_total_display_imps should count every time the PREVIOUS record is not the same?
means.
If you want to count every time the WHOLE RECORD changes then include EVERY VARIABLE in the BY statement and just check the FIRST. flag for the last one.
So this example data has three distinct records (when processed in this current order).
data have ;
input luid :$10. var1 :$10. var2 :$8. var3 :$8. ;
cards;
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 b9rq6w4 1744739
;
And this code can find them.
data want ;
set have ;
by luid var1 var2 var3 var4 notsorted;
distinct_records + first.var4;
run;
Result:
distinct_ Obs luid var1 var2 var3 records 1 1027774874 6261781089 3g6gzrq 1744739 1 2 1027774874 6261781089 3g6gzrq 1744739 1 3 1027774874 6261781089 56vcsen 1744739 2 4 1027774874 6261781089 56vcsen 1744739 2 5 1027774874 6261781089 b9rq6w4 1744739 3
Is that what you are trying to do? Find out that this value of LUID has 3 distinct records?
If so then perhaps you want this data step?
data want ;
set have ;
by luid var1 var2 var3 notsorted;
distinct_records + first.var3;
if last.luid;
keep luid distinct_records;
run;
Results
distinct_ Obs luid records 1 1027774874 3
I would like each of the new variables (e.g. dsp_total_display_imps ) to count the unique records for each of the CLMI variables in the file.
So that is a different question and cannot be handled by either BY group processing nor by just checking if the current value is different than the previous value.
You can do this with SQL by using the DISTINCT keyword and the COUNT() function (although that will not count the missing values).
proc sql;
create table want as
select count(distinct var1) as n_var1
, count(distinct var2) as n_var2
, count(distinct_var3) as n_var3
;
quit;
Or if you want to do it with a data step will require a hash object (or perhaps a series of hash objects).
If the variables are all of the same type you might be able to use arrays and a single hash object to make the coding easier/shorter.
For example if the variables that you want to count the distinct values are all character with lengths less than or equal to 20 the code might look like this:
data want;
set have end=eof;
array orig [3] $20 var1 var2 var3;
array new [3] n_var1 n_var2 n_var3;
retain n_var1 0 n_var2 0 n_var3 0;
length value $20;
if _n_=1 then do;
declare hash h();
h.definekey('index','value');
h.definedata('index');
h.definedone();
end;
do index=1 to dim(orig);
value=orig[index];
if h.add() then new[index]+1;
end;
keep n_var1 n_var2 n_var3;
if eof;
run;
So using the example data from before:
data have ;
input luid :$10. var1 :$10. var2 :$8. var3 :$8. ;
cards;
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 3g6gzrq 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 56vcsen 1744739
1027774874 6261781089 b9rq6w4 1744739
;
The result is
Obs n_var1 n_var2 n_var3 1 4 2 4
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.