I'm having issues with string comparison with when it comes to lag function.
Say I have the following dataset:
DATA test;
input id code $;
cards;
1 abc
1 abc
2 xyz
2 xyy
2 xyy
2 zzz
3 ccc
3 ccc
;
PROC SORT;
by id code;
run;
Then I want to calculate how many distinct code values there are within each ID values, I use this program:
DATA test2;
SET test;
by id code;
RETAIN code_cnt 1;
ff = first.id;
lagged = lag1(code);
if first.id then code_cnt = 1;
else do;
if code ~= lag1(code) then code_cnt = code_cnt + 1;
end;
PROC PRINT;
run;
The resulting dataset looks like this:
Obs id code code_cnt ff lagged
1 | abc | 1 | 1 | |
1 | abc | 2 | 0 | abc |
2 | xyy | 1 | 1 | abc |
2 | xyy | 2 | 0 | xyy |
2 | xyz | 3 | 0 | xyy |
2 | zzz | 4 | 0 | xyz |
3 | ccc | 1 | 1 | zzz |
3 | ccc | 2 | 0 | ccc |
What is wrong?
You can't use lag in IF/THEN conditions, the behaviour isn't what you expect. It's a common issue when first starting out with the lag function. If you google the terms you'll find many paper/posts on the issue.
DATA test2;
SET test;
by id code;
RETAIN code_cnt 1;
ff = first.id;
lagged = lag1(code);
if first.id then code_cnt = 1;
else do;
if code ne lagged then code_cnt = code_cnt + 1;
end;
PROC PRINT;
run;
As mentioned PROC SQL is a better method here:
proc sql;
create table want as
select id, count(distinct code) as code_cnt
from test;
quit;
Or a SAS proc method:
proc sort data=test out=test2 nodupkey;
by id code;
run;
proc freq data=test2;
table ID/out=want_freq;
run;
proc print data=want_freq;
run;
EDIT: One more option - based on using FIRST since you've sorted and are using BY variables:
PROC SORT;
by id code;
run;
data want;
set test;
by id code;
if first.id then code_cnt=0;
if first.code then code_cnt+1;
*if last.id then output;
run;
I agree that proc sql is generally a better solution for most types of aggregation. But, if it must be in a data step for some reason, then it would look something like this:
DATA test;
input id code $;
cards;
1 abc
1 abc
2 xyz
2 xyy
2 xyy
2 zzz
3 ccc
3 ccc
;
PROC SORT;
by id code;
run;
DATA test2;
SET test;
by id code;
retain code_cnt;
if first.id then code_cnt = 0;
code_cnt = code_cnt + 1;
if last.id then output;
run;
PROC PRINT;
run;
This is probably what the proc sql will be doing in the background as well.
You can't use lag in IF/THEN conditions, the behaviour isn't what you expect. It's a common issue when first starting out with the lag function. If you google the terms you'll find many paper/posts on the issue.
DATA test2;
SET test;
by id code;
RETAIN code_cnt 1;
ff = first.id;
lagged = lag1(code);
if first.id then code_cnt = 1;
else do;
if code ne lagged then code_cnt = code_cnt + 1;
end;
PROC PRINT;
run;
As mentioned PROC SQL is a better method here:
proc sql;
create table want as
select id, count(distinct code) as code_cnt
from test;
quit;
Or a SAS proc method:
proc sort data=test out=test2 nodupkey;
by id code;
run;
proc freq data=test2;
table ID/out=want_freq;
run;
proc print data=want_freq;
run;
EDIT: One more option - based on using FIRST since you've sorted and are using BY variables:
PROC SORT;
by id code;
run;
data want;
set test;
by id code;
if first.id then code_cnt=0;
if first.code then code_cnt+1;
*if last.id then output;
run;
Thanks, your solutions (mostly) worked. In the PROC SQL you wanna do GROUP BY id as well.
The lag() function builds a FIFO chain that receives a value everytime the function is called. So calling the function in a conditional branch or a do loop will lead to unexpected behaviour.
@CJ_Jackson wrote:
thanks. This is a weird behavior. So if I first saved the lag1(code) as a new variable then I would be able to use that value in a conditional then, correct?
Yes, exactly as @Reeza did with the variable LAGGED in the first data step of the solution.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.