I would like to count the number of runs in a dataset, but I believe that my code has a hole in it.
I had thought to use:
proc sort data=stuff; by key id;
run;
data runs;
set stuff; by key;
retain runs;
if first.key then runs=1;
else if id ne lag(id) then runs+1;
if last.key then output;
run;
However, it appears that lag(id) is not set to missing when the BY variable key changes. Am I looking at this wrong? Is there a faster/better way to do it?
Thanks!
Never use lag() within an if condition as it only updates the lag value when executed - so the function needs to execute in every single iteration of the data step.
and as already posted: there is no need to use lag()
You don't need lag.
BY KEY ID;
if first.key then runs=0;
if first.id then runs+1;
if last.key then output;
Just need a slight change
data runs;
set stuff;
by key;
delta=id-lag(id);
if first.key then runs=1;
if not first.key then do;
if delta ne 0 then runs+1;
end;
if last.key then output;
run;
Never use lag() within an if condition as it only updates the lag value when executed - so the function needs to execute in every single iteration of the data step.
and as already posted: there is no need to use lag()
Hi:
Patrick is correct. Please refer to this Tech Support note that demonstrates that using lag conditionally will probably result in undesired results: 24665 - Use the LAG function to conditionally carry information down a data set
If you run the code in that note, the approach that creates variable is the approach to use. As the note indicates (in the code), the LAG function stores values on the queue only when it is called, so you must call LAG unconditionally (outside the IF condition) to get the correct lagged value. And this paper has a longer explanation of the usage http://support.sas.com/resources/papers/proceedings09/055-2009.pdf.
cynthia
At the risk of being obnoxious, I dare say 'never say never' . I can't think of any practical usage of conditional lag(), except I have had an rather unique request from a customer. say I have some balance record for checking accounts, they want to know the amount of the LAST over-draft if there is an existing one. Of course, there are many ways to do it, but lag() is also handy.
data have;
input balance @@;
cards;
100 -120 200 210 25 -35 400 100 20 -200
;
run;
data want;
set have;
if balance <0 then
last_overdraft=lag(balance);
run;
I guess the take-home message is that Being extremely cautious when using lag() under condition, use it only if you know what you are expecting.
My2cents,
Haikuo
@Patrick - Thanks so much for your input on lag function. This is very helpful!
Thanks to everybody who replied -- this is very useful information.
And just to pile on with more LAG anecdotes and literature, I offer this blog post:
Pitfalls of the LAG function - The SAS Dummy
Echoing the other responses: if thinking through LAG makes your head hurt (as it does mine) and you can avoid using it...do. FIRST-dot and LAST-dot are powerful DATA step features that can solve many scenarios with elegance.
Chris
No need for headaches for using lag it is more the common queue function http://en.wikipedia.org/wiki/Queue_(abstract_data_type)
With the implementation of that you see some problems as of the pull/push and sizing. SAS development could have done this in a better way.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.