DATA Step, Macro, Functions and more

Lag function malfunction

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Lag function malfunction

[ Edited ]

Hello Everyone,

 

I thought I had posted this on Friday, but I can’t find it anywhere, so I am reposting.

 

I have the following table sorted by count_cluster. I want to create a new variable named Max_Salary_previous_grade, which will conditionally take the value of max salary OF THE PREVIOUS ROW whenever the number of the count_cluster is repeated. I used the following code with the lag function, but I get weird results L

 

I listed below my code and results for your convenience.

 

Does anybody know what’s wrong with the lag function? Do you have any ideas of how could I do this differently?

 

Thank you,

 

 

 

Data data_set;

input count_cluster     level Annual_Salary     Max_salary;

datalines;

1     1     56768 56768

1     3     67810 70741

2     3     49238 49957

3     3     47122 50000

4     1     34360 34360

4     2     49790 49790

4     3     47942 53220

5     3     162736      162736

6     2     51661 51661

7     3     69607 69607

run;

 

 

data data_set;

set data_set;

if count_cluster=lag1(count_cluster) then Max_Salary_previous_grade=lag1(max_salary); else Max_Salary_previous_grade=0;

run;

 

proc print data=data_set;

run;

 

count_cluster level Annual_Salary Max_salary Max_Salary_previous_grade
1 1 56768 56768 0
1 3 67810 70741 . (This should have been 56768, but it is missing)
2 3 49238 49957 0
3 3 47122 50000 0
4 1 34360 34360 0
4 2 49790 49790 70741 (This should have been 34360)
4 3 47942 53220 49790 (this is correct)
5 3 162736 162736 0
6 2 51661 51661 0
7 3 69607 69607 0

Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Super User
Posts: 11,343

Re: Lag function malfunction

There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.

 

One quick solution:

 

data data_set;

set data_set;

Lcount = lag(coutn_cluster);

LSalary = lag(max_salary);

if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;

else Max_Salary_previous_grade=0;

drop Lcount Lsalary;

run;

 

And a style comment, though this may have been for examples only:

Habitualy use of

Data setname;

   Set SetName;

...

 

will eventually cause a problem that takes a long time to debug/correct.

View solution in original post


All Replies
Solution
‎09-25-2015 06:23 AM
Super User
Posts: 11,343

Re: Lag function malfunction

There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.

 

One quick solution:

 

data data_set;

set data_set;

Lcount = lag(coutn_cluster);

LSalary = lag(max_salary);

if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;

else Max_Salary_previous_grade=0;

drop Lcount Lsalary;

run;

 

And a style comment, though this may have been for examples only:

Habitualy use of

Data setname;

   Set SetName;

...

 

will eventually cause a problem that takes a long time to debug/correct.

Contributor
Posts: 45

Re: Lag function malfunction

Thank you so much! Smiley Happy

Trusted Advisor
Posts: 1,909

Re: Lag function malfunction

You can't use LAG in an IF statement the way you did.

 

You must use LAG on every row, not in an IF statement.

 

So, perhaps something like this

 

prev_cluster=lag(count_cluster);

prev_salary=lag(max_salary);

if prev_cluster=count_cluster then max_salary_previous_grade = .... (whatever you want);

else max_salary_previous_grade= ... (whatever you want);

 

Trusted Advisor
Posts: 3,212

Re: Lag function malfunction

Posted in reply to PaigeMiller

You can use a lag fucntion within conditions but remember:

--> it is a queue function   and that is quite different to the lag function added in sql 2008

The fucntion in SAS has a name with a wrong expectation 

---->-- ja karman --<-----
Valued Guide
Posts: 765

Re: Lag function malfunction

[ Edited ]

Hi, you can use CONDITIONAL LAGs with an IFN statement (less SAS code needed) ...

 

data new;
set data_set;
Max_Salary_previous_grade = ifn(count_cluster eq lag1(count_cluster),lag1(max_salary),0);
run;

 

new data set ...

                               Max_Salary_
    count_        Annual_ Max_ previous_
Obs cluster level Salary salary grade

1      1      1    56768   56768      0
2      1      3    67810   70741  56768
3      2      3    49238   49957      0
4      3      3    47122   50000      0
5      4      1    34360   34360      0
6      4      2    49790   49790  34360
7      4      3    47942   53220  49790
8      5      3   162736   162736     0
9      6      2    51661    51661     0
10     7      3    69607    69607     0

 

To see why the lags work in that code, read this paper by Howard Schreier  ...

 

Conditional Lags Don't Have to be Treacherous

http://www.howles.com/saspapers/CC33.pdf

 

Contributor
Posts: 45

Re: Lag function malfunction

It worked well in mine... There is a typo which I corrected and it gave me the results

 

Lcount = lag(coutn_cluster);

 

should be 

 

Lcount = lag(count_cluster);

Valued Guide
Posts: 765

Re: Lag function malfunction

the posted solution had a variable misspelled, didn't notice at first, fixed my reply ... my mistake
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 322 views
  • 4 likes
  • 5 in conversation