Hello Everyone,
I thought I had posted this on Friday, but I can’t find it anywhere, so I am reposting.
I have the following table sorted by count_cluster. I want to create a new variable named Max_Salary_previous_grade, which will conditionally take the value of max salary OF THE PREVIOUS ROW whenever the number of the count_cluster is repeated. I used the following code with the lag function, but I get weird results L
I listed below my code and results for your convenience.
Does anybody know what’s wrong with the lag function? Do you have any ideas of how could I do this differently?
Thank you,
Data data_set;
input count_cluster level Annual_Salary Max_salary;
datalines;
1 1 56768 56768
1 3 67810 70741
2 3 49238 49957
3 3 47122 50000
4 1 34360 34360
4 2 49790 49790
4 3 47942 53220
5 3 162736 162736
6 2 51661 51661
7 3 69607 69607
run;
data data_set;
set data_set;
if count_cluster=lag1(count_cluster) then Max_Salary_previous_grade=lag1(max_salary); else Max_Salary_previous_grade=0;
run;
proc print data=data_set;
run;
count_cluster | level | Annual_Salary | Max_salary | Max_Salary_previous_grade |
1 | 1 | 56768 | 56768 | 0 |
1 | 3 | 67810 | 70741 | . (This should have been 56768, but it is missing) |
2 | 3 | 49238 | 49957 | 0 |
3 | 3 | 47122 | 50000 | 0 |
4 | 1 | 34360 | 34360 | 0 |
4 | 2 | 49790 | 49790 | 70741 (This should have been 34360) |
4 | 3 | 47942 | 53220 | 49790 (this is correct) |
5 | 3 | 162736 | 162736 | 0 |
6 | 2 | 51661 | 51661 | 0 |
7 | 3 | 69607 | 69607 | 0 |
There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.
One quick solution:
data data_set;
set data_set;
Lcount = lag(coutn_cluster);
LSalary = lag(max_salary);
if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;
else Max_Salary_previous_grade=0;
drop Lcount Lsalary;
run;
And a style comment, though this may have been for examples only:
Habitualy use of
Data setname;
Set SetName;
...
will eventually cause a problem that takes a long time to debug/correct.
There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.
One quick solution:
data data_set;
set data_set;
Lcount = lag(coutn_cluster);
LSalary = lag(max_salary);
if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;
else Max_Salary_previous_grade=0;
drop Lcount Lsalary;
run;
And a style comment, though this may have been for examples only:
Habitualy use of
Data setname;
Set SetName;
...
will eventually cause a problem that takes a long time to debug/correct.
Thank you so much! 🙂
You can't use LAG in an IF statement the way you did.
You must use LAG on every row, not in an IF statement.
So, perhaps something like this
prev_cluster=lag(count_cluster);
prev_salary=lag(max_salary);
if prev_cluster=count_cluster then max_salary_previous_grade = .... (whatever you want);
else max_salary_previous_grade= ... (whatever you want);
You can use a lag fucntion within conditions but remember:
--> it is a queue function and that is quite different to the lag function added in sql 2008
The fucntion in SAS has a name with a wrong expectation
Hi, you can use CONDITIONAL LAGs with an IFN statement (less SAS code needed) ...
data new;
set data_set;
Max_Salary_previous_grade = ifn(count_cluster eq lag1(count_cluster),lag1(max_salary),0);
run;
new data set ...
Max_Salary_
count_ Annual_ Max_ previous_
Obs cluster level Salary salary grade
1 1 1 56768 56768 0
2 1 3 67810 70741 56768
3 2 3 49238 49957 0
4 3 3 47122 50000 0
5 4 1 34360 34360 0
6 4 2 49790 49790 34360
7 4 3 47942 53220 49790
8 5 3 162736 162736 0
9 6 2 51661 51661 0
10 7 3 69607 69607 0
To see why the lags work in that code, read this paper by Howard Schreier ...
Conditional Lags Don't Have to be Treacherous
http://www.howles.com/saspapers/CC33.pdf
It worked well in mine... There is a typo which I corrected and it gave me the results
Lcount = lag(coutn_cluster);
should be
Lcount = lag(count_cluster);
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.