Hello Everyone,
I thought I had posted this on Friday, but I can’t find it anywhere, so I am reposting.
I have the following table sorted by count_cluster. I want to create a new variable named Max_Salary_previous_grade, which will conditionally take the value of max salary OF THE PREVIOUS ROW whenever the number of the count_cluster is repeated. I used the following code with the lag function, but I get weird results L
I listed below my code and results for your convenience.
Does anybody know what’s wrong with the lag function? Do you have any ideas of how could I do this differently?
Thank you,
Data data_set;
input count_cluster level Annual_Salary Max_salary;
datalines;
1 1 56768 56768
1 3 67810 70741
2 3 49238 49957
3 3 47122 50000
4 1 34360 34360
4 2 49790 49790
4 3 47942 53220
5 3 162736 162736
6 2 51661 51661
7 3 69607 69607
run;
data data_set;
set data_set;
if count_cluster=lag1(count_cluster) then Max_Salary_previous_grade=lag1(max_salary); else Max_Salary_previous_grade=0;
run;
proc print data=data_set;
run;
count_cluster | level | Annual_Salary | Max_salary | Max_Salary_previous_grade |
1 | 1 | 56768 | 56768 | 0 |
1 | 3 | 67810 | 70741 | . (This should have been 56768, but it is missing) |
2 | 3 | 49238 | 49957 | 0 |
3 | 3 | 47122 | 50000 | 0 |
4 | 1 | 34360 | 34360 | 0 |
4 | 2 | 49790 | 49790 | 70741 (This should have been 34360) |
4 | 3 | 47942 | 53220 | 49790 (this is correct) |
5 | 3 | 162736 | 162736 | 0 |
6 | 2 | 51661 | 51661 | 0 |
7 | 3 | 69607 | 69607 | 0 |
There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.
One quick solution:
data data_set;
set data_set;
Lcount = lag(coutn_cluster);
LSalary = lag(max_salary);
if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;
else Max_Salary_previous_grade=0;
drop Lcount Lsalary;
run;
And a style comment, though this may have been for examples only:
Habitualy use of
Data setname;
Set SetName;
...
will eventually cause a problem that takes a long time to debug/correct.
There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.
One quick solution:
data data_set;
set data_set;
Lcount = lag(coutn_cluster);
LSalary = lag(max_salary);
if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;
else Max_Salary_previous_grade=0;
drop Lcount Lsalary;
run;
And a style comment, though this may have been for examples only:
Habitualy use of
Data setname;
Set SetName;
...
will eventually cause a problem that takes a long time to debug/correct.
Thank you so much! 🙂
You can't use LAG in an IF statement the way you did.
You must use LAG on every row, not in an IF statement.
So, perhaps something like this
prev_cluster=lag(count_cluster);
prev_salary=lag(max_salary);
if prev_cluster=count_cluster then max_salary_previous_grade = .... (whatever you want);
else max_salary_previous_grade= ... (whatever you want);
You can use a lag fucntion within conditions but remember:
--> it is a queue function and that is quite different to the lag function added in sql 2008
The fucntion in SAS has a name with a wrong expectation
Hi, you can use CONDITIONAL LAGs with an IFN statement (less SAS code needed) ...
data new;
set data_set;
Max_Salary_previous_grade = ifn(count_cluster eq lag1(count_cluster),lag1(max_salary),0);
run;
new data set ...
Max_Salary_
count_ Annual_ Max_ previous_
Obs cluster level Salary salary grade
1 1 1 56768 56768 0
2 1 3 67810 70741 56768
3 2 3 49238 49957 0
4 3 3 47122 50000 0
5 4 1 34360 34360 0
6 4 2 49790 49790 34360
7 4 3 47942 53220 49790
8 5 3 162736 162736 0
9 6 2 51661 51661 0
10 7 3 69607 69607 0
To see why the lags work in that code, read this paper by Howard Schreier ...
Conditional Lags Don't Have to be Treacherous
http://www.howles.com/saspapers/CC33.pdf
It worked well in mine... There is a typo which I corrected and it gave me the results
Lcount = lag(coutn_cluster);
should be
Lcount = lag(count_cluster);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.