BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Greek
Obsidian | Level 7

Hello Everyone,

 

I thought I had posted this on Friday, but I can’t find it anywhere, so I am reposting.

 

I have the following table sorted by count_cluster. I want to create a new variable named Max_Salary_previous_grade, which will conditionally take the value of max salary OF THE PREVIOUS ROW whenever the number of the count_cluster is repeated. I used the following code with the lag function, but I get weird results L

 

I listed below my code and results for your convenience.

 

Does anybody know what’s wrong with the lag function? Do you have any ideas of how could I do this differently?

 

Thank you,

 

 

 

Data data_set;

input count_cluster     level Annual_Salary     Max_salary;

datalines;

1     1     56768 56768

1     3     67810 70741

2     3     49238 49957

3     3     47122 50000

4     1     34360 34360

4     2     49790 49790

4     3     47942 53220

5     3     162736      162736

6     2     51661 51661

7     3     69607 69607

run;

 

 

data data_set;

set data_set;

if count_cluster=lag1(count_cluster) then Max_Salary_previous_grade=lag1(max_salary); else Max_Salary_previous_grade=0;

run;

 

proc print data=data_set;

run;

 

count_cluster level Annual_Salary Max_salary Max_Salary_previous_grade
1 1 56768 56768 0
1 3 67810 70741 . (This should have been 56768, but it is missing)
2 3 49238 49957 0
3 3 47122 50000 0
4 1 34360 34360 0
4 2 49790 49790 70741 (This should have been 34360)
4 3 47942 53220 49790 (this is correct)
5 3 162736 162736 0
6 2 51661 51661 0
7 3 69607 69607 0
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.

 

One quick solution:

 

data data_set;

set data_set;

Lcount = lag(coutn_cluster);

LSalary = lag(max_salary);

if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;

else Max_Salary_previous_grade=0;

drop Lcount Lsalary;

run;

 

And a style comment, though this may have been for examples only:

Habitualy use of

Data setname;

   Set SetName;

...

 

will eventually cause a problem that takes a long time to debug/correct.

View solution in original post

7 REPLIES 7
ballardw
Super User

There are lots of complexities as to which stream of values is involved with the lag function when used in IF statements.

 

One quick solution:

 

data data_set;

set data_set;

Lcount = lag(coutn_cluster);

LSalary = lag(max_salary);

if count_cluster=Lcount then Max_Salary_previous_grade=LSalary;

else Max_Salary_previous_grade=0;

drop Lcount Lsalary;

run;

 

And a style comment, though this may have been for examples only:

Habitualy use of

Data setname;

   Set SetName;

...

 

will eventually cause a problem that takes a long time to debug/correct.

Greek
Obsidian | Level 7

Thank you so much! 🙂

PaigeMiller
Diamond | Level 26

You can't use LAG in an IF statement the way you did.

 

You must use LAG on every row, not in an IF statement.

 

So, perhaps something like this

 

prev_cluster=lag(count_cluster);

prev_salary=lag(max_salary);

if prev_cluster=count_cluster then max_salary_previous_grade = .... (whatever you want);

else max_salary_previous_grade= ... (whatever you want);

 

--
Paige Miller
jakarman
Barite | Level 11

You can use a lag fucntion within conditions but remember:

--> it is a queue function   and that is quite different to the lag function added in sql 2008

The fucntion in SAS has a name with a wrong expectation 

---->-- ja karman --<-----
MikeZdeb
Rhodochrosite | Level 12

Hi, you can use CONDITIONAL LAGs with an IFN statement (less SAS code needed) ...

 

data new;
set data_set;
Max_Salary_previous_grade = ifn(count_cluster eq lag1(count_cluster),lag1(max_salary),0);
run;

 

new data set ...

                               Max_Salary_
    count_        Annual_ Max_ previous_
Obs cluster level Salary salary grade

1      1      1    56768   56768      0
2      1      3    67810   70741  56768
3      2      3    49238   49957      0
4      3      3    47122   50000      0
5      4      1    34360   34360      0
6      4      2    49790   49790  34360
7      4      3    47942   53220  49790
8      5      3   162736   162736     0
9      6      2    51661    51661     0
10     7      3    69607    69607     0

 

To see why the lags work in that code, read this paper by Howard Schreier  ...

 

Conditional Lags Don't Have to be Treacherous

http://www.howles.com/saspapers/CC33.pdf

 

Greek
Obsidian | Level 7

It worked well in mine... There is a typo which I corrected and it gave me the results

 

Lcount = lag(coutn_cluster);

 

should be 

 

Lcount = lag(count_cluster);

MikeZdeb
Rhodochrosite | Level 12
the posted solution had a variable misspelled, didn't notice at first, fixed my reply ... my mistake

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1430 views
  • 4 likes
  • 5 in conversation