Hi
I have a table (simplified for this) which has 3 columns. An ID column, a date column and a score column. I'll call this table "Have"
e.g.
ID date score
A 01/01/2015 20
A 02/01/2015 30
B 01/01/2015 20
B 02/01/2015 20
B 02/01/2015 30
I want to add a column which identifies whether the scores have increased over time. I have tried a lag function like so;
data want;
set have;
length flag $20.;
if ID= lag(ID) and score> lag(score) then flag = 'Increased';
else if ID= lag(ID) and score< lag(score) then flag = 'Decreased';
else if ID= lag(ID) then flag = 'No Change'
else flag = '';
run;
This seems to work on all cases except when the dates are the same, So I end up with the following (note the entry at the bottom should have increased)
ID date score flag
A 01/01/2015 20
A 02/01/2015 30 Increased
B 01/01/2015 20
B 02/01/2015 20 No change
B 02/01/2015 30
Is there something I'm doing wrong? (first time using the lag function)
You want to use BY ID; to process the data in groups. You can do it with LAG but that is too hard. Then you are just left with DIF functioin instead of LAG. Set DIF to missing at first.id.
data score;
input ID:$1. date:mmddyy. score;
format date mmddyy.;
cards;
A 01/01/2015 20
A 02/01/2015 30
B 01/01/2015 20
B 02/01/2015 20
B 02/01/2015 30
C 01/01/2015 20
C 02/01/2015 30
C 01/01/2015 20
C 02/01/2015 20
C 02/01/2015 30
;;;;
run;
data score2;
set score;
by id;
delta = sign(dif(score));
if first.id then delta=.;
run;
proc print;
run;
Your code have nothing to do with the date variable.
You forgot the semicolon in the third if statement.
The missing semicolon is a typo when copying across.
That's why I'm confused though, my query has nothing to do with the date variable so why hasn't the flag been populated for the bottom row? When I look at my full table it is the same every time the date is the same. I have sorted the data using ID and date but that's because I want to see changes over time for each person.
You want to use BY ID; to process the data in groups. You can do it with LAG but that is too hard. Then you are just left with DIF functioin instead of LAG. Set DIF to missing at first.id.
data score;
input ID:$1. date:mmddyy. score;
format date mmddyy.;
cards;
A 01/01/2015 20
A 02/01/2015 30
B 01/01/2015 20
B 02/01/2015 20
B 02/01/2015 30
C 01/01/2015 20
C 02/01/2015 30
C 01/01/2015 20
C 02/01/2015 20
C 02/01/2015 30
;;;;
run;
data score2;
set score;
by id;
delta = sign(dif(score));
if first.id then delta=.;
run;
proc print;
run;
Thanks, I did not know the dif function.
Still not sure why my original didn't work but this does so thanks.
You may also want to consider a slight modification to the code,
data have; input ID $ date $ score; datalines; A 01/01/2015 20 A 02/01/2015 30 B 01/01/2015 20 B 02/01/2015 20 B 02/01/2015 30 ; run; data want; set have; length flag $20.; if id=lag(id) then do; select; when(score > lag(score)) flag="Increased"; when(score < lag(score)) flag="Decreased"; otherwise flag="No change"; end; end; run;
The above illustrates the select clause which is useful for multiple if statements and when wrapped in the if statement clearly shows it only happens when id matches previous instance.
Curious question.
What is the difference between
if id=lag(id) then do; select; when(score > lag(score)) flag="Increased"; when(score < lag(score)) flag="Decreased"; otherwise flag="No change";
and
if id=lag(id) then do; if(score > lag(score)) flag="Increased"; else if(score < lag(score)) flag="Decreased"; else flag="No change";
Well, there is not a lot of difference between the two logically as posted there - but then you have changed the statements when you posted this (note the then is missing from your if's here). In your previous example:
if ID= lag(ID) and score> lag(score) then flag = 'Increased'; else if ID= lag(ID) and score< lag(score) then flag = 'Decreased'; else if ID= lag(ID) then flag = 'No Change' else flag = '';
The id=lag(id) part was duplicated several times. That is one point. The second point is on readability, so for two options then its not much difference, however when you have lots of conditions, it is far easier to read the select statement rather than a lot of if/if else statements, you can also follow down the page where the breaks.
@RW9 your program doesn't work. You are doing conditional lag which doesn't work here..
Yes, it worked for the small test data I had, so didn't look at it further. I only really wanted to show the select statement.
@RW9: Your code introduces conditional execution of LAG functions in a twofold way:
The error pointed out by data_null__ is due to item 1. To see the impact of item 2, you can add an 11th observation to data_null__'s sample data with ID='C' and score=20. This observation will be incorrectly flagged "No change" (rather than "Decreased") due to item 2.
With your sample data, however, this error is not visible in the output. In contrast, with data_null__'s sample data it would become apparent, as shown below.
(EDIT: The sort order of this test data set might not be what you want in practice, but it doesn't interfere with my explanations of the LAG function.)
The third observation of ID C is incorrectly flagged "No Change," because for the previous observation the first IF condition was met. As a consequence, the queue behind the second lag(score) was not touched. So, when the third obs. of ID C is processed, it still contains the value from the first obs. of ID C. The condition 'C'='C' and 20=20 (comparing values from the first and third observation of ID C) yields the incorrect flag.
I think, two very important points for understanding the LAG function are that
Please note that the DIF function is closely related to the LAG function and therefore requires similar attention.
As mohamed_zaki has mentioned, variable DATE has no impact on variable FLAG as it is not involved in its definition.
[Edit 2019-04-30: Reattached screenshot, which had been deleted by mistake.]
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.