In my dataset, i have variables subject day, grade.
data have; input subject days grade; datalines; 1234 0 0 / *retain */ 1234 8 0 1234 10 0 1234 12 0 5678 0 0 / *retain */ 5678 8 0 5678 10 1 /* retain */ 5678 12 1 5678 15 2 /* retain*/
5678 22 2
5678 35 2
5678 42 3 /* retain*/
run;
I want to retain the first observation every time the grade changes.
I need the output to look as below,
subject days grade
1234 0 0
5678 0 0 5678 10 1 5678 15 2
5678 42 3
I tried doing,
data want;
set have;
by subject days;
retain grade;
if NOT first.subject then prev_grade = lag(grade);
IF grade ne prev_grade then output;
run;
The result was almost ok except for the first subject, where there is only one grade.
Any suggestions?
Since you are looking for observations where GRADE changes, use that in the BY statement:
data want;
set have;
by subject grade notsorted;
if first.grade;
run;
Since you are looking for observations where GRADE changes, use that in the BY statement:
data want;
set have;
by subject grade notsorted;
if first.grade;
run;
I think the sample data i shared is pretty simple. But the real data has many changes in grade and it needs to be captured in the sequence it happens in regard with my days. The code you suggested is capturing the overall grade change. Please let me know if i am not being clear. I will try to work a more detailed example of data.
I'll have to see the updated question. Remember, I'm not suggesting that you sort the data ... just leave it in its current order. That's the purpose of adding "notsorted" on the BY statement. The data isn't sorted by GRADE, but you can still detect when GRADE changes.
Yes you are right. I sorted before your suggested step and it messed it up. Thanks. Your code works.
Here is a more detailed example.
data have; input subject days grade; datalines; 1234 0 0 1234 1 0 1234 8 0 1234 15 0 1234 22 1 1234 29 1 1234 36 2 1234 43 2 1234 45 2 1234 52 2 1234 58 0 1234 64 0 1234 67 1 1234 71 0 1234 78 1 1234 85 1 1234 87 0 1234 92 0 1234 99 0 1234 106 0 1234 108 1 1234 113 1 1234 120 1 1234 127 1 1234 134 0 1234 141 0 1234 149 2 1234 155 0 1234 162 0 1234 169 1 1234 176 1 1234 184 2 1234 190 0 1234 197 1 1234 205 1 1234 211 0 1234 218 1 1234 225 1 1234 232 1 5678 0 0 5678 1 0 5678 8 0 5678 15 0 5678 22 0 5678 29 0 5678 36 0 5678 43 0 5678 50 0 5678 53 0 5678 57 0 5678 59 0 run;
I expect the output to be,
subject days grade
1234 0 0
1234 22 1
1234 36 2
1234 58 0
1234 67 1
1234 71 0
1234 78 1
1234 87 0
1234 108 1
1234 134 0
1234 149 2
1234 155 0
1234 169 1
1234 184 2
1234 190 0
1234 197 1
1234 211 0
1234 218 1
5678 0 0
data want;
set have;
if lag(grade) ne grade then output;
run;
Try the lag function and conditionally output.
@bobpep212 wrote:
data want; set have; if lag(grade) ne grade then output; run;
Try the lag function and conditionally output.
Since the values are nested with subject you would also need to test if subject changed. In the sample data your test would not find the second subject's first value since the first subject ends with GRADE=0 and the second subject starts with GRADE=0.
Either test that using similar LAG() function:
data want;
set have;
if lag(subject) ne subject OR lag(grade) ne grade then output;
run;
Or use BY group for SUBJECT.
data want;
set have;
by subject;
if first.subject OR lag(grade) ne grade then output;
run;
Thank you very much. Very useful suggestions. I tried them and they worked too.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.