BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gpv2000
Calcite | Level 5

In my dataset, i have variables subject day, grade.

data have;

   input subject days grade;

   datalines;
   
   1234    0     0   / *retain  */
   1234    8     0
   1234   10     0
   1234   12     0
   5678    0     0   / *retain  */
   5678    8     0
   5678   10     1   /* retain */
   5678   12     1
   5678   15     2  /* retain*/
5678 22 2
5678 35 2
5678 42 3 /* retain*/
run;

I want to retain the first observation every time the grade changes.

 

 I need the output to look as below,

 

 subject    days  grade  
1234 0 0

5678 0 0 5678 10 1 5678 15 2

5678 42 3

I tried doing,


data want;
   set have;
  by subject days;
  retain grade;
  if NOT first.subject then prev_grade = lag(grade);
  IF grade ne  prev_grade then output;

run;

 

The result was almost ok except for the first subject, where there is only one grade. 

Any suggestions?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Since you are looking for observations where GRADE changes, use that in the BY statement:

 

data want;
   set have;
  by subject grade notsorted;

  if first.grade;

run;

View solution in original post

8 REPLIES 8
Astounding
PROC Star

Since you are looking for observations where GRADE changes, use that in the BY statement:

 

data want;
   set have;
  by subject grade notsorted;

  if first.grade;

run;

gpv2000
Calcite | Level 5

I  think the sample data i shared is pretty simple. But the real data has many changes in grade and it needs to be captured in the sequence it happens in regard with my days. The code you suggested is capturing the overall grade change. Please let me know if i am not being clear. I will try to work a more detailed example of data.

Astounding
PROC Star

I'll have to see the updated question.  Remember, I'm not suggesting that you sort the data ... just leave it in its current order.  That's the purpose of adding "notsorted" on the BY statement.  The data isn't sorted by GRADE, but you can still detect when GRADE changes.

gpv2000
Calcite | Level 5

Yes you are right. I sorted before your suggested step and it messed it up. Thanks. Your code works.

gpv2000
Calcite | Level 5

Here is a more detailed example.

 

 

data have;

   input subject days grade;

datalines;

 1234              0	           0
 1234              1	           0
 1234             8	           0
 1234             15	           0
 1234             22	           1
 1234             29	           1
 1234             36	           2
 1234             43	           2
 1234             45	           2
 1234              52	           2
 1234             58	           0
 1234             64	           0
 1234             67	           1
 1234             71	           0
 1234             78	           1
 1234             85	           1
 1234             87	           0
 1234             92	           0
 1234             99	           0
 1234            106	           0
 1234            108	           1
 1234            113	           1
 1234            120	           1
 1234            127	           1
 1234            134	           0
 1234            141	           0
 1234           149	           2
 1234             155	           0
 1234            162	           0
 1234            169	           1
 1234             176	           1
 1234            184	           2
 1234            190	           0
 1234            197	           1
 1234            205	           1
 1234            211	           0
 1234            218	           1
 1234            225	           1
 1234            232	           1

 5678              0	           0
 5678              1	           0
 5678              8	           0
 5678             15	           0
 5678             22	           0
 5678             29	           0
 5678             36	           0
 5678             43	           0
 5678             50	           0
 5678             53	           0
 5678             57	           0
 5678             59	           0
run;

I expect the output to be,

 

subject days grade

1234     0        0
1234    22       1
1234    36       2
1234    58       0
1234    67       1
1234   71        0
1234   78        1
1234   87        0
1234  108      1
1234  134      0
1234  149      2
1234  155     0
1234  169     1
1234  184     2
1234  190     0
1234  197     1
1234  211     0
1234  218    1

5678   0        0

 

bobpep212
Quartz | Level 8
data want;
set have;
if lag(grade) ne grade then output;
run;

Try the lag function and conditionally output.

Tom
Super User Tom
Super User

@bobpep212 wrote:
data want;
set have;
if lag(grade) ne grade then output;
run;

Try the lag function and conditionally output.


Since the values are nested with subject you would also need to test if subject changed.  In the sample data your test would not find the second subject's first value since the first subject ends with GRADE=0 and the second subject starts with GRADE=0.

Either test that using similar LAG() function:

data want;
  set have;
  if lag(subject) ne subject OR lag(grade) ne grade then output;
run;

Or use BY group for SUBJECT.

data want;
  set have;
  by subject;
  if first.subject OR lag(grade) ne grade then output;
run;
gpv2000
Calcite | Level 5

Thank you very much. Very useful suggestions. I tried them and they worked too.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1805 views
  • 0 likes
  • 4 in conversation