DATA Step, Macro, Functions and more

Retain first observation

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Retain first observation

In my dataset, i have variables subject day, grade.

data have;

   input subject days grade;

   datalines;
   
   1234    0     0   / *retain  */
   1234    8     0
   1234   10     0
   1234   12     0
   5678    0     0   / *retain  */
   5678    8     0
   5678   10     1   /* retain */
   5678   12     1
   5678   15     2  /* retain*/
5678 22 2
5678 35 2
5678 42 3 /* retain*/
run;

I want to retain the first observation every time the grade changes.

 

 I need the output to look as below,

 

 subject    days  grade  
1234 0 0

5678 0 0 5678 10 1 5678 15 2

5678 42 3

I tried doing,


data want;
   set have;
  by subject days;
  retain grade;
  if NOT first.subject then prev_grade = lag(grade);
  IF grade ne  prev_grade then output;

run;

 

The result was almost ok except for the first subject, where there is only one grade. 

Any suggestions?


Accepted Solutions
Solution
a week ago
Super User
Posts: 6,751

Re: Retain first observation

Since you are looking for observations where GRADE changes, use that in the BY statement:

 

data want;
   set have;
  by subject grade notsorted;

  if first.grade;

run;

View solution in original post


All Replies
Solution
a week ago
Super User
Posts: 6,751

Re: Retain first observation

Since you are looking for observations where GRADE changes, use that in the BY statement:

 

data want;
   set have;
  by subject grade notsorted;

  if first.grade;

run;

Occasional Contributor
Posts: 18

Re: Retain first observation

Posted in reply to Astounding

I  think the sample data i shared is pretty simple. But the real data has many changes in grade and it needs to be captured in the sequence it happens in regard with my days. The code you suggested is capturing the overall grade change. Please let me know if i am not being clear. I will try to work a more detailed example of data.

Super User
Posts: 6,751

Re: Retain first observation

I'll have to see the updated question.  Remember, I'm not suggesting that you sort the data ... just leave it in its current order.  That's the purpose of adding "notsorted" on the BY statement.  The data isn't sorted by GRADE, but you can still detect when GRADE changes.

Occasional Contributor
Posts: 18

Re: Retain first observation

Posted in reply to Astounding

Yes you are right. I sorted before your suggested step and it messed it up. Thanks. Your code works.

Occasional Contributor
Posts: 18

Re: Retain first observation

Here is a more detailed example.

 

 

data have;

   input subject days grade;

datalines;

 1234              0	           0
 1234              1	           0
 1234             8	           0
 1234             15	           0
 1234             22	           1
 1234             29	           1
 1234             36	           2
 1234             43	           2
 1234             45	           2
 1234              52	           2
 1234             58	           0
 1234             64	           0
 1234             67	           1
 1234             71	           0
 1234             78	           1
 1234             85	           1
 1234             87	           0
 1234             92	           0
 1234             99	           0
 1234            106	           0
 1234            108	           1
 1234            113	           1
 1234            120	           1
 1234            127	           1
 1234            134	           0
 1234            141	           0
 1234           149	           2
 1234             155	           0
 1234            162	           0
 1234            169	           1
 1234             176	           1
 1234            184	           2
 1234            190	           0
 1234            197	           1
 1234            205	           1
 1234            211	           0
 1234            218	           1
 1234            225	           1
 1234            232	           1

 5678              0	           0
 5678              1	           0
 5678              8	           0
 5678             15	           0
 5678             22	           0
 5678             29	           0
 5678             36	           0
 5678             43	           0
 5678             50	           0
 5678             53	           0
 5678             57	           0
 5678             59	           0
run;

I expect the output to be,

 

subject days grade

1234     0        0
1234    22       1
1234    36       2
1234    58       0
1234    67       1
1234   71        0
1234   78        1
1234   87        0
1234  108      1
1234  134      0
1234  149      2
1234  155     0
1234  169     1
1234  184     2
1234  190     0
1234  197     1
1234  211     0
1234  218    1

5678   0        0

 

Occasional Contributor
Posts: 11

Re: Retain first observation

data want;
set have;
if lag(grade) ne grade then output;
run;

Try the lag function and conditionally output.

Super User
Super User
Posts: 8,069

Re: Retain first observation

Posted in reply to bobpep212

@bobpep212 wrote:
data want;
set have;
if lag(grade) ne grade then output;
run;

Try the lag function and conditionally output.


Since the values are nested with subject you would also need to test if subject changed.  In the sample data your test would not find the second subject's first value since the first subject ends with GRADE=0 and the second subject starts with GRADE=0.

Either test that using similar LAG() function:

data want;
  set have;
  if lag(subject) ne subject OR lag(grade) ne grade then output;
run;

Or use BY group for SUBJECT.

data want;
  set have;
  by subject;
  if first.subject OR lag(grade) ne grade then output;
run;
Occasional Contributor
Posts: 18

Re: Retain first observation

Thank you very much. Very useful suggestions. I tried them and they worked too.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 99 views
  • 0 likes
  • 4 in conversation