Using X Number of Latest Values for Calculations

Reply
Occasional Contributor
Posts: 6

Using X Number of Latest Values for Calculations

Hi all,

 

I have a general question for help on efficiency of task I am doing.

 

I have a dataset which has a set of scores for each student.

 

STUDENTID   SCORE

1                       5

1                       4

1                       4

1                       3

1                       3

1                       2

1                       5

2                       1

2                       1

2                       1

2                       2

2                       3

2                       1

3                       4

3                       4

 

What I want to do is create a new dataset which will contain the STUDYID, HIGHEST SCORE, AVERAGE SCORE. this is fine if using all data, however my requirements are to only use, the last 3 valid observations (i.e. not null) per student.

 

If less than 3 datapoints exists, then HIGHEST and AVERAGE are considered missing.

 

i.e.

 

STUDYID  HIGHEST   AVERAGE

1                5                3.33

2                3                2

3                .                 .

 

I am really just trying to find a way to select thos last 3 values per student to use.

 

Kind regards,

 

Super User
Posts: 5,351

Re: Using X Number of Latest Values for Calculations

You will need to process the data twice.  One time counts the number of records, and the other calculates statistics.  The easiest way might be to use a single DATA step:

 

data want;

   n_records = 0;

   do until (last.ID);

      set have;

      by ID;

     where score > .;

      n_records + 1;

   end;

   n_records2 = 0;

   useful_records = min(3, n_records);

   do until (last.ID);

      set have;

      by ID;

      where score > .;

      n_records2 + 1;

      if (useful_records >= 3) and (n_records2 >= n_records-2) then do;

         highest = max(highest, score);

         average = sum(average, score);

      end;

   end;

   if average > . then average = average/3;

   drop n_records n_records2;

run;

     

The first DO loop counts, and the second calculates.

Valued Guide
Posts: 765

Re: Using X Number of Latest Values for Calculations

Hi, this works with your data ...

 

data x;
input studentid score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4
;

 

data y (keep=studentid average highest);
array z(100);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(j) = score;
end;
if j ge 3 then do;
   highest = max(of z(j) z(j-1) z(j-2));
   average = round(mean(of z(j) z(j-1) z(j-2)),.01);;
end;
run;

 

data set Y ...


Obs    studentid    highest    average

 1         1           5         3.33
 2         2           3         2.00
 3         3           .          .

Super User
Posts: 5,351

Re: Using X Number of Latest Values for Calculations

Much simpler.  Here's a wrinkle that removes any worries about the number of incoming scores:

 

array z {0:2} z0-z2;

 

Then:

 

z{mod(j,3)} = score;

 

Then compute max and average based on z{*}.

Valued Guide
Posts: 765

Re: Using X Number of Latest Values for Calculations

[ Edited ]

Hi, OK but no need to bother with the z(0:2) part ...

 

data y (keep=studentid average highest);
array z(3);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(mod(j,3)+1) = score;

end;

if j ge 3 then do;
   highest = max(of z(*));
   average = round(mean(of z(*)),.01);;
end;
run;

Respected Advisor
Posts: 4,801

Re: Using X Number of Latest Values for Calculations

[ Edited ]

The simpler the better

 


data have;
input StudentId score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4 
;

data want;
do until(last.studentId);
    set have; by studentId;
    where score is not missing;
    v2 = v1;
    v1 = v0;
    v0 = score;
    end;
if not missing(v2) then do;
    Highest = max(of v:);
    Average = mean(of v:);
    end;
keep studentId average highest;
format average 4.1;
run;

proc print data=want noobs; run;

thanks @MikeZdeb

PG
Ask a Question
Discussion stats
  • 5 replies
  • 301 views
  • 0 likes
  • 4 in conversation