## Using X Number of Latest Values for Calculations

Occasional Contributor
Posts: 6

# Using X Number of Latest Values for Calculations

Hi all,

I have a general question for help on efficiency of task I am doing.

I have a dataset which has a set of scores for each student.

STUDENTID   SCORE

1                       5

1                       4

1                       4

1                       3

1                       3

1                       2

1                       5

2                       1

2                       1

2                       1

2                       2

2                       3

2                       1

3                       4

3                       4

What I want to do is create a new dataset which will contain the STUDYID, HIGHEST SCORE, AVERAGE SCORE. this is fine if using all data, however my requirements are to only use, the last 3 valid observations (i.e. not null) per student.

If less than 3 datapoints exists, then HIGHEST and AVERAGE are considered missing.

i.e.

STUDYID  HIGHEST   AVERAGE

1                5                3.33

2                3                2

3                .                 .

I am really just trying to find a way to select thos last 3 values per student to use.

Kind regards,

Super User
Posts: 6,757

## Re: Using X Number of Latest Values for Calculations

You will need to process the data twice.  One time counts the number of records, and the other calculates statistics.  The easiest way might be to use a single DATA step:

data want;

n_records = 0;

do until (last.ID);

set have;

by ID;

where score > .;

n_records + 1;

end;

n_records2 = 0;

useful_records = min(3, n_records);

do until (last.ID);

set have;

by ID;

where score > .;

n_records2 + 1;

if (useful_records >= 3) and (n_records2 >= n_records-2) then do;

highest = max(highest, score);

average = sum(average, score);

end;

end;

if average > . then average = average/3;

drop n_records n_records2;

run;

The first DO loop counts, and the second calculates.

Valued Guide
Posts: 765

## Re: Using X Number of Latest Values for Calculations

Hi, this works with your data ...

data x;
input studentid score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4
;

data y (keep=studentid average highest);
array z(100);
do j=1 by 1 until(last.studentid);
set x;
by studentid;
z(j) = score;
end;
if j ge 3 then do;
highest = max(of z(j) z(j-1) z(j-2));
average = round(mean(of z(j) z(j-1) z(j-2)),.01);;
end;
run;

data set Y ...

Obs    studentid    highest    average

1         1           5         3.33
2         2           3         2.00
3         3           .          .

Super User
Posts: 6,757

## Re: Using X Number of Latest Values for Calculations

Much simpler.  Here's a wrinkle that removes any worries about the number of incoming scores:

array z {0:2} z0-z2;

Then:

z{mod(j,3)} = score;

Then compute max and average based on z{*}.

Valued Guide
Posts: 765

## Re: Using X Number of Latest Values for Calculations

[ Edited ]
Posted in reply to Astounding

Hi, OK but no need to bother with the z(0:2) part ...

data y (keep=studentid average highest);
array z(3);
do j=1 by 1 until(last.studentid);
set x;
by studentid;
z(mod(j,3)+1) = score;

end;

if j ge 3 then do;
highest = max(of z(*));
average = round(mean(of z(*)),.01);;
end;
run;

Posts: 5,523

## Re: Using X Number of Latest Values for Calculations

[ Edited ]

The simpler the better

``````
data have;
input StudentId score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4
;

data want;
do until(last.studentId);
set have; by studentId;
where score is not missing;
v2 = v1;
v1 = v0;
v0 = score;
end;
if not missing(v2) then do;
Highest = max(of v:);
Average = mean(of v:);
end;
keep studentId average highest;
format average 4.1;
run;

proc print data=want noobs; run;``````

thanks @MikeZdeb

PG
Discussion stats
• 5 replies
• 321 views
• 0 likes
• 4 in conversation