BookmarkSubscribeRSS Feed
wjugon85
Calcite | Level 5

Hi all,

 

I have a general question for help on efficiency of task I am doing.

 

I have a dataset which has a set of scores for each student.

 

STUDENTID   SCORE

1                       5

1                       4

1                       4

1                       3

1                       3

1                       2

1                       5

2                       1

2                       1

2                       1

2                       2

2                       3

2                       1

3                       4

3                       4

 

What I want to do is create a new dataset which will contain the STUDYID, HIGHEST SCORE, AVERAGE SCORE. this is fine if using all data, however my requirements are to only use, the last 3 valid observations (i.e. not null) per student.

 

If less than 3 datapoints exists, then HIGHEST and AVERAGE are considered missing.

 

i.e.

 

STUDYID  HIGHEST   AVERAGE

1                5                3.33

2                3                2

3                .                 .

 

I am really just trying to find a way to select thos last 3 values per student to use.

 

Kind regards,

 

5 REPLIES 5
Astounding
PROC Star

You will need to process the data twice.  One time counts the number of records, and the other calculates statistics.  The easiest way might be to use a single DATA step:

 

data want;

   n_records = 0;

   do until (last.ID);

      set have;

      by ID;

     where score > .;

      n_records + 1;

   end;

   n_records2 = 0;

   useful_records = min(3, n_records);

   do until (last.ID);

      set have;

      by ID;

      where score > .;

      n_records2 + 1;

      if (useful_records >= 3) and (n_records2 >= n_records-2) then do;

         highest = max(highest, score);

         average = sum(average, score);

      end;

   end;

   if average > . then average = average/3;

   drop n_records n_records2;

run;

     

The first DO loop counts, and the second calculates.

MikeZdeb
Rhodochrosite | Level 12

Hi, this works with your data ...

 

data x;
input studentid score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4
;

 

data y (keep=studentid average highest);
array z(100);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(j) = score;
end;
if j ge 3 then do;
   highest = max(of z(j) z(j-1) z(j-2));
   average = round(mean(of z(j) z(j-1) z(j-2)),.01);;
end;
run;

 

data set Y ...


Obs    studentid    highest    average

 1         1           5         3.33
 2         2           3         2.00
 3         3           .          .

Astounding
PROC Star

Much simpler.  Here's a wrinkle that removes any worries about the number of incoming scores:

 

array z {0:2} z0-z2;

 

Then:

 

z{mod(j,3)} = score;

 

Then compute max and average based on z{*}.

MikeZdeb
Rhodochrosite | Level 12

Hi, OK but no need to bother with the z(0:2) part ...

 

data y (keep=studentid average highest);
array z(3);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(mod(j,3)+1) = score;

end;

if j ge 3 then do;
   highest = max(of z(*));
   average = round(mean(of z(*)),.01);;
end;
run;

PGStats
Opal | Level 21

The simpler the better

 


data have;
input StudentId score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4 
;

data want;
do until(last.studentId);
    set have; by studentId;
    where score is not missing;
    v2 = v1;
    v1 = v0;
    v0 = score;
    end;
if not missing(v2) then do;
    Highest = max(of v:);
    Average = mean(of v:);
    end;
keep studentId average highest;
format average 4.1;
run;

proc print data=want noobs; run;

thanks @MikeZdeb

PG

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1382 views
  • 0 likes
  • 4 in conversation