BookmarkSubscribeRSS Feed
wjugon85
Calcite | Level 5

Hi all,

 

I have a general question for help on efficiency of task I am doing.

 

I have a dataset which has a set of scores for each student.

 

STUDENTID   SCORE

1                       5

1                       4

1                       4

1                       3

1                       3

1                       2

1                       5

2                       1

2                       1

2                       1

2                       2

2                       3

2                       1

3                       4

3                       4

 

What I want to do is create a new dataset which will contain the STUDYID, HIGHEST SCORE, AVERAGE SCORE. this is fine if using all data, however my requirements are to only use, the last 3 valid observations (i.e. not null) per student.

 

If less than 3 datapoints exists, then HIGHEST and AVERAGE are considered missing.

 

i.e.

 

STUDYID  HIGHEST   AVERAGE

1                5                3.33

2                3                2

3                .                 .

 

I am really just trying to find a way to select thos last 3 values per student to use.

 

Kind regards,

 

5 REPLIES 5
Astounding
PROC Star

You will need to process the data twice.  One time counts the number of records, and the other calculates statistics.  The easiest way might be to use a single DATA step:

 

data want;

   n_records = 0;

   do until (last.ID);

      set have;

      by ID;

     where score > .;

      n_records + 1;

   end;

   n_records2 = 0;

   useful_records = min(3, n_records);

   do until (last.ID);

      set have;

      by ID;

      where score > .;

      n_records2 + 1;

      if (useful_records >= 3) and (n_records2 >= n_records-2) then do;

         highest = max(highest, score);

         average = sum(average, score);

      end;

   end;

   if average > . then average = average/3;

   drop n_records n_records2;

run;

     

The first DO loop counts, and the second calculates.

MikeZdeb
Rhodochrosite | Level 12

Hi, this works with your data ...

 

data x;
input studentid score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4
;

 

data y (keep=studentid average highest);
array z(100);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(j) = score;
end;
if j ge 3 then do;
   highest = max(of z(j) z(j-1) z(j-2));
   average = round(mean(of z(j) z(j-1) z(j-2)),.01);;
end;
run;

 

data set Y ...


Obs    studentid    highest    average

 1         1           5         3.33
 2         2           3         2.00
 3         3           .          .

Astounding
PROC Star

Much simpler.  Here's a wrinkle that removes any worries about the number of incoming scores:

 

array z {0:2} z0-z2;

 

Then:

 

z{mod(j,3)} = score;

 

Then compute max and average based on z{*}.

MikeZdeb
Rhodochrosite | Level 12

Hi, OK but no need to bother with the z(0:2) part ...

 

data y (keep=studentid average highest);
array z(3);
do j=1 by 1 until(last.studentid);
   set x;
   by studentid;
   z(mod(j,3)+1) = score;

end;

if j ge 3 then do;
   highest = max(of z(*));
   average = round(mean(of z(*)),.01);;
end;
run;

PGStats
Opal | Level 21

The simpler the better

 


data have;
input StudentId score @@;
datalines;
1  5 1  4 1  4 1  3 1  3 1  2 1  5 2  1
2  1 2  1 2  2 2  3 2  1 3  4 3  4 
;

data want;
do until(last.studentId);
    set have; by studentId;
    where score is not missing;
    v2 = v1;
    v1 = v0;
    v0 = score;
    end;
if not missing(v2) then do;
    Highest = max(of v:);
    Average = mean(of v:);
    end;
keep studentId average highest;
format average 4.1;
run;

proc print data=want noobs; run;

thanks @MikeZdeb

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 599 views
  • 0 likes
  • 4 in conversation