I have a data set that contains the variables "eligibility" and "predicted eligibility", each taking the values 1 and 0. I wish to find the percentage of the time that the two variables match, i.e. (the number of observations with eligibility=0 and predicted eligibility=0, plus the number with eligibility=1 and predicted eligibility=1) divided by the total number of observations. This is to be called the match rate. The eligibility variable always remains the same; however, the predicted eligibility variable depends on a probability (called prob) that is used to assign predicted eligibility as either 0 (if found_predict>=prob) or 1 (if found_predict<prob), where found_predict is a constant variable. I want to set up an array in a do loop that finds the match rate for each probability. That is, I want code that looks at values of prob between 0.3 and 0.6, say (in increments of 0.01), and determines the match rate for each.

Example data please?

It is a bit confusing to figure out what you are trying to do.  It is conceivable that arrays are a good tool for the job, or an irrelevant tool for the job.  At any rate, here are a couple of pieces of the puzzle.  To create your flag indicating whether actual eligibility matches predicted:


data want;
   set have;
   /* 1 when actual and predicted eligibility agree, 0 otherwise */
   match = (eligibility = predicted_eligibility);
run;



To get its average value across observations, which for a 0/1 flag is the match rate (many ways to do this):


proc summary data=want;
   var match;
   /* the mean of the 0/1 flag is the proportion of matches */
   output out=stats (keep=match_rate) mean=match_rate;
run;
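
To repeat this for each value of prob, one possible approach is to put the DO loop inside the DATA step and then summarize by prob. This is only a sketch under some assumptions: the input data set is called have (as above), found_predict is a numeric variable on each observation, and the names expanded, rates, and i are made up for illustration. It uses a plain DO loop rather than an array, and it steps over integers 30 to 60 and divides by 100 so that repeated addition of 0.01 does not introduce rounding drift.

```sas
/* One output row per observation per candidate prob (0.30 to 0.60). */
data expanded;
   set have;
   do i = 30 to 60;
      prob = i / 100;
      /* rule from the question: 1 if found_predict < prob, else 0 */
      predicted_eligibility = (found_predict < prob);
      match = (eligibility = predicted_eligibility);
      output;
   end;
   keep prob match;
run;

/* Mean of the 0/1 flag within each prob is the match rate for that prob. */
proc summary data=expanded nway;
   class prob;
   var match;
   output out=rates (keep=prob match_rate) mean=match_rate;
run;
```

If this fits your data, rates should hold one row per value of prob with its match_rate. Note that SAS treats a missing value as smaller than any number, so an observation with a missing found_predict would get predicted_eligibility=1 here; filter those out first if that is not what you want.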


