Hi,
I am learning from SAS by Example. Here is a relatively simple problem that I do not have the answer for, but someone on here probably knows.
Essentially, each student has taken up to 5 quizzes, with a score from 1 to 5, but some quizzes are missing for some students. I have learned from Learning SAS by Example how to exclude Max Min, mean etc. calculations using the N function if, for example, the total number of missing quizzes is more than 7. However, what if I want to exclude calculating the max, min, mean, etc for any given student who hasn't taken at least 4 quizes (ie. who has 2 or more quizzes missing). .
All my programming is below. Any help would be very much appreciated. Thanks!
----------------------------------------------------
Here is the original data file:
----------------------------------------
libname SASdata '/folders/myfolders/SASData' ;
Data SASData.surveyMy ;
Input Subj$ Gender$ Age Income Q1 Q2 Q3 Q4 Q5 ;
Datalines ;
001 M 23 28000 1 . 1 2 3
002 F 55 76123 4 5 2 1 1
003 M 38 36500 2 2 2 2 1
004 F 67 128000 4 3 2 2 4
005 M 22 23060 3 3 3 4 2
006 M 63 90000 2 3 5 4 3
007 F 45 76100 5 . . 3 3
;
As you can see, there are 7 total missing quizzes. Student 007 has two missing quizzes.
Here is the data that calculates max, min, and mean only if a total of 7 or less quizzes is missing:
libname sasdata '/folders/myfolders/SASData' ;
Data Prob11_4 ;
    Set sasdata.surveymy ;
    If n(of Q1-Q5) lt 7 then Maxscore = Max(of Q1-Q5) ;
    If n(of Q1-Q5) lt 7 then SecondHiScore = Max(Q1-Q5, 2) ;
    If n(of Q1-Q5) lt 7 then Minscore = Min(of Q1-Q5) ;
    If n(of Q1-Q5) lt 7 then QuesAve = Mean(of Q1-Q5) ;
run ;
Now, how do I write code which calculates the Max, Min, and Mean for all students EXCEPT a student which has 2 missing quizzes. In this example, that is student 007.
Thanks for your help!
Instead of runing the same IF test 4 times, run it once. If the test is successful (NMISS>=2) then do the 4 calculations inside a "do group":
    if nmiss(of Q1-Q5)  ge 2 then do;
      Maxscore = Max(of Q1-Q5) ;
      SecondHiScore = Max(Q1-Q5, 2) ;
      Minscore = Min(of Q1-Q5) ;
      QuesAve = Mean(of Q1-Q5) ; 
    end;
But your second stat "SecondHiScore=max(q1-a5,2)" is meaningless. It just adds the value 2 to your list of values for X1 through X5 Take a look at the documentation on the ORDINAL function.
Sorry. I meant to say, "As you can see. there are 3 missing quizzes". (not 7).
Perhaps you want to use NMISS, the number of missing to exclude.
I was very curious about about why you used n(of Q1-Q5) lt 7. Since there are only 5 variables the number of nonmissing values is always less than 7.
Provide your exact rule for when not to calculate some statistic. One example doesn't provide a rule. Do you require 3 , 4 or 5 values to calculate a valid mean? What value to you want if the mean isn't valid? Missing or some other code value?
You may also be interested in the Largest function.
largest(2,q1,q2,q3,q4,q5) will return the second largest value. Max with the 2 as you used would return 2 if all of the Q values were 1.
Hi,
Thanks for answering.
The reason I used n(of Q1-Q5) lt 7. is that I was following the problem in the Book Learning SAS by example. The great frustration I am having with that book is that it used to come with a CD that contained the practice problems that are referred to in the book, but no longer does. I downloaded files associated with the book from th internet, but that link only contains a few of the problems in the book, and so in other cases I am using other datasets that came with the downloadable link to substitute for those not available to me. I didn't realize that the lt n(of Q1-Q5) lt 7 was referring to the number of errors per observation. I thought that meant the total for all observations. Now it makes sense to use a n(of Q1-Q5) lt 2 to exclude more than 2 missing data points per observation.
I did switch out the largest function in place of the max function. It appears the max function only takes the max, not the second or third etc largest.
IF ANYONE KNOWS WHERE I CAN FIND THE LINK FOR LEARNING SAS BY EXAMPLE THAT INCLUDES ALL THE DATA SETS THAT WERE SUPPOSED TO COME WITH THE BOOK THAT WOULD BE EXTREMELY HELPFUL. I have the book with the red cover. It says it was originally published in 2007.
Thanks!
This is the website I used to download the files. I would say that less than 25% of the files referred to in the book problems can be found at this link:
https://support.sas.com/publishing/authors/extras/60864_code.pdf
https://support.sas.com/downloads/package.htm?pid=1709#
There's 120 files of various types in the link above.
The link you referenced had 58 files.
Instead of runing the same IF test 4 times, run it once. If the test is successful (NMISS>=2) then do the 4 calculations inside a "do group":
    if nmiss(of Q1-Q5)  ge 2 then do;
      Maxscore = Max(of Q1-Q5) ;
      SecondHiScore = Max(Q1-Q5, 2) ;
      Minscore = Min(of Q1-Q5) ;
      QuesAve = Mean(of Q1-Q5) ; 
    end;
But your second stat "SecondHiScore=max(q1-a5,2)" is meaningless. It just adds the value 2 to your list of values for X1 through X5 Take a look at the documentation on the ORDINAL function.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
