Hello everyone, I'd like to ask how to do a grouping from score with array and loop.
I've tried what shown in Solved: Do loops with IF/THEN statements - SAS Support Communities but it's not working for me.
Supposed I have a data like this.
data have ;
input id (score1-score3) ($);
cards;
1 25 50 75
2 100 90 80
3 17 33 72
4 55 43 95
5 76 85 39
;
I want to group each score using array like this.
data want;
	set have;
	array score {3} score1 - score3;
    array grade {3} grade1 - grade3;
    do i = 1 to 3;
	if		score(i)	<= 25	then grade(i) = 'D';
	else if	score(i)	<= 50	then grade(i) = 'C';
	else if	score(i)	<= 75	then grade(i) = 'B';
	else if	score(i)	<= 100	then grade(i) = 'A';
	else	grade(i) = "-";
    end;
    drop i;
run;It shows error " Invalid numeric data, 'D' , at line 86 column 37. "
Did I miss something?
You log will show more than a single message:
10   data want;
11      set have;
12      array score {3} score1 - score3;
13       array grade {3} grade1 - grade3;
14
15       do i = 1 to 3;
16      if    score(i) <= 25 then grade(i) = 'D';
17      else if  score(i) <= 50 then grade(i) = 'C';
18      else if  score(i) <= 75 then grade(i) = 'B';
19      else if  score(i) <= 100   then grade(i) = 'A';
20      else  grade(i) = "-";
21       end;
22       drop i;
23
24   run;
NOTE: Character values have been converted to numeric
      values at the places given by: (Line):(Column).
      16:10   16:30   17:13   17:33   18:13   18:33   19:13   19:36
      20:10
NOTE: Invalid numeric data, 'D' , at line 16 column 41.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
id=1 score1=25 score2=50 score3=75 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=1
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
id=2 score1=100 score2=90 score3=80 grade1=. grade2=. grade3=. i=4
_ERROR_=1 _N_=2
NOTE: Invalid numeric data, 'D' , at line 16 column 41.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
id=3 score1=17 score2=33 score3=72 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=3
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
id=4 score1=55 score2=43 score3=95 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=4
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
id=5 score1=76 score2=85 score3=39 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=5
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
The way you defined the Grade array creates numeric variables. So you cannot assign any of 'A', 'B' or other character value to them. If the variables do not have a predefined type then the Array statement creates numeric variables.
You can create character variables and avoid the invalid data message with:
    array grade {3} $ grade1 - grade3 ;
which will create length 8 character variables. The $ as the first item after the array size indicates that you want character variables. If want a length other than 8, such as 1 to hold a single letter, specify the number of characters after the $.
    array grade {3} $ 1 grade1 - grade3 ;
You probably still have a problem if I understand what your code is supposed to do as the missing values for the scores will all end up with letter D. That is because missing is less than 25. Missing is treated as less than any given value. So the first "IF" needs to address the missing. Note that you can test the missing value code by placing a dot in the position of one of the scores in the Have data step.
While the automatic conversions that SAS does can be helpful, you really should address the values so the message about the conversion doesn't appear. To numeric is often not a problem but other characters than digit and decimal may cause other errors. Is there some reason that the Scores are supposed to be character? As a minimum they won't sort well if you have values like 1 to 9 in the data.
You log will show more than a single message:
10   data want;
11      set have;
12      array score {3} score1 - score3;
13       array grade {3} grade1 - grade3;
14
15       do i = 1 to 3;
16      if    score(i) <= 25 then grade(i) = 'D';
17      else if  score(i) <= 50 then grade(i) = 'C';
18      else if  score(i) <= 75 then grade(i) = 'B';
19      else if  score(i) <= 100   then grade(i) = 'A';
20      else  grade(i) = "-";
21       end;
22       drop i;
23
24   run;
NOTE: Character values have been converted to numeric
      values at the places given by: (Line):(Column).
      16:10   16:30   17:13   17:33   18:13   18:33   19:13   19:36
      20:10
NOTE: Invalid numeric data, 'D' , at line 16 column 41.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
id=1 score1=25 score2=50 score3=75 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=1
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
id=2 score1=100 score2=90 score3=80 grade1=. grade2=. grade3=. i=4
_ERROR_=1 _N_=2
NOTE: Invalid numeric data, 'D' , at line 16 column 41.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
id=3 score1=17 score2=33 score3=72 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=3
NOTE: Invalid numeric data, 'B' , at line 18 column 44.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
id=4 score1=55 score2=43 score3=95 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=4
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'A' , at line 19 column 47.
NOTE: Invalid numeric data, 'C' , at line 17 column 44.
id=5 score1=76 score2=85 score3=39 grade1=. grade2=. grade3=. i=4 _ERROR_=1
_N_=5
NOTE: There were 5 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
The way you defined the Grade array creates numeric variables. So you cannot assign any of 'A', 'B' or other character value to them. If the variables do not have a predefined type then the Array statement creates numeric variables.
You can create character variables and avoid the invalid data message with:
    array grade {3} $ grade1 - grade3 ;
which will create length 8 character variables. The $ as the first item after the array size indicates that you want character variables. If want a length other than 8, such as 1 to hold a single letter, specify the number of characters after the $.
    array grade {3} $ 1 grade1 - grade3 ;
You probably still have a problem if I understand what your code is supposed to do as the missing values for the scores will all end up with letter D. That is because missing is less than 25. Missing is treated as less than any given value. So the first "IF" needs to address the missing. Note that you can test the missing value code by placing a dot in the position of one of the scores in the Have data step.
While the automatic conversions that SAS does can be helpful, you really should address the values so the message about the conversion doesn't appear. To numeric is often not a problem but other characters than digit and decimal may cause other errors. Is there some reason that the Scores are supposed to be character? As a minimum they won't sort well if you have values like 1 to 9 in the data.
Oh, so I need to define it first. Thank you.
I've tried that missing one, and yes it shown as a D. Adding the first IF as missing(score(i)) solve it.
The scores are in a scale of 0-4 like GPA, and it won't be sorted. The grades are just additional information to already written scores in the table, so people who read the output can see which grade their score are.
Why are Score1-Score3 alphanumeric vars?
Instead of using if-else-statement, i would use a self-defined format to translate/recode scores to grades.
data have ;
    input id score1 - score3;
    cards;
1 25 50 75
2 100 90 80
3 17 33 72
4 55 43 95
5 76 85 39
;
proc format;
    value 
        ScoreToGrade 
        . = "-" 
        0 - 25 = "D" 
        26 - 50 = "C"
        51 - 75 = "B"
        76 - 100 = "A"
    ;
run;
data want;
    set have;
    array scores[3] score1 - score3;
    array grades[3] $ 1 grade1 - grade3;
    do i=1 to dim(scores);
        grades[i]=put(scores[i], ScoreToGrade.);
    end;
    drop i;
run;If "want" is used in a report afterwards, you could skip creating it and use the format directly:
proc print data= have;
    format Score1-Score3 ScoreToGrade.;
run;
Why did you define the SCORE variables as CHARACTER?
That is why you got a message like this in the log.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      162:9    163:11   164:11   165:11
If you have to keep them as character then add an INPUT() function call to avoid those messages:
Example:
if input(score(i),32.) <= 50 then grade(i) = 'C';Why did you define the GRADE variables as NUMERIC? Define them as CHARACTER so they can hold values like 'C'.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
