BookmarkSubscribeRSS Feed
cy_th
Fluorite | Level 6

I want to calculate the mean of 'score raw'. How can i do it? I tried to save it as an array and do it but since some values are missing I think it wont work.Also since the array should have elements of the same data type I think what I have done is incorrect. I have attached the excel file here.

proc import filename="/folders/myfolders/excel_data"
dbms = xlsx out= newdata replace;
run;
data newdata1;
set newdata;
array sc(2) score_raw score_num;
do i=1;
if sc(i) = 'missing' or '' then do;
sc(i)=0;
end;
end;
run;
proc means data=newdata1;
var sc
run;

 

7 REPLIES 7
Astounding
PROC Star

You shouldn't need to change the data at all.  Just compute the mean of the existing values.  PROC MEANS will ignore the missing values automatically.  Wouldn't you get the wrong mean if you first change the missing values to zero?

cy_th
Fluorite | Level 6

Some of the values are listed as 'missing'. So the proc means step shows it as an error.

 
 1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72         
 73         proc means data=newdata;
 74         var score_raw;
 ERROR: Variable score_raw in list does not match type prescribed for this list.
 75         run;
 
 NOTE: The SAS System stopped processing this step because of errors.
 NOTE: PROCEDURE MEANS used (Total process time):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 76         
 77         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 89         
proc import filename="/folders/myfolders/excel_data"
dbms = xlsx out= newdata replace;
run;
proc means data=newdata;
var score_raw;
run;
Kurt_Bremser
Super User

Usual solution:

Take the useless Excel file format out of the process. Importing it into SAS involves guessing, leading to inconsistent results.

Save your data to a text file, and read that with a data step that does the conversion while importing:

data test;
infile datalines dlm=';' dsd truncover;
input id $ _score_raw :$20.;
if _score_raw ne 'missing'
then score_raw = input(_score_raw,20.);
/* else not necessary, as score_raw will be set to missing anyway */
drop _score_raw;
datalines4;
A;1
B;missing
C;2
;;;; 

so that your analysis variable ends up having the correct type.

Reeza
Super User
Are you calculating an average for a single two fo data or for the entire data or grouped by a variable?

How do you want missing values handled? SAS will automatically exclude them entirely but I notice you're trying to recode them to missing. You cannot change a variable type on the fly in SAS so you can't change the values using the methods above.

1. First import the data
2. Ensure all variable types are correct - numeric or character. If you need to do calculations on it, it needs to be numeric
3. Use PROC MEANS to calculate the mean.

Arrays are not needed here, SAS doesn't treat arrays as objects it's just a shortcut reference to a variable name.
cy_th
Fluorite | Level 6

I am trying to calculate the mean of only one variable. When you say 'you cannot change a variable type' does it mean i cannot replace "missing" with a blank input ? because it means the same. I am unable to calculate the mean because it is mentioned as "missing".D13.png

Reeza
Super User
Here are two options:

1. change missing in Excel to blanks and SAS will likely import the data correctly and you won't have this issue
2. Because of the word 'missing' in the cell, SAS is reading the whole column as numeric, which means that it cannot do math on letters. So you need to convert those to numbers in a data step before doing the proc means. For now, the easiest way is to create a new variable.

ballardw
Super User

@cy_th wrote:

I want to calculate the mean of 'score raw'. How can i do it? I tried to save it as an array and do it but since some values are missing I think it wont work.Also since the array should have elements of the same data type I think what I have done is incorrect. I have attached the excel file here.

proc import filename="/folders/myfolders/excel_data"
dbms = xlsx out= newdata replace;
run;
data newdata1;
set newdata;
array sc(2) score_raw score_num;
do i=1;
if sc(i) = 'missing' or '' then do;
sc(i)=0;
end;
end;
run;
proc means data=newdata1;
var sc
run;

 


In addition to the other comments on reading data as desired you want to think very carefully about changing a missing value to 0 is appropriate before doing any summary statistic calculation. Consider data such as:

value

10

missing

missing

missing

20

missing

missing

missing

 

If I calculate a mean for the non-missing values it would be (10+20)/2 = 15. If I replace the missing with 0 then you are looking at (10+0+0+0+20+0+0+0)/8=> 30/8= 3.75. So, which "mean" do you need? Most of the time leaving "missing" as missing is more appropriate in general. Otherwise you are in the role of imputing values and that should be undertaken carefully.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1880 views
  • 1 like
  • 5 in conversation