I want to calculate the mean of 'score raw'. How can i do it? I tried to save it as an array and do it but since some values are missing I think it wont work.Also since the array should have elements of the same data type I think what I have done is incorrect. I have attached the excel file here.
proc import filename="/folders/myfolders/excel_data"
dbms = xlsx out= newdata replace;
run;
data newdata1;
set newdata;
array sc(2) score_raw score_num;
do i=1;
if sc(i) = 'missing' or '' then do;
sc(i)=0;
end;
end;
run;
proc means data=newdata1;
var sc
run;
You shouldn't need to change the data at all. Just compute the mean of the existing values. PROC MEANS will ignore the missing values automatically. Wouldn't you get the wrong mean if you first change the missing values to zero?
Some of the values are listed as 'missing'. So the proc means step shows it as an error.
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 proc means data=newdata;
74 var score_raw;
ERROR: Variable score_raw in list does not match type prescribed for this list.
75 run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
76
77 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
89
proc import filename="/folders/myfolders/excel_data"
dbms = xlsx out= newdata replace;
run;
proc means data=newdata;
var score_raw;
run;
Usual solution:
Take the useless Excel file format out of the process. Importing it into SAS involves guessing, leading to inconsistent results.
Save your data to a text file, and read that with a data step that does the conversion while importing:
data test;
infile datalines dlm=';' dsd truncover;
input id $ _score_raw :$20.;
if _score_raw ne 'missing'
then score_raw = input(_score_raw,20.);
/* else not necessary, as score_raw will be set to missing anyway */
drop _score_raw;
datalines4;
A;1
B;missing
C;2
;;;;
so that your analysis variable ends up having the correct type.
I am trying to calculate the mean of only one variable. When you say 'you cannot change a variable type' does it mean i cannot replace "missing" with a blank input ? because it means the same. I am unable to calculate the mean because it is mentioned as "missing".
@cy_th wrote:
I want to calculate the mean of 'score raw'. How can i do it? I tried to save it as an array and do it but since some values are missing I think it wont work.Also since the array should have elements of the same data type I think what I have done is incorrect. I have attached the excel file here.
proc import filename="/folders/myfolders/excel_data" dbms = xlsx out= newdata replace; run; data newdata1; set newdata; array sc(2) score_raw score_num; do i=1; if sc(i) = 'missing' or '' then do; sc(i)=0; end; end; run; proc means data=newdata1; var sc run;
In addition to the other comments on reading data as desired you want to think very carefully about changing a missing value to 0 is appropriate before doing any summary statistic calculation. Consider data such as:
value
10
missing
missing
missing
20
missing
missing
missing
If I calculate a mean for the non-missing values it would be (10+20)/2 = 15. If I replace the missing with 0 then you are looking at (10+0+0+0+20+0+0+0)/8=> 30/8= 3.75. So, which "mean" do you need? Most of the time leaving "missing" as missing is more appropriate in general. Otherwise you are in the role of imputing values and that should be undertaken carefully.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.