Why doesn't the code below replace the old character values with the correct numeric ones? I have seen the solution in my textbook, but I still don't understand why I need to define a new array to get numeric values for the four variables. The INPUT function should convert subject{I} to numeric, so there shouldn't be any need for a whole new array, right?
Wrong
data quiz.cleaned;
set quiz.all_score;
array subject{4} read math science write;
do I= 1 to dim(subject);
if subject{I} = 'missing' then subject{i} = '';
subject{I} = input(subject{I}, 30.);
end;
run;
Right
data quiz.cleaned2;
set quiz.all_score;
array subject{4} read math science write;
array subject_2{4} readN mathN scienceN writeN;
do I= 1 to dim(subject);
if subject{I} = 'missing' then subject{i} = '';
subject_2{I} = subject{I} * 1;
end;
run;
Because you assign a character value ('') to them, the variables in the subject{ } array must be character variables:
if subject{I} = 'missing' then subject{i} = '';
A variable's type is fixed when the DATA step is compiled, so you cannot change it from CHARACTER to NUMERIC inside the step.
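A minimal sketch of that rule, using a made-up one-variable step (the data set `type_demo` and the variables `score` and `score_type` are hypothetical, not from the question): INPUT() returns a number, but assigning it back to a character variable only converts it to text again, and VTYPE() still reports the variable as character.

```sas
/* Hypothetical demo: a character variable keeps its type     */
/* even when a numeric value is assigned to it.                */
data type_demo;
  length score $ 12;               /* score is defined as character */
  score = '42';
  score = input(score, 8.);        /* numeric result is converted back to text; */
                                   /* the log shows a numeric-to-character NOTE */
  score_type = vtype(score);       /* 'C' = character, 'N' = numeric */
  put score_type=;
run;
```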
You cannot change the type of an existing variable; you need to create a new one. This is (of course, I'm tempted to say) also true for variables addressed through an array.
In your first step, INPUT() does return a number, but because the result is stored in a character variable, SAS converts it straight back to a string (and writes a "Numeric values have been converted to character values" note to the log).
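Here is a sketch of the first step rewritten so that the INPUT() result goes into new numeric variables instead of back into the character ones. The small `all_score` data set and the output name `cleaned3` are made up for illustration; with the real data you would use `set quiz.all_score;`. Using INPUT() rather than `* 1` makes the conversion explicit and avoids the "Character values have been converted to numeric values" note in the log.

```sas
/* Hypothetical stand-in for quiz.all_score: all four scores are character */
data all_score;
  input read $ math $ science $ write $;
  datalines;
12 15 missing 9
missing 20 18 11
;

data cleaned3;
  set all_score;
  array subject{4} read math science write;          /* existing character variables */
  array subject_n{4} readN mathN scienceN writeN;    /* new variables, created as numeric */
  do i = 1 to dim(subject);
    if subject{i} = 'missing' then subject{i} = ' '; /* a blank reads as numeric missing */
    subject_n{i} = input(subject{i}, 32.);           /* explicit character-to-numeric conversion */
  end;
  drop i;
run;
```

If you want the numeric values under the original names, a common pattern is to add `drop read math science write;` and `rename readN=read mathN=math scienceN=science writeN=write;` to the same step.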
Two guesses
Once a variable is defined as character, which your comparison with 'missing' suggests is the case for your subject variables, its type cannot be changed.
This is a general rule for all SAS variables (and for most languages with any sort of variable typing).
So to get a numeric value for calculations, you need a separate variable to hold the numeric version.
Or, when you read external data that is supposed to be numeric but may contain a code value such as "missing", you can use a custom informat.
proc format;
invalue codemissing (Upcase)
'MISSING' = .;
run;
data example;
informat x codemissing.;
input x;
datalines;
1
34
missing
345.6
mISSING
;
The INVALUE statement in PROC FORMAT creates an informat, that is, a rule for reading text into values. Here the UPCASE option converts each value to upper case before it is compared, so every spelling of "missing" matches 'MISSING' and is read as a numeric missing value. Because the informat name does not start with $, the informat is numeric, and any value that is not specifically listed is read with the default BEST informat.
The custom informat approach would also let you keep track of different types of missing values. For the values coded "missing", you could have the informat assign the special missing value .M, marking them as coded missing; values that were blank in the source file would still show the ordinary missing value (.).
Neither kind of missing value affects calculations of means, sums, or other statistics, because missing values are excluded from them.
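A sketch of that variant, assuming the same kind of raw data as the example above. The informat name `codemissm` and data set `example_m` are hypothetical, and the `.` data line stands in for a value that was blank in the source file. PROC MEANS then shows that neither .M nor . enters the N, MEAN, or SUM.

```sas
/* Hypothetical variant of the informat above: 'missing' becomes the */
/* special missing value .M instead of an ordinary missing value.     */
proc format;
  invalue codemissm (upcase)
    'MISSING' = .M;
run;

data example_m;
  informat x codemissm.;
  input x;
  datalines;
1
missing
.
345.6
mISSING
;

/* Both .M and . are counted in NMISS and excluded from N, MEAN and SUM */
proc means data=example_m n nmiss mean sum;
  var x;
run;
```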