Hello,
I want to use an array to apply the same formula to multiple variables at once.
I don't get the desired result: the calculated fields _x _y _z _r _q _g all get null (missing) values.
When I calculate them manually (without an array), the calculation is fine.
What is wrong with my array?
proc format ;
VALUE Ratio_Till1_Fmt
-9997 ='(1) Ratio un-defined'
0='(2) 0'
0-<0.1='(3) (0,0.1]'
0.1-<0.2='(4) [0.1,0.2)'
0.2-<0.3='(5) [0.2,0.3)'
0.3-<0.4='(6) [0.3,0.4)'
0.4-<0.5='(7) [0.4,0.5)'
0.5-<0.6='(8) [0.5,0.6)'
0.6-<0.7='(9) [0.6,0.7)'
0.7-<0.8='(10) [0.7,0.8)'
0.8-<0.9='(11) [0.8,0.9)'
0.9-<1.0='(12) [0.9,1.0)'
1.0='(13) 1'
;
Run;
Data have;
input X y Z R q g;
cards;
0 0.1 0.8 0.1 1 -9997
0.2 0.3 0.4 0 0 0.4
;
Run;
data want;
set have;
array _vars(*) x y z r q g;
array _Bvars(*) _x _y _z _r _q _g;
do i=1 to dim(_vars);
_Bvars(i)=put(_vars(i),Ratio_Till1_Fmt.);
end;
drop i;
/*calc_x=put(x,Ratio_Till1_Fmt.);*/
/*calc_y=put(y,Ratio_Till1_Fmt.);*/
/*calc_z=put(z,Ratio_Till1_Fmt.);*/
/*calc_r=put(r,Ratio_Till1_Fmt.);*/
/*calc_q=put(q,Ratio_Till1_Fmt.);*/
/*calc_g=put(g,Ratio_Till1_Fmt.);*/
Run;
You have to define the _Bvars array as character with an appropriate length.
data want;
set have;
array _vars(*) x y z r q g;
array _Bvars(*) $20 _x _y _z _r _q _g;
do i=1 to dim(_vars);
_Bvars(i)=put(_vars(i),Ratio_Till1_Fmt.);
end;
drop i;
Run;
The clue to the problem was the notes in the log about implicit conversion from character values to numeric values:
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      30:1
NOTE: Invalid numeric data, '(2) 0' , at line 30 column 11.
NOTE: Invalid numeric data, '(4) [0.1,0.2)' , at line 30 column 11.
NOTE: Invalid numeric data, '(11) [0.8,0.9)' , at line 30 column 11.
NOTE: Invalid numeric data, '(4) [0.1,0.2)' , at line 30 column 11.
NOTE: Invalid numeric data, '(13) 1' , at line 30 column 11.
NOTE: Invalid numeric data, '(1) Ratio un-defined' , at line 30 column 11.
X=0 y=0.1 Z=0.8 R=0.1 q=1 g=-9997 _x=. _y=. _z=. _r=. _q=. _g=. i=7 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, '(5) [0.2,0.3)' , at line 30 column 11.
NOTE: Invalid numeric data, '(6) [0.3,0.4)' , at line 30 column 11.
NOTE: Invalid numeric data, '(7) [0.4,0.5)' , at line 30 column 11.
NOTE: Invalid numeric data, '(2) 0' , at line 30 column 11.
NOTE: Invalid numeric data, '(2) 0' , at line 30 column 11.
NOTE: Invalid numeric data, '(7) [0.4,0.5)' , at line 30 column 11.
X=0.2 y=0.3 Z=0.4 R=0 q=0 g=0.4 _x=. _y=. _z=. _r=. _q=. _g=. i=7 _ERROR_=1 _N_=2
NOTE: There were 2 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 12 variables.
I think of those notes as errors.
SAS variables are strongly typed, in the sense that a variable is either numeric or character.
But in the DATA step language, you are not required to explicitly define the type of each variable. If you don't define the type, the compiler decides it for you, based on rules for the statements, functions, etc. used to create the variable.
When you use an ARRAY statement to create new variables and do not explicitly define the variable type, the variables are created as numeric variables.
Then in your assignment statement, you are assigning a character value to a numeric variable:
_Bvars(i)=put(_vars(i),Ratio_Till1_Fmt.);
SAS will try to do an implicit character-to-numeric conversion for you (which causes the first NOTE, which I think should be an error). When that conversion fails, it returns a missing value (and generates the remaining notes, which I also think should be errors).
When you don't use an array, the compiler sees:
calc_x=put(x,Ratio_Till1_Fmt.);
And says "oh, this statement is creating a new variable calc_x, it should be a character variable because Ronein is assigning a character value to it, so I'll make calc_x a character variable. And I'll give calc_x a length of $20, because that is the longest value the format Ratio_Till1_Fmt can return."
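The typing rules described above can be checked with a small sketch. The variable names here (n1, n2, c1, c2, calc_x, and the type_/len_ helpers) are made up for illustration, and BEST8. stands in for the user-defined format so the step runs on its own.

```sas
data _null_;
  x = 0.25;

  /* No type given: the ARRAY statement creates n1 and n2 as numeric */
  array nums(*) n1 n2;

  /* $20 given: c1 and c2 are created as character with length 20 */
  array chars(*) $20 c1 c2;

  /* Created by assignment: the compiler takes the type and length */
  /* from the PUT function result (character, width of the format) */
  calc_x = put(x, best8.);

  type_n    = vtype(n1);       /* 'N' for numeric   */
  type_c    = vtype(c1);       /* 'C' for character */
  type_calc = vtype(calc_x);
  len_calc  = vlength(calc_x);

  put type_n= type_c= type_calc= len_calc=;
run;
```

If the rules are as described, the log should show type_n=N, type_c=C, type_calc=C and len_calc=8, with no conversion notes, because no character value is ever assigned to a numeric variable.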
Why do you need new character variables to do this? Why not just use the numeric formatted values in array _vars?
Character variables sort alphabetically; numeric variables sort numerically. With character variables, you need the (5) prefix to get things to sort properly; with numeric variables, the results can easily be sorted into the desired numerical order.
Also, I think it is really poor design and unprofessional to have categories like (5) [0.2,0.3) instead of [0.2,0.3). No one cares that you have to use (5) to get things to sort properly; it just confuses people viewing your results, which is not a good thing. I hate seeing presentations where the months in the table are (1) Jan, (2) Feb and so on. There are many other ways to get things to sort properly. If you leave the values as numeric, which is the best practice for numbers, and don't stick (5) in front of the categories in the format, many procedures have an ORDER=INTERNAL option that forces your values to sort properly, and there are other methods as well.
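One way that suggestion might look, as a rough sketch: the format name ratio_plain_fmt and the data set have_demo are hypothetical, and PROC FREQ is just one example of a procedure that accepts ORDER=INTERNAL.

```sas
/* Hypothetical format: same categories, without the (n) sort prefixes */
proc format;
  value ratio_plain_fmt
    -9997    = 'Ratio undefined'
    0        = '0'
    0<-<0.1  = '(0,0.1)'
    0.1-<0.2 = '[0.1,0.2)'
    0.2-<0.3 = '[0.2,0.3)'
    0.3-<0.4 = '[0.3,0.4)'
    0.4-<0.5 = '[0.4,0.5)'
    0.5-<0.6 = '[0.5,0.6)'
    0.6-<0.7 = '[0.6,0.7)'
    0.7-<0.8 = '[0.7,0.8)'
    0.8-<0.9 = '[0.8,0.9)'
    0.9-<1.0 = '[0.9,1.0)'
    1.0      = '1'
  ;
run;

/* Small made-up sample; x stays numeric throughout */
data have_demo;
  input x;
  cards;
0.8
0.1
-9997
0.25
1
0
;
run;

/* The format only changes how x is displayed and grouped;   */
/* ORDER=INTERNAL orders the rows by the stored numeric value */
proc freq data=have_demo order=internal;
  tables x;
  format x ratio_plain_fmt.;
run;
```

No new character variables are created here, so the array and the type problem from the original question do not come up at all.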