I’m trying to understand some basic DATA step functions using fabricated data (See below). Among other things, I’m trying to write an array that converts character variables to numeric.
DATA new;
LENGTH var_5 $ 2;
/* Change the length of var_5 */
/* LENGTH statment will change the length of var_5 to 2 characters by truncating any characters after 2 */
SET output (DROP=var_2);
/* Drop var_2 */
/* DROP option (not statement). var_2 will not be read into the "new" data set */
/* Change character variables to numeric */
/* var_3_new = INPUT(var_3, 2.); */
/* var_4_new = INPUT(var_4, 2.); */
/* Trying to change character to numeric using an array */
ARRAY array_from(*)
var_3
var_4
;
ARRAY array_to(*)
var_3_new
var_4_new
;
DO i=1 to dim(array_from);
array_to(i) = INPUT(array_from(i),2.);
/* What does the "2." refer to here? */
END;
DROP
var_3
var_4
i;
RENAME
var_5 = state
/* Rename var_5 to "state" */
/* Renaming the new variables so they simply replace the old variables */
var_3_new = var_3
var_4_new = var_4;
RUN;
The array seems to work, but I would appreciate any feedback. If it works, it works because I got lucky.
Also, the newly converted numeric variables do not have Formats or Informats (please screen PROC CONTENTS screenshot). Is this a problem? How would I add them?
Original PROC CONTENTS:
New PROC CONTENTS:
The log states: “NOTE: Invalid argument to function INPUT at line 95 column 19.” (See below)
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 68 69 DATA new; 70 71 LENGTH var_5 $ 2; 72 /* Change the length of var_5 */ 73 /* LENGTH statment will change the length of var_5 to 2 characters by truncating any characters after 2 */ 74 75 SET output (DROP=var_2); 76 /* Drop var_2 */ 77 /* DROP option (not statement). var_2 will not be read into the "new" data set */ 78 79 /* Change character variables to numeric */ 80 /* var_3_new = INPUT(var_3, 2.); */ 81 /* var_4_new = INPUT(var_4, 2.); */ 82 83 /* Trying to change character to numeric using an array */ 84 ARRAY array_from(*) 85 var_3 86 var_4 87 ; 88 89 ARRAY array_to(*) 90 var_3_new 91 var_4_new 92 ; 93 94 DO i=1 to dim(array_from); 95 array_to(i) = INPUT(array_from(i),2.); 96 /* What does the "2." refer to here? */ 97 END; 98 99 DROP 100 var_3 101 var_4 102 i; 103 104 RENAME 105 var_5 = state 106 /* Rename var_5 to "state" */ 107 108 /* Renaming the new variables so they simply replace the old variables */ 109 var_3_new = var_3 110 var_4_new = var_4; 111 112 RUN; WARNING: Multiple lengths were specified for the variable var_5 by input data set(s). This can cause truncation of data. NOTE: Invalid argument to function INPUT at line 95 column 19. var_5=Mi id=1 var_1=1.2 var_3=NA var_4=1 var_3_new=. var_4_new=1 i=3 _ERROR_=1 _N_=1 NOTE: Invalid argument to function INPUT at line 95 column 19. var_5=NY id=3 var_1=. var_3= var_4=NA var_3_new=. var_4_new=. i=3 _ERROR_=1 _N_=3 NOTE: Invalid argument to function INPUT at line 95 column 19. var_5=CA id=4 var_1=4.3445 var_3=NA var_4=0 var_3_new=. var_4_new=0 i=3 _ERROR_=1 _N_=4 NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values. Each place is given by: (Number of times) at (Line):(Column). 3 at 95:19 NOTE: There were 5 observations read from the data set WORK.OUTPUT. NOTE: The data set WORK.NEW has 5 observations and 5 variables.
This statement is repeated 3 times. Each time for an observation that has a character value.
What is this message telling me?
The log also states: “Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.”
This message occurs for the same 3 observations.
What mathematical operation is being referred to here? Is this an error?
Thanks for any assistance you can provide.
Create table output(id float, var_1 float, var_2 varchar(1), var_3 varchar(3), var_4 varchar(2), var_5 varchar(11)); Insert into output(id, var_1, var_2, var_3, var_4, var_5) Values(1, 1.2, 'M', 'NA', '1', 'Mississippi'); Insert into output(id, var_1, var_2, var_3, var_4, var_5) Values(2, 2.388888, 'F', '3', '0', 'NJ'); Insert into output(id, var_1, var_2, var_3, var_4, var_5) Values(3, ., '', '', 'NA', 'NY'); Insert into output(id, var_1, var_2, var_3, var_4, var_5) Values(4, 4.3445, 'F', 'NA', '0', 'CA'); Insert into output(id, var_1, var_2, var_3, var_4, var_5) Values(5, 3.1, 'M', '3.1', '1', 'OR');
... View more