Hello all,
I am having some trouble converting my character variables to numerics. I have about 13 variables to convert so I was trying to use an array.
I currently have this array set up:
Array X[13] $ zipcode survdate q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 COVIDscore;
Array Y[13] Newzipcode Newsurvdate Newq1 Newq2 Newq3 Newq4 Newq5 Newq6 Newq7 Newq8 Newq9 Newq10 NewCOVIDscore;
Do I=1 to 13;
Y(I) = input(X(I),best32.);
End;
Drop I
But I keep getting warning statements in my log:
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to
missing values.
Each place is given by: (Number of times) at (Line):(Column).
4052 at 686:18
686 Y(I) = input(X(I),best32.);
Let's look at some details of your existing data:
zipcode=92365-4856 q1=1 q2=1 q3=1 q4=1 q5=1 q6=1 q7=0 q8=0 q9=0 q10=0 COVIDscore=86 survdate=11/02/2020
Zipcode = 92365-4856 is not a number. So attempting to convert it using any of the numeric informats fails. The - either has to precede digits to indicate negative values, -12345 OR in scientific notation to precede the exponent to indicate decimal places 1.5E-5 for example.
f you want Zipcode to be numeric you need to make a decision of what you expect for a numeric value and write specific code for that. Possibly : newzip = input(scan(zipcode,1,'-'),f5.) would be one way to create a five-digit numeric zip.
Similarly survdate is not a simple number because of the / characters and can't be dealt with the same numeric informat. That would take a date informat such as mmddyy10. (and should have a date format associated so people can understand what is stored).
I
You need to look at that messages before that in the log. Basically it is saying the string does not contain a number.
28 data test; 29 array X $32 string; 30 array Y number; 31 input string ; 32 y[1]=input(x[1],32.); 33 cards; NOTE: Invalid argument to function INPUT at line 32 column 8. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 35 abc _I_=. string=abc number=. _ERROR_=1 _N_=2 NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values. Each place is given by: (Number of times) at (Line):(Column). 1 at 32:8 NOTE: The data set WORK.TEST has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 36 ;
If you don't care about the invalid strings and just want to create missing values without SAS writing notes into the SAS log you can use the ? or ?? modifier in front of the informat in the
37 data test; 38 array X $32 string; 39 array Y number; 40 input string ; 41 y[1]=input(x[1],?32.); 42 cards; RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 44 abc _I_=. string=abc number=. _ERROR_=1 _N_=2 NOTE: The data set WORK.TEST has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 45 ; 46 data test; 47 array X $32 string; 48 array Y number; 49 input string ; 50 y[1]=input(x[1],??32.); 51 cards; NOTE: The data set WORK.TEST has 2 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds 54 ;
Note there is not a separate informat named BEST. If you ask for it you just get the normal numeric informat.
The log messages above what I initially posted are:
id=3276 city=Otside La County age=50-59 gender= zipcode=92365-4856 q1=1 q2=1 q3=1 q4=1 q5=1 q6=1 q7=0 q8=0 q9=0 q10=0 COVIDscore=86
survdate=11/02/2020 Newgender= Newcity=Outside of Los Angeles Newq1=. Newq2=. Newq3=. Newq4=. Newq5=. Newq6=. Newq7=. Newq8=.
Newq9=. Newq10=. NewCOVIDscore=. Newsurvdate=. I=14 var1=. var2=. var3=1 var4=1 var5=1 var6=1 var7=1 var8=1 var9=0 var10=0 var11=0
var12=0 var13=86 _ERROR_=1 _N_=19
NOTE: Invalid argument to function INPUT at line 686 column 18.
NOTE: Invalid argument to function INPUT at line 686 column 18.
It does that a bunch of times. Do you think its because the format of my zipcode and survdate? (zipcode in the dataset is 9 digits but I was instructed to change it to 5, and survdate in the dataset is MMDDYY10, but I was instructed to change it to weekday.
Let's look at some details of your existing data:
zipcode=92365-4856 q1=1 q2=1 q3=1 q4=1 q5=1 q6=1 q7=0 q8=0 q9=0 q10=0 COVIDscore=86 survdate=11/02/2020
Zipcode = 92365-4856 is not a number. So attempting to convert it using any of the numeric informats fails. The - either has to precede digits to indicate negative values, -12345 OR in scientific notation to precede the exponent to indicate decimal places 1.5E-5 for example.
f you want Zipcode to be numeric you need to make a decision of what you expect for a numeric value and write specific code for that. Possibly : newzip = input(scan(zipcode,1,'-'),f5.) would be one way to create a five-digit numeric zip.
Similarly survdate is not a simple number because of the / characters and can't be dealt with the same numeric informat. That would take a date informat such as mmddyy10. (and should have a date format associated so people can understand what is stored).
I
Ah I see, so for the variables zipcode and survdate I would have to use the input function to convert it to a new variable. I cannot use the array for those 2? But for q1-q10 I can use the array?
@shortyofhb wrote:
Ah I see, so for the variables zipcode and survdate I would have to use the input function to convert it to a new variable. I cannot use the array for those 2? But for q1-q10 I can use the array?
Correct.
SAS is very nice when you have sequentially numbered variables to place in an array. You can use shorthand like: array <name> q1-q10. And for your new variables if you name an array like:
Array newq(10) that will create 10 numeric variables named newq1, newq2, newq3 etc. It is a good idea to understand the individual naming though to get used to thinking of order of definition is important.
Thank you so much for the help!!
So I converted created a new variable for zipcode and survdate with the input function. That clarifies it up alot!
One more quick question (I hope). How would I go about using an array to recode a variable to saying something else? For example, in my Proc Format I have 1="Yes" and 0="No" my instructions are however are to "use an array to recode 0="No" to 2="No."
Would I create an array and have it so when 0 is shown it gets replaced with 2? And then use a format to change 2 to "No"?
Maybe this is homework, if it is ignore the following.
There is just one way to solve problems like yours: fix the data import!
It is a homework assignment unfortunately haha, the dataset is meant to be a mess!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.