I'm using an array to convert character variables to numeric. The code executes w/o errors or warnings, but the notes state " NOTE: Invalid argument to function INPUT at line 80 column 16."
What's wrong w/ this syntax? Thanks for your help.
79 DO i=1 to dim(_char);
80 _num(i) = input(_char(i),6.);
81 END;
DATA conversion_subset;
  SET dropping_strings;
  Array _char(*) $ _100400 _100500 _100600 _200100 _200200 _200300 _200400 _200401
                   _400402 _400403 _400601 _400602 _500800 _500900 _501000;
  Array _num(*) var1-var15;
  DO i=1 to dim(_char);
    _num(i) = input(_char(i),6.);
  END;
  DROP _100400 _100500 _100600 _200100 _200200 _200300 _200400 _200401
       _400402 _400403 _400601 _400602 _500800 _500900 _501000 i;
  RENAME VAR1 = _100400 VAR2 = _100500 VAR3 = _100600 VAR4 = _200100 VAR5 = _200200
         VAR6 = _200300 VAR7 = _200400 VAR8 = _200401 VAR9 = _400402 VAR10 = _400403
         VAR11 = _400601 VAR12 = _400602 VAR13 = _500800 VAR14 = _500900 VAR15 = _501000;
RUN;
Post some example test data (in the form of a datastep). I would also check the structure of your data: why have 15 variables which all seem to get assigned to a _xxxxx variable name? Where does the _xxxxx even come from? To make the data easier to work with, I would suggest you normalise it, i.e. have a long dataset rather than a wide one. Now this code is just a guess:
data conversion_subset (keep=variable char_result result);
  set dropping_strings;
  length variable char_result $100 result 8;
  array var{15};
  array lab{15} ("100400","100500","100600","200100","200200","200300","200400","200401",
                 "400402","400403","400601","400602","500800","500900","501000");
  do i=1 to dim(var);
    variable=lab{i};
    char_result=var{i};
    result=input(var{i},best.);
    output;
  end;
run;
But what it should do is give you a dataset which looks something like:
VARIABLE CHAR_RESULT RESULT
100400 123 123
100500 abc .
...
You will see it's far easier to convert a column of data rather than lots of columns, and you also get the benefit of by-group processing. If later on you need a transposed output, then use proc transpose at that point. Doing the above will also show you quite clearly where a value has not been converted, and what it contains: see the "abc" and missing result. You can then put data cleaning if statements around the result= step.
The syntax is fine. There may be something wrong with the data. This is saying that one (or more) of the incoming character variables contains text that can't legitimately be converted to numeric.
You can get rid of the message by adding ??:
_num(i) = input(_char(i), ??6.);
That doesn't fix the problem, just covers it up.
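If you do use ??, one way to keep the bad values visible is to report them yourself. This is only a sketch: it reuses the dataset and array names from the question, shortens the variable list to three for readability, and bad_var and bad_value are made-up helper names.

```sas
data conversion_check;
  set dropping_strings;
  /* shortened variable list, for illustration only */
  array _char(*) $ _100400 _100500 _100600;
  array _num(*) var1-var3;
  length bad_var $32 bad_value $200;   /* hypothetical helper variables */
  do i = 1 to dim(_char);
    /* ?? suppresses the NOTE and keeps _ERROR_ from being set */
    _num(i) = input(_char(i), ?? 6.);
    /* a nonblank source with a missing result did not convert */
    if missing(_num(i)) and not missing(_char(i)) then do;
      bad_var = vname(_char(i));
      bad_value = _char(i);
      putlog "Unconverted value: " bad_var= bad_value= _n_=;
    end;
  end;
  drop i bad_var bad_value;
run;
```

Note that a source value of "." converts to missing legitimately, so it would be reported here as well.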
As @Astounding says it is likely a data issue.
If I were worried about missing an intended conversion I would do a proc freq on the text variables.
For instance, if the data has values that are displayed with accounting rules, like (1234) to indicate that the value is negative, you may not want those to be set to missing as your current code would. Other likely things would be currency symbols or commas as part of the values.
Of course if you have incoming values like NULL or N/A or such and those are the only suspect values then you're golden.
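A rough sketch of that check, assuming the dataset and variable names from the question (list shortened to three; suspect_values, variable and value are names I made up): stack only the values that fail the 6. informat and run one proc freq over them.

```sas
data suspect_values;
  set dropping_strings;
  /* shortened variable list, for illustration only */
  array _char(*) $ _100400 _100500 _100600;
  length variable $32 value $200;
  do i = 1 to dim(_char);
    /* keep nonblank values that the 6. informat cannot read */
    if not missing(_char(i)) and missing(input(_char(i), ?? 6.)) then do;
      variable = vname(_char(i));
      value = _char(i);
      output;
    end;
  end;
  keep variable value;
run;

proc freq data=suspect_values;
  tables variable*value / list missing;
run;

/* If the suspects turn out to be accounting-style values, the COMMAw.d
   informat strips commas and dollar signs and reads a leading "(" as a
   minus sign */
data _null_;
  neg = input("(1,234)", comma12.);
  usd = input("$1,234", comma12.);
  put neg= usd=;
run;
```

If the frequency table only shows things like NULL or N/A, the ?? approach is probably all you need.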
Inspect your data.
The NOTE is also followed by a dump of the current observation, including the iteration number (_N_) at which the conversion failed, so you know which observation(s) caused it.
add to the loop
if prxmatch("/(?i)([a-z])/",_char(i))<=0
This will bypass any value with alphabetic characters in the string
@timeless wrote:
add to the loop
if prxmatch("/(?i)([a-z])/",_char(i))<=0
This will bypass any value with alphabetic characters in the string
Using ?? in the INPUT function, as already suggested, is much more efficient than using a RegEx. The ?? syntax will also ALWAYS work when an informat doesn't apply to an input value, whereas I believe your RegEx wouldn't catch "invalid" strings made up of digits and blanks only, e.g. something like "999 999".
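A small test step along these lines should show the difference. The three test strings and the variable names s, has_alpha and num are just for illustration.

```sas
data _null_;
  length s $10;
  do s = "123", "12a", "999 999";
    /* 1 if the string contains any letter, 0 otherwise */
    has_alpha = prxmatch("/[a-z]/i", s) > 0;
    /* missing whenever the 7. informat cannot read the string */
    num = input(s, ?? 7.);
    put s= has_alpha= num=;
  end;
run;
```

For "999 999" the regex check finds no letters, so that value would not be bypassed, yet the embedded blank should still make the w. informat reject it and return missing.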
For some reason ?? gives me an error
ERROR 22-322: Expecting a format name.
ERROR 200-322: The symbol is not recognized and will be ignored.
@timeless wrote:
For some reason ?? gives me an error
ERROR 22-322: Expecting a format name.
ERROR 200-322: The symbol is not recognized and will be ignored.
You still need to give it an informat. Just add the ?? before the informat specification.
input(xxx,??6.)
That was exactly what I was doing
Please post the log of the whole step that produces the error.
@timeless wrote:
That was exactly what I was doing
Check your program and SAS log more carefully. The only way to get that message is to not include a format specification. If you include an invalid format specification you get a different error message.
718  data _null_;
719    input string $20.;
720    num1=input(string,20.);
721    num2=input(string,??20.);
722    num3=input(string,??);
                           -
                           22
                           76
723    num4=input(string,);
                         -
                         22
                         76
724    num5=input(string,1234);
                         ----
                         85
                         76
ERROR 22-322: Expecting a format name.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR 85-322: Expecting a format name.
725  cards;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds