Hi folks,
I have mixed numeric data formatted as character. The data values range from the single digits into the billions. Some of the data is null or missing, some has decimal places, some of the numeric characters have commas, etc...
I'm having a really hard time converting these values to numeric with the correct format. I've tried almost all the numeric formats I can think of within an input statement.
As an example:
value1a = input(value1,24.2);
value1a = input(value1,comma24.2);
value1a = input(value1,??24.2);
value1a = input(value1,??24.);
Some of these formats don't produce results for very large numbers, some don't produce results at all. The closest I've gotten was using input(value1,??24.) but this is adding decimals to all the values turing a value like 1280 into 12.80.
Any thoughts on this?
Thanks!
data test;
infile datalines dsd truncover;
input
value1:$200.
value2:$200.
value3:$200.
value4:$200.
;
datalines;
'1001322', '48,371,804.50', '91662', '2,295,968.99'
'1280', '73,393.39', '143, 3,656.94'
'2', 222, , '59888.99'
'255287', '22,401,559.53', '5432618', '210,129,208.53'
'735034', '34,031,056.78', '89640',
'1992923', '104,877,814.20', '5614063', '1,214,793,943.92'
;;;
It looks like some of the data was read in incorrectly (previously). For example, it looks like on observation 2, VALUE3 should be 143 and VALUE4 should be 3,656.94.
Once that is cleared up, the comma24. informat should work just fine.
Try using the BEST32 informat instead and let SAS make a guess. I suspect it may do a better job.
Note that when you specify a informat of 24.2 it expects all values to have two decimal places, so 100 becomes 1.00
@Ody wrote:
Hi folks,
I have mixed numeric data formatted as character. The data values range from the single digits into the billions. Some of the data is null or missing, some has decimal places, some of the numeric characters have commas, etc...
I'm having a really hard time converting these values to numeric with the correct format. I've tried almost all the numeric formats I can think of within an input statement.
As an example:
value1a = input(value1,24.2);
value1a = input(value1,comma24.2);
value1a = input(value1,??24.2);
value1a = input(value1,??24.);
Some of these formats don't produce results for very large numbers, some don't produce results at all. The closest I've gotten was using input(value1,??24.) but this is adding decimals to all the values turing a value like 1280 into 12.80.
Any thoughts on this?
Thanks!
data test; infile datalines dsd truncover; input value1:$200. value2:$200. value3:$200. value4:$200. ; datalines; '1001322', '48,371,804.50', '91662', '2,295,968.99' '1280', '73,393.39', '143, 3,656.94' '2', 222, , '59888.99' '255287', '22,401,559.53', '5432618', '210,129,208.53' '735034', '34,031,056.78', '89640', '1992923', '104,877,814.20', '5614063', '1,214,793,943.92' ;;;
Thanks for the feedback.
I tried using best32. and while some of the data converted accurately much of it did not.
I seem to be running into an issue where data is missing or the numbers are large with decimals or commas.
Any other suggestions?
Try the comma24. informat. The ?? is optional.
Thanks for the reply,
The format comma24. doesn't seem to work either. I tried that earlier just didnt list it as one of the option I tried.
Here is a copy of my log using the best32. format.
Appreciate any insight into this. Thanks.
data test1;
set test;
value1a = input(value1,best32.);
value2a = input(value2,best32.);
value3a = input(value3,best32.);
value4a = input(value4,best32.);
run;
479 data test1;
480 set test;
481
482 value1a = input(value1,best32.);
483 value2a = input(value2,best32.);
484 value3a = input(value3,best32.);
485 value4a = input(value4,best32.);
486
487 run;
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=1001322 value2=48,371,804.50 value3=91662 value4=2,295,968.99 value1a=1001322 value2a=.
value3a=91662 value4a=. _ERROR_=1 _N_=1
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 484 column 11.
value1=1280 value2=73,393.39 value3=143, 3,656.94 value4= value1a=1280 value2a=. value3a=.
value4a=. _ERROR_=1 _N_=2
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=255287 value2=22,401,559.53 value3=5432618 value4=210,129,208.53 value1a=255287
value2a=. value3a=5432618 value4a=. _ERROR_=1 _N_=4
NOTE: Invalid argument to function INPUT at line 483 column 11.
value1=735034 value2=34,031,056.78 value3=89640 value4= value1a=735034 value2a=. value3a=89640
value4a=. _ERROR_=1 _N_=5
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=1992923 value2=104,877,814.20 value3=5614063 value4=214,793,943.92 value1a=1992923
value2a=. value3a=5614063 value4a=. _ERROR_=1 _N_=6
NOTE: Mathematical operations could not be performed at the following places. The results of
the operations have been set to missing values.
Each place is given by: (Number of times) at (Line):(Column).
5 at 483:11 1 at 484:11 3 at 485:11
NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST1 has 6 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds
NOTE: Remote submit to EMSVR complete.
It looks like some of the data was read in incorrectly (previously). For example, it looks like on observation 2, VALUE3 should be 143 and VALUE4 should be 3,656.94.
Once that is cleared up, the comma24. informat should work just fine.
data test;
infile datalines dsd truncover;
input
value1:$200.
value2:$200.
value3:$200.
value4:$200.
;
datalines;
'1001322', '48,371,804.50', '91662', '2,295,968.99'
'1280', '73,393.39', '143, 3,656.94'
'2', 222, , '59888.99'
'255287', '22,401,559.53', '5432618', '210,129,208.53'
'735034', '34,031,056.78', '89640',
'1992923', '104,877,814.20', '5614063', '1,214,793,943.92'
;;;
data want;
set test;
v1=input(value1,comma32.);
v2=input(value2,comma32.);
v3=input(value3,comma32.);
v4=input(value4,comma32.);
run;
You're correct. I forget a few single quotes when supplying my test data.
I've changed that and reran the code using both comma24. and comma32. as a format. The results are what I would expect, which is good. This seems to have solved my problem, thanks. Not really sure why when I tried that format earlier I was having issues.
Any way, thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.