DATA Step, Macro, Functions and more

Character to Numeric conversion issues

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 93
Accepted Solution

Character to Numeric conversion issues

Hi folks,

 

I have mixed numeric data formatted as character. The data values range from the single digits into the billions. Some of the data is null or missing, some has decimal places, some of the numeric characters have commas, etc... 

 

I'm having a really hard time converting these values to numeric with the correct format. I've tried almost all the numeric formats I can think of within an input statement. 

 

As an example:

value1a = input(value1,24.2);

value1a = input(value1,comma24.2);

value1a = input(value1,??24.2);

value1a = input(value1,??24.);

 

Some of these formats don't produce results for very large numbers, some don't produce results at all. The closest I've gotten was using input(value1,??24.) but this is adding decimals to all the values turing a value like 1280 into 12.80.

 

Any thoughts on this? 

 

Thanks!

 

 

data test;

infile datalines dsd truncover;

input

value1:$200.

value2:$200.

value3:$200.

value4:$200.     

;

 

datalines;

'1001322', '48,371,804.50', '91662', '2,295,968.99'

'1280', '73,393.39', '143, 3,656.94'

'2', 222, , '59888.99'

'255287', '22,401,559.53', '5432618', '210,129,208.53'

'735034', '34,031,056.78', '89640',

'1992923', '104,877,814.20', '5614063', '1,214,793,943.92'

;;;

 


Accepted Solutions
Solution
‎02-15-2018 02:56 PM
Super User
Posts: 6,899

Re: Character to Numeric conversion issues

It looks like some of the data was read in incorrectly (previously).  For example, it looks like on observation 2, VALUE3 should be 143 and VALUE4 should be 3,656.94. 

 

Once that is cleared up, the comma24. informat should work just fine.

View solution in original post


All Replies
Super User
Posts: 23,928

Re: Character to Numeric conversion issues

Try using the BEST32 informat instead and let SAS make a guess. I suspect it may do a better job.

Note that when you specify a informat of 24.2 it expects all values to have two decimal places, so 100 becomes 1.00

 


Ody wrote:

Hi folks,

 

I have mixed numeric data formatted as character. The data values range from the single digits into the billions. Some of the data is null or missing, some has decimal places, some of the numeric characters have commas, etc... 

 

I'm having a really hard time converting these values to numeric with the correct format. I've tried almost all the numeric formats I can think of within an input statement. 

 

As an example:

value1a = input(value1,24.2);

value1a = input(value1,comma24.2);

value1a = input(value1,??24.2);

value1a = input(value1,??24.);

 

Some of these formats don't produce results for very large numbers, some don't produce results at all. The closest I've gotten was using input(value1,??24.) but this is adding decimals to all the values turing a value like 1280 into 12.80.

 

Any thoughts on this? 

 

Thanks!

 

 

data test;

infile datalines dsd truncover;

input

value1:$200.

value2:$200.

value3:$200.

value4:$200.     

;

 

datalines;

'1001322', '48,371,804.50', '91662', '2,295,968.99'

'1280', '73,393.39', '143, 3,656.94'

'2', 222, , '59888.99'

'255287', '22,401,559.53', '5432618', '210,129,208.53'

'735034', '34,031,056.78', '89640',

'1992923', '104,877,814.20', '5614063', '1,214,793,943.92'

;;;

 


 

Frequent Contributor
Frequent Contributor
Posts: 93

Re: Character to Numeric conversion issues

Thanks for the feedback.

 

I tried using best32. and while some of the data converted accurately much of it did not. 

 

I seem to be running into an issue where data is missing or the numbers are large with decimals or commas.

 

Any other suggestions?

Super User
Posts: 6,899

Re: Character to Numeric conversion issues

[ Edited ]

Try the comma24. informat.  The ?? is optional.

Frequent Contributor
Frequent Contributor
Posts: 93

Re: Character to Numeric conversion issues

Posted in reply to Astounding

Thanks for the reply,

 

The format comma24. doesn't seem to work either. I tried that earlier just didnt list it as one of the option I tried.

 

Here is a copy of my log using the best32. format. 

 

Appreciate any insight into this. Thanks.

 


data test1;
set test;

value1a = input(value1,best32.);
value2a = input(value2,best32.);
value3a = input(value3,best32.);
value4a = input(value4,best32.);

run;


479  data test1;
480  set test;
481
482  value1a = input(value1,best32.);
483  value2a = input(value2,best32.);
484  value3a = input(value3,best32.);
485  value4a = input(value4,best32.);
486
487  run;

NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=1001322 value2=48,371,804.50 value3=91662 value4=2,295,968.99 value1a=1001322 value2a=.
value3a=91662 value4a=. _ERROR_=1 _N_=1
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 484 column 11.
value1=1280 value2=73,393.39 value3=143, 3,656.94 value4=  value1a=1280 value2a=. value3a=.
value4a=. _ERROR_=1 _N_=2
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=255287 value2=22,401,559.53 value3=5432618 value4=210,129,208.53 value1a=255287
value2a=. value3a=5432618 value4a=. _ERROR_=1 _N_=4
NOTE: Invalid argument to function INPUT at line 483 column 11.
value1=735034 value2=34,031,056.78 value3=89640 value4=  value1a=735034 value2a=. value3a=89640
value4a=. _ERROR_=1 _N_=5
NOTE: Invalid argument to function INPUT at line 483 column 11.
NOTE: Invalid argument to function INPUT at line 485 column 11.
value1=1992923 value2=104,877,814.20 value3=5614063 value4=214,793,943.92 value1a=1992923
value2a=. value3a=5614063 value4a=. _ERROR_=1 _N_=6
NOTE: Mathematical operations could not be performed at the following places. The results of
      the operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      5 at 483:11   1 at 484:11   3 at 485:11
NOTE: There were 6 observations read from the data set WORK.TEST.
NOTE: The data set WORK.TEST1 has 6 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds


NOTE: Remote submit to EMSVR complete.
Solution
‎02-15-2018 02:56 PM
Super User
Posts: 6,899

Re: Character to Numeric conversion issues

It looks like some of the data was read in incorrectly (previously).  For example, it looks like on observation 2, VALUE3 should be 143 and VALUE4 should be 3,656.94. 

 

Once that is cleared up, the comma24. informat should work just fine.

Super User
Posts: 2,037

Re: Character to Numeric conversion issues

data test;

infile datalines dsd truncover;

input

value1:$200.

value2:$200.

value3:$200.

value4:$200.     

;

 

datalines;
'1001322', '48,371,804.50', '91662', '2,295,968.99'
'1280', '73,393.39', '143, 3,656.94'
'2', 222, , '59888.99'
'255287', '22,401,559.53', '5432618', '210,129,208.53'
'735034', '34,031,056.78', '89640',
'1992923', '104,877,814.20', '5614063', '1,214,793,943.92'
;;;


data want;
set test;
v1=input(value1,comma32.);
v2=input(value2,comma32.);
v3=input(value3,comma32.);
v4=input(value4,comma32.);
run;

 
Frequent Contributor
Frequent Contributor
Posts: 93

Re: Character to Numeric conversion issues

Posted in reply to novinosrin

You're correct. I forget a few single quotes when supplying my test data.

 

I've changed that and reran the code using both comma24. and comma32. as a format. The results are what I would expect, which is good. This seems to have solved my problem, thanks. Not really sure why when I tried that format earlier I was having issues.

 

Any way, thanks.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 223 views
  • 2 likes
  • 4 in conversation