Help using Base SAS procedures

Converting character format to numeric format

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

Converting character format to numeric format

[ Edited ]

Hello,

 

I am brand new to SAS (am more expierenced with STATA) and am having difficulty converting character values to numeric values. 

 

I am doing survival analysis, and have a column of "time_to_event" that is in days. When I import the data to SAS from an excel sheet, all variables important in character format, which prevents the survival analysis from working. 

 

I have tried the following:

 

Data practice;
set work.import1;
format tte BEST6.;
tte=input(time_to_event, $5.);
run;
 
It generates a new SAS table, but I get an error for the conversion. The log shows the following :

"NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).
59:6
NOTE: Invalid numeric data, '.617' , at line 59 column 6
time_to_event=617 status=1 location=!!!! sex=Male age_group=<=20 region=Quebec histology=Diffuse astrocytoma tte=. _ERROR_=1 _N_=1"
 
Does anyone have any ideas how to fix this, or know a better way of going about it? There is a forum thread from 2012 about doing thsi conversion, but the all of the provided solutions produce the same error. 

Accepted Solutions
Solution
‎10-05-2015 11:02 PM
Super User
Posts: 5,082

Re: Converting character format to numeric format

OK, here is what I would try.  In the context of a DATA step:

 

data want;

set have;

tte = input(compress(time_to_event, '0A'x), 5.);

run;

 

That should remove the extra character, then convert what is left to numeric.

 

Good luck!

View solution in original post


All Replies
Super User
Posts: 5,082

Re: Converting character format to numeric format

You're very close ... just get rid of the dollar sign:  tte = input(time_to_event, 5.);

Contributor
Posts: 22

Re: Converting character format to numeric format

Hmm... I tried that, and it it ends up with the same result but a different error: 

 

"NOTE: Invalid argument to function INPUT at line 59 column 6.

time_to_event=617 status=1 location=North sex=Male age_group=<=20 region=Quebec histology=Diffuse astrocytoma tte=. _ERROR_=1 _N_=1"
Super User
Posts: 3,105

Re: Converting character format to numeric format

Check the variable type of time_to_event. Your error suggests that it may be numeric already.

Super User
Posts: 5,082

Re: Converting character format to numeric format

Two "next steps" then to help diagnose the problem ...  First, show the area of the log that contains line 59.  It seems unusual that column 6 would begin the invalid value that is mentioned by the error message.  Second, assuming that time_to_event is actually a character field, for _n_=1 display time_to_event in two forms just to confirm what it contains:  put time_to_event time_to_event $hex8.;

 

Contributor
Posts: 22

Re: Converting character format to numeric format

I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1. 

 

To answer the other person's comment, the properties show time_to_event being:

Type: Char

Format: $5.
Informat: $5.

 

For the put time_to_event time_to_event $hex8.;   I'm not sure where you want me to put this. Running it on its own returns "ERROR 180-322: Statement is not valid or it is used out of proper order." 


Error.png
Super User
Posts: 5,082

Re: Converting character format to numeric format

The hex format will help determine if the decimal point we are seeing is not really a decimal point but some other unprintable character.  Let's see the results you get in this DATA step ("have" should be your incoming SAS data set name):   data _null_; set have; if _n_=1 then put time_to_event time_to_event $hex10.; else stop; run;

Contributor
Posts: 22

Re: Converting character format to numeric format

Here are the results of that data step. 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data _null_;
57 set import;
58 if _n_=1 then put time_to_event time_to_event $hex10.;
59 else stop;
60 run;
 
617 0A36313720
NOTE: There were 2 observations read from the data set WORK.IMPORT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
 
61
62 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
74
Super User
Posts: 5,082

Re: Converting character format to numeric format

OK, that narrows down the scope of the problem considerably.  The "." in ".617" is not a decimal point, but is some other character.  The next question becomes, "How would you like to handle it?"  Should all the numeric values ignore the "." or should the "." be treated as if it were a decimal point?  (It might help simplify the solution if we knew whether the "." always appears at the beginning of each time_to_event, but we can code around it if the "." is only there occasionally.)

Contributor
Posts: 22

Re: Converting character format to numeric format

Interesting, I wonder why that is happening as there doesn't appear to be anything when looking at the table or the orginal excel. 

 

But to answer your question, I do not want any decimal point as all values should be in whole days. I don't know if it is there for all values, as the log only shows the first 20 attempts and there are no decimals in the table (there are 10,000 entries), but all 20 of them are shown in the log have the decimal. 

Solution
‎10-05-2015 11:02 PM
Super User
Posts: 5,082

Re: Converting character format to numeric format

OK, here is what I would try.  In the context of a DATA step:

 

data want;

set have;

tte = input(compress(time_to_event, '0A'x), 5.);

run;

 

That should remove the extra character, then convert what is left to numeric.

 

Good luck!

Contributor
Posts: 22

Re: Converting character format to numeric format

Yes, thank you!

I still don't know why there was some invisible character, but this solved it. Thanks for all your help.

Just for the sake of learning, what does the '0A'x mean/do?
Super User
Posts: 5,082

Re: Converting character format to numeric format

Every character can be represented by a two-character hexadecimal code.  Your earlier experiment wrote out time_to_event in hexadecimal form, and found that the "." could be represented the hexadecimal code 0A.  So we're using that information to remove the "." with the COMPRESS function.

Super User
Posts: 10,500

Re: Converting character format to numeric format


Jordan88 wrote:

I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1. 

 

 


The error message is referring to the position within the line of code. In this case likely the position of the start of the Input function. It points to the first place in a line that the operation cannont be completed due to any error.

 

If the data actually contains the quotes that is another reason for the  conversion to fail. That is why the error shows invalid numeric value '.615'

Contributor
Posts: 22

Re: Converting character format to numeric format

Oh ok, that makes sense, thank you. 

 

And the data itself does not contain the quotes. I have attached a screenshot of the data.


Error.png
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 641 views
  • 6 likes
  • 5 in conversation