BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jordan88
Obsidian | Level 7

Hello,

 

I am brand new to SAS (am more expierenced with STATA) and am having difficulty converting character values to numeric values. 

 

I am doing survival analysis, and have a column of "time_to_event" that is in days. When I import the data to SAS from an excel sheet, all variables important in character format, which prevents the survival analysis from working. 

 

I have tried the following:

 

Data practice;
set work.import1;
format tte BEST6.;
tte=input(time_to_event, $5.);
run;
 
It generates a new SAS table, but I get an error for the conversion. The log shows the following :

"NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
59:6
NOTE: Invalid numeric data, '.617' , at line 59 column 6
time_to_event=617 status=1 location=!!!! sex=Male age_group=<=20 region=Quebec histology=Diffuse astrocytoma tte=. _ERROR_=1 _N_=1"
 
Does anyone have any ideas how to fix this, or know a better way of going about it? There is a forum thread from 2012 about doing thsi conversion, but the all of the provided solutions produce the same error. 
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, here is what I would try.  In the context of a DATA step:

 

data want;

set have;

tte = input(compress(time_to_event, '0A'x), 5.);

run;

 

That should remove the extra character, then convert what is left to numeric.

 

Good luck!

View solution in original post

15 REPLIES 15
Astounding
PROC Star

You're very close ... just get rid of the dollar sign:  tte = input(time_to_event, 5.);

Jordan88
Obsidian | Level 7

Hmm... I tried that, and it it ends up with the same result but a different error: 

 

"NOTE: Invalid argument to function INPUT at line 59 column 6.

time_to_event=617 status=1 location=North sex=Male age_group=<=20 region=Quebec histology=Diffuse astrocytoma tte=. _ERROR_=1 _N_=1"
SASKiwi
PROC Star

Check the variable type of time_to_event. Your error suggests that it may be numeric already.

Astounding
PROC Star

Two "next steps" then to help diagnose the problem ...  First, show the area of the log that contains line 59.  It seems unusual that column 6 would begin the invalid value that is mentioned by the error message.  Second, assuming that time_to_event is actually a character field, for _n_=1 display time_to_event in two forms just to confirm what it contains:  put time_to_event time_to_event $hex8.;

 

Jordan88
Obsidian | Level 7

I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1. 

 

To answer the other person's comment, the properties show time_to_event being:

Type: Char

Format: $5.
Informat: $5.

 

For the put time_to_event time_to_event $hex8.;   I'm not sure where you want me to put this. Running it on its own returns "ERROR 180-322: Statement is not valid or it is used out of proper order." 


Error.png
Astounding
PROC Star

The hex format will help determine if the decimal point we are seeing is not really a decimal point but some other unprintable character.  Let's see the results you get in this DATA step ("have" should be your incoming SAS data set name):   data _null_; set have; if _n_=1 then put time_to_event time_to_event $hex10.; else stop; run;

Jordan88
Obsidian | Level 7
Here are the results of that data step. 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data _null_;
57 set import;
58 if _n_=1 then put time_to_event time_to_event $hex10.;
59 else stop;
60 run;
 
617 0A36313720
NOTE: There were 2 observations read from the data set WORK.IMPORT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
 
61
62 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
74
Astounding
PROC Star

OK, that narrows down the scope of the problem considerably.  The "." in ".617" is not a decimal point, but is some other character.  The next question becomes, "How would you like to handle it?"  Should all the numeric values ignore the "." or should the "." be treated as if it were a decimal point?  (It might help simplify the solution if we knew whether the "." always appears at the beginning of each time_to_event, but we can code around it if the "." is only there occasionally.)

Jordan88
Obsidian | Level 7

Interesting, I wonder why that is happening as there doesn't appear to be anything when looking at the table or the orginal excel. 

 

But to answer your question, I do not want any decimal point as all values should be in whole days. I don't know if it is there for all values, as the log only shows the first 20 attempts and there are no decimals in the table (there are 10,000 entries), but all 20 of them are shown in the log have the decimal. 

Astounding
PROC Star

OK, here is what I would try.  In the context of a DATA step:

 

data want;

set have;

tte = input(compress(time_to_event, '0A'x), 5.);

run;

 

That should remove the extra character, then convert what is left to numeric.

 

Good luck!

Jordan88
Obsidian | Level 7
Yes, thank you!

I still don't know why there was some invisible character, but this solved it. Thanks for all your help.

Just for the sake of learning, what does the '0A'x mean/do?
Astounding
PROC Star

Every character can be represented by a two-character hexadecimal code.  Your earlier experiment wrote out time_to_event in hexadecimal form, and found that the "." could be represented the hexadecimal code 0A.  So we're using that information to remove the "." with the COMPRESS function.

ballardw
Super User

@Jordan88 wrote:

I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1. 

 

 


The error message is referring to the position within the line of code. In this case likely the position of the start of the Input function. It points to the first place in a line that the operation cannont be completed due to any error.

 

If the data actually contains the quotes that is another reason for the  conversion to fail. That is why the error shows invalid numeric value '.615'

Jordan88
Obsidian | Level 7

Oh ok, that makes sense, thank you. 

 

And the data itself does not contain the quotes. I have attached a screenshot of the data.


Error.png

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 3161 views
  • 6 likes
  • 5 in conversation