Hello,
I am brand new to SAS (am more expierenced with STATA) and am having difficulty converting character values to numeric values.
I am doing survival analysis, and have a column of "time_to_event" that is in days. When I import the data to SAS from an excel sheet, all variables important in character format, which prevents the survival analysis from working.
I have tried the following:
OK, here is what I would try. In the context of a DATA step:
data want;
set have;
tte = input(compress(time_to_event, '0A'x), 5.);
run;
That should remove the extra character, then convert what is left to numeric.
Good luck!
You're very close ... just get rid of the dollar sign: tte = input(time_to_event, 5.);
Hmm... I tried that, and it it ends up with the same result but a different error:
"NOTE: Invalid argument to function INPUT at line 59 column 6.
Check the variable type of time_to_event. Your error suggests that it may be numeric already.
Two "next steps" then to help diagnose the problem ... First, show the area of the log that contains line 59. It seems unusual that column 6 would begin the invalid value that is mentioned by the error message. Second, assuming that time_to_event is actually a character field, for _n_=1 display time_to_event in two forms just to confirm what it contains: put time_to_event time_to_event $hex8.;
I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1.
To answer the other person's comment, the properties show time_to_event being:
Type: Char
Format: $5.
Informat: $5.
For the put time_to_event time_to_event $hex8.; I'm not sure where you want me to put this. Running it on its own returns "ERROR 180-322: Statement is not valid or it is used out of proper order."
The hex format will help determine if the decimal point we are seeing is not really a decimal point but some other unprintable character. Let's see the results you get in this DATA step ("have" should be your incoming SAS data set name): data _null_; set have; if _n_=1 then put time_to_event time_to_event $hex10.; else stop; run;
OK, that narrows down the scope of the problem considerably. The "." in ".617" is not a decimal point, but is some other character. The next question becomes, "How would you like to handle it?" Should all the numeric values ignore the "." or should the "." be treated as if it were a decimal point? (It might help simplify the solution if we knew whether the "." always appears at the beginning of each time_to_event, but we can code around it if the "." is only there occasionally.)
Interesting, I wonder why that is happening as there doesn't appear to be anything when looking at the table or the orginal excel.
But to answer your question, I do not want any decimal point as all values should be in whole days. I don't know if it is there for all values, as the log only shows the first 20 attempts and there are no decimals in the table (there are 10,000 entries), but all 20 of them are shown in the log have the decimal.
OK, here is what I would try. In the context of a DATA step:
data want;
set have;
tte = input(compress(time_to_event, '0A'x), 5.);
run;
That should remove the extra character, then convert what is left to numeric.
Good luck!
Every character can be represented by a two-character hexadecimal code. Your earlier experiment wrote out time_to_event in hexadecimal form, and found that the "." could be represented the hexadecimal code 0A. So we're using that information to remove the "." with the COMPRESS function.
@Jordan88 wrote:
I have attached a screenshot of the error. I also don't really know why is referring to column 7, as column 7 is histology and time_to_event is column 1.
The error message is referring to the position within the line of code. In this case likely the position of the start of the Input function. It points to the first place in a line that the operation cannont be completed due to any error.
If the data actually contains the quotes that is another reason for the conversion to fail. That is why the error shows invalid numeric value '.615'
Oh ok, that makes sense, thank you.
And the data itself does not contain the quotes. I have attached a screenshot of the data.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.