The database that I am working with records lab values as character variables because lab techs sometimes write notes/comments instead of entering a numeric lab result.
I have been converting this variable to numeric by adding a 0 to it (i.e. character_var+0). This seems to do the trick, but I am wondering whether there are any downsides. I am only interested in the numeric measurements, so other than ending up with missing values where there were notes/comments, are there any downsides to doing this?
This is known as implicit type conversion: SAS automatically converts the character value to numeric, and you'll likely see a note in the log that this is happening. It is normally considered bad programming practice and should be avoided. The normal character-to-numeric conversion in SAS is done through the INPUT function. If you know the format of the numeric lab values, you can use INPUT(lab_value, informat.). If the width is somewhat variable (sometimes 4 characters, sometimes 13, etc.) you can use the BEST. informat. References for the INPUT function and the BEST informat are available in the SAS documentation if you want more background.
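As a minimal sketch of the explicit approach, the step below converts a character lab value with INPUT. The dataset names (labdata, labdata_num), the variable names (charval, numval), the sample values, and the width of 20 are all assumptions made for illustration; they are not from the original question.

```sas
/* Hypothetical sample data: lab values stored as text, one of them a comment */
data labdata;
  input charval $20.;
  datalines;
4.2
13.75
hemolyzed
;
run;

data labdata_num;
  set labdata;
  /* Explicit character-to-numeric conversion; the 20. informat reads up to 20 characters */
  numval = input(charval, 20.);
run;
```

Rows such as the comment cannot be converted, so numval is set to missing for them and SAS writes an invalid-argument note to the log.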
@K_S wrote:
unfortunately that is not an option, I have no ways of inputting the values
(or are just unaware of how it could be done).
The recommendation was not that you input the values. The recommendation was that you use the SAS function INPUT instead of adding zero to the value.
Some organizations with strict code management policies require "clean" logs, meaning no errors, no warnings, and sometimes not even notes. Implicit conversion writes a note to the log, so it may violate such a policy.
Personally, if that were my data I would probably address this sort of issue at the data read step and either create two variables, if the notes were needed later, or read the field as numeric to begin with and suppress the resulting "invalid data" messages that are going to ensue.
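One way the two-variable option could look is sketched here, under the assumption that the raw field can be read as text first. The dataset name (labs), the variable names (rawval, labnum, labnote), the widths, and the sample values are hypothetical.

```sas
data labs;
  infile datalines truncover;
  length labnote $40;
  /* Read the raw field as text first */
  input rawval $40.;
  /* ?? suppresses the invalid-data messages for values that are not numeric */
  labnum = input(rawval, ??20.);
  /* Keep the original text only when it did not convert to a number */
  if missing(labnum) and not missing(rawval) then labnote = rawval;
  drop rawval;
  datalines;
7.1
sample clotted
120
;
run;
```

Numeric results end up in labnum, and the techs' comments are preserved in labnote in case they are needed later.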
I can only agree with the other posters. This uses implicit conversion (i.e. you're not specifying it, you're letting the system guess it). Always a bad technique. Always make sure that you, the person closest to the data, are in complete control. Use the input() function.
While I generally agree, here are some tools to help cope with the situation.
numval = input(charval, ??20.);
This will convert the existing values to their numeric equivalent, if possible. Adding ?? suppresses the messages about invalid data when the original set of characters is not numeric.
For cleaning the data, or perhaps being more rigorous about what can be converted and what can't, you could try:
proc freq data=labdata;
tables charval;
where charval > ' ' and input(charval, ??20.) = .;
run;
This will give you a table of all the values that can't be converted, so you can inspect them and see if there is something you might be able to do with them.
What would be the resulting numeric from this conversion?
Please open new threads for new questions in future.
0 is a number
( is not a number
0.0 is a number
) is not a number
To get the two separate numbers you need two variables, and you have to parse the string you have:
data abc;
  input a $ b $;
  /* SCAN takes (string, word number, delimiters) */
  c=input(scan(b,1,"("),best.);
  /* second piece, with the closing bracket removed */
  d=input(compress(scan(b,2,"("),")"),best.);
datalines;
Normal 0(0.0)
;
run;
c takes the first number (i.e. the text before the opening bracket) and converts it to numeric using the BEST. informat.
d takes the second number (i.e. the text after the opening bracket), removes the closing bracket, then converts it to numeric using the BEST. informat.
I assume this is from clinical outputs where 0 is the count and 0.0 is the percentage of the population. You should be referring to the underlying data, not the produced output!
