Hi guys, I have a dataset with a column (i.e., a variable) with some values having .0. The decimal is always 0. I would like to remove .0. To do this, I used the following code: diagnosis_corrected = compress(DIAGNOSIS,'.'), but unfortunately values like 0726 become 726 and I would like they remain 0796. Like this, there are other values starting with 0. How can I remove all decimals while preserving cases like 0796?
data DB;
input ID :$20. DIAGNOSIS $20.;
cards;
0001 234.0
0001 4321.0
0002 V00080
0002 254
0003 876
0003 V98.0
0004 0796
0004 0543
run;
Note: the variable is a character variable
Thank you very much
Use the TRANWRD function
newvariable = tranwrd(oldvariable,".0","");
If you want to remove the decimals, simply assign a format like 8.0 to the variable. This assumes the variable is numeric, and I'm getting the impression that maybe its not numeric, it is character. Which is it?
Convert the character string to actual numbers, then assign a format like 8.0.
@NewUsrStat wrote:
I cannot. The variable must be char consistently with another dataset
I don't know what this means. You can work with numbers as numbers and then convert back to character to be consistent with another data set. But I don't even understand why someone has set up a system of data where numbers are characters, unless these are some sort of ID values, and if these are ID values, how did they get the .0 on the end?
So, basically there are the values of that variable that could be also like: V0008. This is why it should remain char.
I'm not sure I understand. Your explanations are very incomplete. Are you saying that you have a V0008.0 and you have to remove the .0 ??? Or are you saying something else?
Use the TRANWRD function
newvariable = tranwrd(oldvariable,".0","");
Thank you very much for your help. I tried with "tranwrd" but unfortunately 0796 becomes 796 while it should not change.
Please, when you try something and it doesn't work, don't say "it doesn't work" and stop there, with no other information provided. Please, always show us the code. Please always show us the output.
Looks like you are trying to clean up some old ICD-9 diagnosis codes?
The COMPRESS() function call you showed would have only removed the periods. It would not have removed the leading zero, like you showed.
Are you saying that you also need to convert strings that are too short to be valid ICD-9 code be assuming they are missing a leading zero and add it back?
Note that it is not uncommon to see invalid ICD-9 codes appear in real world data. Data entry is hard. And some hospital system might even be consistent at inventing their own variations on valid codes, like add the extra zero at the end on a code that does not actually need the extra zero.
Since there are only a finite number of valid ICD-9 codes it might be easier to make a table with all of the valid codes and also add in every misspelling you have found and determined how to map. For example you might do it by making a format (or even an informat) and use that to help you both clean and detect new strange codes.
For example is 0543 supposed to be:
054.3 Herpetic meningoencephalitis
or perhaps one of these two?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.