Hi,
I have a SAS data set with that has two variables
1) a variable that contains the file number
2) a variable contains a text with a clinical diagnosis.
The file number is found in its own variable but can also be found at the beginning of the diagnosis text.
I would like to extract only what is after the file number in the diagnosis variable. Sometimes, there is a space bar between the file number and the diagnosis description while sometimes, there no space.
Is there a way to tell SAS that when the content of file_number is found in the diagnosis variable, only keep what is after that number?
Here is an exemple of what I have :
file_number | diagnosis |
123456789 | file:123456789The patient has… |
987654321 | file:987654321 The patient has… |
Here is what I want :
file_number | diagnosis |
123456789 | The patient has… |
987654321 | The patient has… |
Can anyone help me with that?
Thank you.
OK, this should do it then:
diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));
Give it a try just to verify.
Better yet:
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Lots of people can help with that. But the programming would change depending on whether FILE_NUMBER is numeric or character. So let us know that. Also, is FILE_NUMBER always present or could it contain a missing value?
File number is a character and is always present for each observation. Thank you
OK, this should do it then:
diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));
Give it a try just to verify.
Better yet:
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Thank you Astounding!
I used your idea with a small modification and it worked perfectly. Here is what I did :
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Cheers
Take a look at the TRANWRD function which will replace a word with specified value, in this case, a blank string.
There are several examples at the bottom that illustrate how they can be used.
Yes, here is one way of approaching this question:
*** starting position of filenumber inside the diagnosis;
startFinD = FIND(diagnosis, filenumber);
*** position of the first letter (or blank) of the new diagnosis;
startNewD = startFinD + 9;
*** new diagnosis variable;
diagnosis = STRIP[ SUBSTR(diagnosis, startNewD) ];
data have;
input file_number :$10. diagnosis & $100.;
cards;
123456789 file:123456789The patient has…
987654321 file:987654321 The patient has…
;
data want;
set have;
substr(diagnosis,1,14)=' ';
diagnosis=strip(diagnosis);
run;
This assumes the pattern in diagnosis variable file:10 digit numbers is consistent
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.