Hi,
I have a SAS data set with that has two variables
1) a variable that contains the file number
2) a variable contains a text with a clinical diagnosis.
The file number is found in its own variable but can also be found at the beginning of the diagnosis text.
I would like to extract only what is after the file number in the diagnosis variable. Sometimes, there is a space bar between the file number and the diagnosis description while sometimes, there no space.
Is there a way to tell SAS that when the content of file_number is found in the diagnosis variable, only keep what is after that number?
Here is an exemple of what I have :
file_number | diagnosis |
123456789 | file:123456789The patient has… |
987654321 | file:987654321 The patient has… |
Here is what I want :
file_number | diagnosis |
123456789 | The patient has… |
987654321 | The patient has… |
Can anyone help me with that?
Thank you.
OK, this should do it then:
diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));
Give it a try just to verify.
Better yet:
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Lots of people can help with that. But the programming would change depending on whether FILE_NUMBER is numeric or character. So let us know that. Also, is FILE_NUMBER always present or could it contain a missing value?
File number is a character and is always present for each observation. Thank you
OK, this should do it then:
diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));
Give it a try just to verify.
Better yet:
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Thank you Astounding!
I used your idea with a small modification and it worked perfectly. Here is what I did :
diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));
Cheers
Take a look at the TRANWRD function which will replace a word with specified value, in this case, a blank string.
There are several examples at the bottom that illustrate how they can be used.
Yes, here is one way of approaching this question:
*** starting position of filenumber inside the diagnosis;
startFinD = FIND(diagnosis, filenumber);
*** position of the first letter (or blank) of the new diagnosis;
startNewD = startFinD + 9;
*** new diagnosis variable;
diagnosis = STRIP[ SUBSTR(diagnosis, startNewD) ];
data have;
input file_number :$10. diagnosis & $100.;
cards;
123456789 file:123456789The patient has…
987654321 file:987654321 The patient has…
;
data want;
set have;
substr(diagnosis,1,14)=' ';
diagnosis=strip(diagnosis);
run;
This assumes the pattern in diagnosis variable file:10 digit numbers is consistent
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.