BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dera
Obsidian | Level 7

Hi,

 

I have a SAS data set with that has two variables

1) a variable that contains the file number

2) a variable contains a text with a clinical diagnosis.

 

The file number is found in its own variable but can also be found at the beginning of the diagnosis text.

I would like to extract only what is after the file number in the diagnosis variable. Sometimes, there is a space bar between the file number and the diagnosis description while sometimes, there no space.

 

Is there a way to tell SAS that when the content of file_number is found in the diagnosis variable, only keep what is after that number?

 

Here is an exemple of what I have :

 

file_numberdiagnosis
123456789file:123456789The patient has…
987654321file:987654321 The patient has…

 

Here is what I want :

 

file_numberdiagnosis
123456789The patient has…
987654321The patient has…

 

Can anyone help me with that?

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

OK, this should do it then:

 

diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));

 

Give it a try just to verify.

 

Better yet:

 

diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));

View solution in original post

7 REPLIES 7
Astounding
PROC Star

Lots of people can help with that.  But the programming would change depending on whether FILE_NUMBER is numeric or character.  So let us know that.  Also, is FILE_NUMBER always present or could it contain a missing value?

dera
Obsidian | Level 7

File number is a character and is always present for each observation. Thank you

Astounding
PROC Star

OK, this should do it then:

 

diagnosis = substr(diagnosis, length(diagnosis) + index(diagnosis, strip(file_number)));

 

Give it a try just to verify.

 

Better yet:

 

diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));

dera
Obsidian | Level 7

Thank you Astounding!
I used your idea with a small modification and it worked perfectly. Here is what I did :

diagnosis = substr(diagnosis, length(file_number) + index(diagnosis, strip(file_number)));



Cheers

Reeza
Super User

Take a look at the TRANWRD function which will replace a word with specified value, in this case, a blank string. 

 

https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p0pgemqcslm9uen1tvr5gcrusgrw.htm...

 

There are several examples at the bottom that illustrate how they can be used.

pink_poodle
Barite | Level 11

Yes, here is one way of approaching this question:

 

*** starting position of filenumber inside the diagnosis;

startFinD = FIND(diagnosis, filenumber); 

*** position of the first letter (or blank) of the new diagnosis;

startNewD = startFinD + 9;

*** new diagnosis variable;

diagnosis = STRIP[ SUBSTR(diagnosis, startNewD) ];

 

novinosrin
Tourmaline | Level 20
data have;
input file_number :$10.	diagnosis  & $100.;
cards;
123456789	file:123456789The patient has…
987654321	file:987654321 The patient has…
;

data want;
set have;
substr(diagnosis,1,14)=' ';
diagnosis=strip(diagnosis);
run;

 This assumes the pattern in diagnosis variable file:10 digit numbers is consistent

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2102 views
  • 1 like
  • 5 in conversation