Priyamvada07
Obsidian | Level 7

Hello!

 

In the following SAS dataset

 

DATA EXAMPLE1;
INPUT Names $CHAR30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
SUNSET MED CTR
;

I want another variable or dataset that contains only person names, with NO medical facilities or arbitrary alphanumeric strings.

 

I have used the following code to create a new subset:

DATA EXAMPLE1;
INPUT Names $CHAR30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
AARON RAY
SUNSET MED CTR
;

DATA EXAMPLE_COPY;
SET EXAMPLE1;
LENGTH SELECTION $3;
IF INDEXW(NAMES,'Dr.') OR INDEXW(NAMES,'MD') OR INDEXW(NAMES,'MDS') OR INDEXW(NAMES,'PMD') THEN SELECTION = 'YES';
ELSE SELECTION = 'NO';
RUN;

PROC PRINT DATA=EXAMPLE_COPY;
TITLE "Listing of Data Set w DOCTOR NAMES";
ID NAMES;
VAR Selection;
RUN;

But this leaves out person names that have no MD, MDS, etc. I would prefer to select person names directly from my variable.

 

My desired output

Names             Selection
**RMPQ            NO
Dr. AARON RAY     YES
AARON,RAY MD      YES
RAY,AARON MDS     YES
PMD RAY ARON      YES
AARON MD RAY      YES
AARON RAY         YES
SUNSET MED CTR    NO

 

OR any other way of getting person names.

Any help will be appreciated.

 

 

 

 

 

andreas_lds
Jade | Level 19

What are "person names"? The answer depends largely on the area you live in. And there are of course names that are harder to recognize as such, like "X Æ A-12".

 

EDIT

Conclusion: even with a long list of words that clearly identify a value as a person's name (or as not a person), you will almost certainly get some false positives.
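To illustrate the false-positive problem, here is a minimal sketch that accepts a value only if it consists of letters, blanks, and a few punctuation marks. The dataset name CHAR_CHECK, the variable LOOKS_LIKE_NAME, and the set of allowed characters are assumptions made for this example, not part of the original question.

```sas
data char_check;
  length names $30 looks_like_name $3;
  input names $char30.;
  /* VERIFY returns 0 when every character of the first argument
     also appears in the second argument (the allowed characters) */
  if verify(upcase(strip(names)), "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.-'") = 0 then looks_like_name = 'YES';
  else looks_like_name = 'NO';
datalines;
**RMPQ
AARON RAY
SUNSET MED CTR
;
run;
```

This rejects **RMPQ because of the asterisks, but SUNSET MED CTR passes just like AARON RAY. A character check alone cannot tell a facility from a person.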

Shmuel
Garnet | Level 18

The following code separates the degree from the names.

Please check whether this is what you want.

DATA EXAMPLE1;
  INPUT Names $char30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
AARON RAY
SUNSET MED CTR
;
run;

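DATA EXAMPLE_COPY;
  SET EXAMPLE1;
  length degree $8;
  /* titles and degrees to look for, stored in upper case */
  array dr {4} $8 _temporary_ ('DR.', 'MD', 'MDS', 'PMD');
  do i = 1 to dim(dr);
    /* compare in upper case so that 'Dr.' also matches 'DR.' */
    pos = indexw(upcase(names), strip(dr(i)));
    if pos then do;
      /* set degree only when a match is found */
      degree = dr(i);
      /* blank out the matched word, then tidy the spacing */
      substr(names, pos, lengthn(dr(i))) = ' ';
      names = left(compbl(names));
      leave;
    end;
  end;
  drop i pos;
run;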
ballardw
Super User

Do you have a list of likely medical centers in all the various ways that people are likely to enter the name?

It may be easier to create a look up for those than to attempt a filter on person names.

 

And why the heck are the ** in the **RMPQ value?

 

BTW, I would not trust a search for DR or DOCTOR to identify people. Consider DR JONES Memorial Clinic or similar names.
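A rough sketch of the lookup idea, assuming a small keyword list stands in for a real list of medical centers. The keywords (MED, CTR, CLINIC, HOSPITAL, MEMORIAL), the dataset names, and the variable FACILITY are hypothetical; an actual list would have to be built from your own data.

```sas
data sample;
  input names $char30.;
datalines;
Dr. AARON RAY
AARON RAY
SUNSET MED CTR
DR JONES MEMORIAL CLINIC
;
run;

data facility_check;
  set sample;
  length facility $3;
  /* hypothetical facility keywords, stored in upper case */
  array fac {5} $10 _temporary_ ('MED' 'CTR' 'CLINIC' 'HOSPITAL' 'MEMORIAL');
  facility = 'NO';
  do i = 1 to dim(fac);
    /* whole-word, case-insensitive match on blank-delimited words */
    if indexw(upcase(names), strip(fac(i))) then do;
      facility = 'YES';
      leave;
    end;
  end;
  drop i;
run;
```

Here DR JONES MEMORIAL CLINIC is flagged as a facility even though it starts with DR, while Dr. AARON RAY is not. Checking for facilities before checking for titles avoids that particular false positive, but only for keywords that are in the list.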
