Hello!
In the following SAS dataset
DATA EXAMPLE1;
INPUT Names $CHAR30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
SUNSET MED CTR
;
I want to get another variable or dataset that contains only person names, with no medical facility names or arbitrary alphanumeric strings.
I have used the following code to create a new subset
DATA EXAMPLE1;
INPUT Names $CHAR30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
AARON RAY
SUNSET MED CTR
;
DATA EXAMPLE_COPY;
SET EXAMPLE1;
IF (INDEXW(NAMES ,'Dr.') OR INDEXW(NAMES,'MD') OR INDEXW(NAMES ,'MDS') OR INDEXW(NAMES ,'PMD')) GT 0 THEN SELECTION = 'YES';
ELSE SELECTION = 'NO';
PROC PRINT DATA=EXAMPLE_COPY;
TITLE "Listing of Data Set w DOCTOR NAMES";
ID NAMES ;
VAR Selection;
RUN;
But this leaves out the person names without MD, MDS etc. It would be preferable if I can select person names out of my variable.
My desired output
Names             Selection
**RMPQ            NO
Dr. AARON RAY     YES
AARON,RAY MD      YES
RAY,AARON MDS     YES
PMD RAY ARON      YES
AARON MD RAY      YES
AARON RAY         YES
SUNSET MED CTR    NO
OR any other way of getting person names.
Any help will be appreciated.
What are "person names"? The answer depends largely on the area you live in. And there are of course names that are harder to recognize as such, like "X Æ A-12".
EDIT
Conclusion: even with a long list of words that clearly identify a value as a person's name (or as not a person), it is very unlikely that you will avoid all false positives.
The following code separates the degree from the names.
Please check whether this is what you want.
DATA EXAMPLE1;
INPUT Names $char30.;
DATALINES;
**RMPQ
Dr. AARON RAY
AARON,RAY MD
RAY,AARON MDS
PMD RAY ARON
AARON MD RAY
AARON RAY
SUNSET MED CTR
;
run;
DATA EXAMPLE_COPY;
SET EXAMPLE1;
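length degree $8;
array dr {4} $ ('DR.', 'MD', 'MDS', 'PMD');
/* Normalize case so that Dr. in the data matches the DR. entry */
names = upcase(names);
do i=1 to dim(dr);
if indexw(names,dr(i),' .') then do;
/* Record the degree only when it is actually found */
degree = dr(i);
names = compbl(tranwrd(names,strip(dr(i)),' '));
leave;
end;
end;
put _N_= names= degree=;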
drop i dr1-dr4;
run;
Do you have a list of likely medical centers in all the various ways that people are likely to enter the name?
It may be easier to create a look up for those than to attempt a filter on person names.
And why the heck are there ** in the **RMPQ value?
BTW, I would never trust a search for DR or DOCTOR to identify people. Consider "DR JONES Memorial Clinic" or similar names.
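The lookup idea above could be sketched roughly as follows. This is only an illustration, not a tested solution: the facility keyword list and the set of characters allowed in a name are assumptions made up for this example, and EXAMPLE_FLAGGED is a hypothetical dataset name. It assumes EXAMPLE1 has been created as shown earlier in the thread.

```sas
DATA EXAMPLE_FLAGGED;
SET EXAMPLE1;
length selection $3;
/* Hypothetical keyword list: extend it with the facility terms seen in your data */
array fac {5} $8 _temporary_ ('CTR', 'CENTER', 'CLINIC', 'HOSPITAL', 'MED');
selection = 'YES';
/* Assumption: a person name uses only letters, blanks, commas, periods, apostrophes and hyphens */
if verify(strip(upcase(names)), "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'-") > 0 then selection = 'NO';
/* Turn commas and periods into blanks, then look for whole-word facility keywords */
do i = 1 to dim(fac) while (selection = 'YES');
if indexw(translate(upcase(names), '  ', ',.'), fac(i)) > 0 then selection = 'NO';
end;
drop i;
run;
```

With the sample data, this should flag **RMPQ and SUNSET MED CTR as NO and leave the other rows as YES, but on real data it will only be as good as the keyword list, and a value such as "DR JONES Memorial Clinic" is caught only if CLINIC (or a similar term) is in that list.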