I have a variable as participants last name. There are names like:
Jose Cruz Jr
Jose Cruz Jr.
Jose Cruz Jr Jr
Jose Jrare
Also, like these
De la Rosa
Maria De la Rosa
Maria De Rosa
De Rosa
Vorde Rosa
Rosa Deloy
My question is how I can get rid of the Jr or Jr Jr only at the end of the string, but not in the middle as in 'Jose Jrare'? Also, I want to get rid of De in 'De Rosa', but not in 'Vorde Rosa' or 'Rosa Deloy'.
Thank you!
I think this calls for regular expressions. Let me know if you want me to explain what's going on. I may edit and add descriptions.
data have;
infile datalines dsd;
input name :$25.;
datalines;
Jose Cruz Jr
Jose Cruz Jr.
Jose Cruz Jr Jr
Jose Jrare
De la Rosa
Maria De la Rosa
Maria De Rosa
De Rosa
Vorde Rosa
Rosa Deloy
;
run;
data want;
set have;
/* Forgot to add the number. If Jr exists do a separate process. */
if find(name, "Jr") > 0 then do;
/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
/* I group all the Jr's together with parentheses to treat them as one. */
/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
/* I then replace it with a blank by specifying nothing after $/ */
/* Also trimming names so there isn't any trailing blanks. */
new_name = prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name));
end;
/* If it isn't Jr. do another process. */
else do;
/* Again, prxchange or substituting starts with "/s */
/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
/* Replace it with a space / / */
/* Also trimming names so there isn't any trailing blanks. */
new_name = prxchange("s/(\s\De\s|^De)/ /", -1, trim(name));
end;
run;
name | new_name |
---|---|
Jose Cruz Jr | Jose Cruz |
Jose Cruz Jr. | Jose Cruz |
Jose Cruz Jr Jr | Jose Cruz |
Jose Jrare | Jose Jrare |
De la Rosa | la Rosa |
Maria De la Rosa | Maria la Rosa |
Maria De Rosa | Maria Rosa |
De Rosa | Rosa |
Vorde Rosa | Vorde Rosa |
Rosa Deloy | Rosa Deloy |
Ran this on my local copy of SAS and had some formatting issues. Added a left/trim to left align and trim blanks.
data want;
set have;
/* Forgot to add the number. If Jr exists do a separate process. */
if find(name, "Jr") > 0 then do;
/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
/* I group all the Jr's together with parentheses to treat them as one. */
/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
/* I then replace it with a blank by specifying nothing after $/ */
/* Also trimming names so there isn't any trailing blanks. */
new_name = left(trim(prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name))));
end;
/* If it isn't Jr. do another process. */
else do;
/* Again, prxchange or substituting starts with "/s */
/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
/* Replace it with a space / / */
/* Also trimming names so there isn't any trailing blanks. */
new_name = left(trim(prxchange("s/(\s\De\s|^De)/ /", -1, trim(name))));
end;
run;
I think this calls for regular expressions. Let me know if you want me to explain what's going on. I may edit and add descriptions.
data have;
infile datalines dsd;
input name :$25.;
datalines;
Jose Cruz Jr
Jose Cruz Jr.
Jose Cruz Jr Jr
Jose Jrare
De la Rosa
Maria De la Rosa
Maria De Rosa
De Rosa
Vorde Rosa
Rosa Deloy
;
run;
data want;
set have;
/* Forgot to add the number. If Jr exists do a separate process. */
if find(name, "Jr") > 0 then do;
/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
/* I group all the Jr's together with parentheses to treat them as one. */
/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
/* I then replace it with a blank by specifying nothing after $/ */
/* Also trimming names so there isn't any trailing blanks. */
new_name = prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name));
end;
/* If it isn't Jr. do another process. */
else do;
/* Again, prxchange or substituting starts with "/s */
/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
/* Replace it with a space / / */
/* Also trimming names so there isn't any trailing blanks. */
new_name = prxchange("s/(\s\De\s|^De)/ /", -1, trim(name));
end;
run;
name | new_name |
---|---|
Jose Cruz Jr | Jose Cruz |
Jose Cruz Jr. | Jose Cruz |
Jose Cruz Jr Jr | Jose Cruz |
Jose Jrare | Jose Jrare |
De la Rosa | la Rosa |
Maria De la Rosa | Maria la Rosa |
Maria De Rosa | Maria Rosa |
De Rosa | Rosa |
Vorde Rosa | Vorde Rosa |
Rosa Deloy | Rosa Deloy |
Ran this on my local copy of SAS and had some formatting issues. Added a left/trim to left align and trim blanks.
data want;
set have;
/* Forgot to add the number. If Jr exists do a separate process. */
if find(name, "Jr") > 0 then do;
/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
/* I group all the Jr's together with parentheses to treat them as one. */
/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
/* I then replace it with a blank by specifying nothing after $/ */
/* Also trimming names so there isn't any trailing blanks. */
new_name = left(trim(prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name))));
end;
/* If it isn't Jr. do another process. */
else do;
/* Again, prxchange or substituting starts with "/s */
/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
/* Replace it with a space / / */
/* Also trimming names so there isn't any trailing blanks. */
new_name = left(trim(prxchange("s/(\s\De\s|^De)/ /", -1, trim(name))));
end;
run;
Thank you! Please do explain it a bit.
Thank you so much!
Just saw your updated response!
Good thing is I found more abnormal patterns in my data and will probably come back with more questions.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.