BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GingerJJ
Obsidian | Level 7

I have a variable as participants last name. There are names like:

Jose Cruz Jr

Jose Cruz Jr.

Jose Cruz Jr Jr

Jose Jrare

 

Also, like these

De la Rosa

Maria De la Rosa

Maria De Rosa

De Rosa

Vorde Rosa

Rosa Deloy

 

 

 

My question is how I can get rid of the Jr or Jr Jr only at the end of the string, but not in the middle as in 'Jose Jrare'? Also, I want to get rid of De in 'De Rosa', but not in 'Vorde Rosa' or 'Rosa Deloy'.

Thank you!

 

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

I think this calls for regular expressions. Let me know if you want me to explain what's going on. I may edit and add descriptions.

data have;
infile datalines dsd;
input name :$25.;
datalines;
Jose Cruz Jr
Jose Cruz Jr.
Jose Cruz Jr Jr
Jose Jrare
De la Rosa
Maria De la Rosa
Maria De Rosa
De Rosa
Vorde Rosa
Rosa Deloy
;
run;

data want;
	set have;
		/* Forgot to add the number. If Jr exists do a separate process. */
		if find(name, "Jr") > 0 then do;
			/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
			/* I group all the Jr's together with parentheses to treat them as one. */
			/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
			/* I then replace it with a blank by specifying nothing after $/ */
			/* Also trimming names so there isn't any trailing blanks. */
			new_name = prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name));
		end;
			/* If it isn't Jr. do another process. */
			else do;
				/* Again, prxchange or substituting starts with "/s */
				/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
				/* Replace it with a space / / */
				/* Also trimming names so there isn't any trailing blanks. */
				new_name = prxchange("s/(\s\De\s|^De)/ /", -1, trim(name));
			end;
run;
name new_name
Jose Cruz Jr Jose Cruz
Jose Cruz Jr. Jose Cruz
Jose Cruz Jr Jr Jose Cruz
Jose Jrare Jose Jrare
De la Rosa la Rosa
Maria De la Rosa Maria la Rosa
Maria De Rosa Maria Rosa
De Rosa Rosa
Vorde Rosa Vorde Rosa
Rosa Deloy Rosa Deloy

 

Ran this on my local copy of SAS and had some formatting issues. Added a left/trim to left align and trim blanks.

 

data want;
	set have;
		/* Forgot to add the number. If Jr exists do a separate process. */
		if find(name, "Jr") > 0 then do;
			/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
			/* I group all the Jr's together with parentheses to treat them as one. */
			/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
			/* I then replace it with a blank by specifying nothing after $/ */
			/* Also trimming names so there isn't any trailing blanks. */
			new_name = left(trim(prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name))));
		end;
			/* If it isn't Jr. do another process. */
			else do;
				/* Again, prxchange or substituting starts with "/s */
				/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
				/* Replace it with a space / / */
				/* Also trimming names so there isn't any trailing blanks. */
				new_name = left(trim(prxchange("s/(\s\De\s|^De)/ /", -1, trim(name))));
			end;
run;

View solution in original post

4 REPLIES 4
maguiremq
SAS Super FREQ

I think this calls for regular expressions. Let me know if you want me to explain what's going on. I may edit and add descriptions.

data have;
infile datalines dsd;
input name :$25.;
datalines;
Jose Cruz Jr
Jose Cruz Jr.
Jose Cruz Jr Jr
Jose Jrare
De la Rosa
Maria De la Rosa
Maria De Rosa
De Rosa
Vorde Rosa
Rosa Deloy
;
run;

data want;
	set have;
		/* Forgot to add the number. If Jr exists do a separate process. */
		if find(name, "Jr") > 0 then do;
			/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
			/* I group all the Jr's together with parentheses to treat them as one. */
			/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
			/* I then replace it with a blank by specifying nothing after $/ */
			/* Also trimming names so there isn't any trailing blanks. */
			new_name = prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name));
		end;
			/* If it isn't Jr. do another process. */
			else do;
				/* Again, prxchange or substituting starts with "/s */
				/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
				/* Replace it with a space / / */
				/* Also trimming names so there isn't any trailing blanks. */
				new_name = prxchange("s/(\s\De\s|^De)/ /", -1, trim(name));
			end;
run;
name new_name
Jose Cruz Jr Jose Cruz
Jose Cruz Jr. Jose Cruz
Jose Cruz Jr Jr Jose Cruz
Jose Jrare Jose Jrare
De la Rosa la Rosa
Maria De la Rosa Maria la Rosa
Maria De Rosa Maria Rosa
De Rosa Rosa
Vorde Rosa Vorde Rosa
Rosa Deloy Rosa Deloy

 

Ran this on my local copy of SAS and had some formatting issues. Added a left/trim to left align and trim blanks.

 

data want;
	set have;
		/* Forgot to add the number. If Jr exists do a separate process. */
		if find(name, "Jr") > 0 then do;
			/* s/ starts the substition. It has a reciprocated / at the end to stop this process. */
			/* I group all the Jr's together with parentheses to treat them as one. */
			/* I then add the $, which looks specifically for the Jr. group at the end of the string. */
			/* I then replace it with a blank by specifying nothing after $/ */
			/* Also trimming names so there isn't any trailing blanks. */
			new_name = left(trim(prxchange("s/(Jr|Jr.|Jr Jr)$//", -1 , trim(name))));
		end;
			/* If it isn't Jr. do another process. */
			else do;
				/* Again, prxchange or substituting starts with "/s */
				/* I want to find two patterns: space (\s) De space (\s), or (|) a string that starts (^) with De. */
				/* Replace it with a space / / */
				/* Also trimming names so there isn't any trailing blanks. */
				new_name = left(trim(prxchange("s/(\s\De\s|^De)/ /", -1, trim(name))));
			end;
run;
GingerJJ
Obsidian | Level 7

Thank you! Please do explain it a bit.

maguiremq
SAS Super FREQ
Hey @GingerJJ, I did in the second program that I included in my post.

Regular expressions can be a bit unclear at first, but they’re extremely powerful at parsing strings with odd conditions.

This link may help as well.

https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
GingerJJ
Obsidian | Level 7

Thank you so much!

Just saw your updated response!

Good thing is I found more abnormal patterns in my data and will probably come back with more questions.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 630 views
  • 2 likes
  • 2 in conversation