BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
finans_sas
Quartz | Level 8

Hello Everyone:

I have a column with names, but there are extra spaces in the names. I provided a sample dataset below. My goal is to remove the space between single letters and combine them into one word. However, I do not want to merge the single-letters with multi-letter words. Is it possible to write a script such that if it will search in a string and stop where a multi-letter word first appears (for example, the word Supply in the first case) and it will merge all the letters to the left of it? I would really appreciate your help with step.

Thank you so much

 

Actual Desired
A B Supply Inc AB Supply Inc
A B C Machines Inc ABC Machines Inc
A B C D Operations Inc ABCD Operations Inc
A B C D E US Inc ABCDE US Inc

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;

data want;
  set have;
  length want_str $ 200;
  do i=1 to countw(str,' ');
    temp=scan(str,i,' ');
    if length(temp)=1 then want_str=cats(want_str,temp);
	 else want_str=catx(' ',want_str,temp);
  end;
run;

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

This assumes all the singles letters come first as per your data.

We find the first set of 2 consecutive letters and compress what comes before:

POS    = prxmatch('/[a-z]{2}/i',STR); * The final i makes this match case-insensitive;
NEWSTR = compress(substr(STR,1,POS-2)) || ' ' || substr(STR,POS) ;
finans_sas
Quartz | Level 8

Thank you so much, ChrisNZ for this code! I really appreciate it. When I used it, it was able to combine the letters the way I wanted, but it produces errors when I have names that do not need to be modified. For example, it converts "ABC Supply" to "ABCSupply ABC Supply" whereas the desired outcome here is the same as the actual outcome (ABC Supply). Is there a way to remedy this?

 

On a separate note, sometimes I have single letters in different parts of the names such as ABC U S Supply, which should be changed to ABC US Supply. However, there are not many of those.

 

 

Thank you so much!

andreas_lds
Jade | Level 19

It should be possible to use a single regex with prxchange to get the required result, unfortunately those expressions need the "g" switch which is not supported. See https://stackoverflow.com/questions/4228750/removing-spaces-between-single-letters. Maybe someone else, with more regex experience can fix the problem. Using -1 in prxchange doesn't help because on subsequent iterations the first conditions is not fulfilled any more due to concatenations that took place.

Patrick
Opal | Level 21

@andreas_lds  Not that I believe the data the OP posted are sufficiently representative but for what you mention and the sample data provided a look-ahead could do the job.

data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;

data want;
  set have;
  str=prxchange('s/(\w)\s+(?=\w(\s|$))/$1/oi',-1,strip(str));
run;

 

finans_sas
Quartz | Level 8

Thank you so much for your help, Patrick! I will try your suggestion.

Ksharp
Super User
data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;

data want;
  set have;
  length want_str $ 200;
  do i=1 to countw(str,' ');
    temp=scan(str,i,' ');
    if length(temp)=1 then want_str=cats(want_str,temp);
	 else want_str=catx(' ',want_str,temp);
  end;
run;
finans_sas
Quartz | Level 8

Thank you so much for your help, Ksharp! I will try your suggestion.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1976 views
  • 5 likes
  • 5 in conversation