Hello Everyone:
I have a column with names, but there are extra spaces in the names. I provided a sample dataset below. My goal is to remove the space between single letters and combine them into one word. However, I do not want to merge the single-letters with multi-letter words. Is it possible to write a script such that if it will search in a string and stop where a multi-letter word first appears (for example, the word Supply in the first case) and it will merge all the letters to the left of it? I would really appreciate your help with step.
Thank you so much
| Actual | Desired | 
| A B Supply Inc | AB Supply Inc | 
| A B C Machines Inc | ABC Machines Inc | 
| A B C D Operations Inc | ABCD Operations Inc | 
| A B C D E US Inc | ABCDE US Inc | 
data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;
data want;
  set have;
  length want_str $ 200;
  do i=1 to countw(str,' ');
    temp=scan(str,i,' ');
    if length(temp)=1 then want_str=cats(want_str,temp);
	 else want_str=catx(' ',want_str,temp);
  end;
run;This assumes all the singles letters come first as per your data.
We find the first set of 2 consecutive letters and compress what comes before:
POS    = prxmatch('/[a-z]{2}/i',STR); * The final i makes this match case-insensitive;
NEWSTR = compress(substr(STR,1,POS-2)) || ' ' || substr(STR,POS) ;Thank you so much, ChrisNZ for this code! I really appreciate it. When I used it, it was able to combine the letters the way I wanted, but it produces errors when I have names that do not need to be modified. For example, it converts "ABC Supply" to "ABCSupply ABC Supply" whereas the desired outcome here is the same as the actual outcome (ABC Supply). Is there a way to remedy this?
On a separate note, sometimes I have single letters in different parts of the names such as ABC U S Supply, which should be changed to ABC US Supply. However, there are not many of those.
Thank you so much!
It should be possible to use a single regex with prxchange to get the required result, unfortunately those expressions need the "g" switch which is not supported. See https://stackoverflow.com/questions/4228750/removing-spaces-between-single-letters. Maybe someone else, with more regex experience can fix the problem. Using -1 in prxchange doesn't help because on subsequent iterations the first conditions is not fulfilled any more due to concatenations that took place.
@andreas_lds Not that I believe the data the OP posted are sufficiently representative but for what you mention and the sample data provided a look-ahead could do the job.
data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;
data want;
  set have;
  str=prxchange('s/(\w)\s+(?=\w(\s|$))/$1/oi',-1,strip(str));
run;
Thank you so much for your help, Patrick! I will try your suggestion.
data have;
 infile datalines truncover;
 input str $40.;
 datalines;
A b c d e  
A B Supply Inc
A B C Machines Inc
A B C D Operations Inc
A B C D E US Inc
;
data want;
  set have;
  length want_str $ 200;
  do i=1 to countw(str,' ');
    temp=scan(str,i,' ');
    if length(temp)=1 then want_str=cats(want_str,temp);
	 else want_str=catx(' ',want_str,temp);
  end;
run;Thank you so much for your help, Ksharp! I will try your suggestion.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
