BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
genemroz
Pyrite | Level 9

I’m faced with the challenge to modify over 100K concatenated strings of the type:

“Los Lunas/Corona/Corona/White Mountain/White Mountain"

and it should be noted that some source strings may contain 2,3,or 4 each "Corona" or "White Mountain" sub-strings.

The desired string would reduce the number of instances of  “White Mountain” and “Corona” to one each, and should look like this:

“Los Lunas/Corona/White Mountain”

I’ve attached the code I used to accomplish this (with a hat tip to blogs by Leonid Batkhan) that shows all the steps.  But surely there must be a simpler, more efficient, more elegant way to achieve the desired result.  

Thanks in advance for any helpful suggestions or insights.

Happy Holidays!

Gene

data want;

concat_location="Los Lunas/Corona/Corona/White Mountain/White Mountain";

Ccount=count(concat_location,"Corona");
if Ccount>1 then do
/*remove extra "Corona"s  */
mod1= prxchange('s/Corona//',Ccount-1,concat_location); 
/* translate // to blanks */
mod2=translate(mod1,' ','/');
/* compress blanks */
mod3=compbl(mod2);
/* translate blanks to '/' */
mod4=translate(mod3,'/',' ');
/* Trim leading '/' */
P1=findc(mod4,'/','K');
mod5=substr(mod4,P1);
/* Trim trailing '/' */
P2=findc(mod5,'/','K',-length(mod5));
mod6=substr(mod5,1,P2);
end;

if find(mod6,'White/Mountain')then
mod6=tranwrd(mod6,"White/Mountain","White Mountain");

Wcount=count(mod6,"White Mountain");
if Wcount>1 then do
/* remove extra "White Mountain"s */
mod7= prxchange('s/White Mountain//',Wcount-1,mod6); 
/* translate '//'' to blanks */
mod8=translate(mod7,' ','/');
/* compress blanks */
mod9=compbl(mod8);
/* translate blanks to '/' */
mod10=translate(mod9,'/',' ');
/* Trim leading '/' */
P1=findc(mod10,'/','K');
mod11=substr(mod10,P1);
/* Trim trailing '/' */
P2=findc(mod11,'/','K',-length(mod5));
mod12=substr(mod11,1,P2);
end;

if find(mod12,"White/Mountain") then
mod12=tranwrd(mod12,"White/Mountain","White Mountain");
if find(mod12,"Los/Lunas") then
want=tranwrd(mod12,"Los/Lunas","Los Lunas");
run;

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Just make a NEW list.

data have;
 old="Los Lunas/Corona/Corona/White Mountain/White Mountain";
run;

data want ;
  set have;
* trick to force data step to define NEW as same length as OLD ;  
  new=old;
* Prime the pump with the first word ;
  new=scan(old,1,'/');
* when next word is not in NEW then append it ;
  do i=2 to countw(old);
    if not findw(new,scan(old,i,'/'),'/','it') then 
       new=catx('/',new,scan(old,i,'/'))
    ;
  end;
  drop i;
run;

Results

 90         data _null_;
 91           set want;
 92           put (_all_) (=/);
 93         run;
 
 old=Los Lunas/Corona/Corona/White Mountain/White Mountain
 new=Los Lunas/Corona/White Mountain
 NOTE: There were 1 observations read from the data set WORK.WANT.

View solution in original post

4 REPLIES 4
SASKiwi
PROC Star

How about this?

data want;
  keep word;
  concat_location = "Los Lunas/Corona/Corona/White Mountain/White Mountain";
  do i = 1 to countw(concat_location,"/");
    word = scan(concat_location, i, "/");
    output;
  end;
run;

proc sort data = want nodupkey;
  by word;
run;

data want;
  keep concat_location_dedup;
  set want end = eof;
  retain concat_location_dedup;
  length concat_location_dedup $ 1024;
  concat_location_dedup = catx("/", concat_location_dedup, word);
  if eof;
run;
Patrick
Opal | Level 21

Below a possible approach.

data have;
  infile datalines truncover;
  input have_string $200.;
  datalines;
Los Lunas/Corona/Corona/White Mountain/White Mountain
;

data want(drop=_:);
  if _n_=1 then
    do;
      length _term $50;
      dcl hash h1();
      h1.defineKey('_term');
      h1.defineDone();
      
    end;
    
  set have;
  
  if 0 then want_string=have_string;
    
  do _i=1 to countc(have_string,'/')+1;
    _term=scan(have_string,_i,'/');
    if h1.check() ne 0 then
      do;
        want_string=catx('/',want_string,_term);
        _rc=h1.add();
      end;
  end;
  
  _rc=h1.clear();
 run;
 
 proc print data=want;
 run;

Patrick_0-1767052952578.png

 

 

Tom
Super User Tom
Super User

Just make a NEW list.

data have;
 old="Los Lunas/Corona/Corona/White Mountain/White Mountain";
run;

data want ;
  set have;
* trick to force data step to define NEW as same length as OLD ;  
  new=old;
* Prime the pump with the first word ;
  new=scan(old,1,'/');
* when next word is not in NEW then append it ;
  do i=2 to countw(old);
    if not findw(new,scan(old,i,'/'),'/','it') then 
       new=catx('/',new,scan(old,i,'/'))
    ;
  end;
  drop i;
run;

Results

 90         data _null_;
 91           set want;
 92           put (_all_) (=/);
 93         run;
 
 old=Los Lunas/Corona/Corona/White Mountain/White Mountain
 new=Los Lunas/Corona/White Mountain
 NOTE: There were 1 observations read from the data set WORK.WANT.
genemroz
Pyrite | Level 9

Once again, after struggling for hours over what seemed an intractable puzzle, all I have to do is post a question here, and three better solutions appear in short order.  I'll mark Tom's as the Accepted because it seems to be the cleverest and also because the ony thing I know about hash is off-topic.  

 

Thanks and Happy Holidays to all!

Gene

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 126 views
  • 2 likes
  • 4 in conversation