thanikondharish
Fluorite | Level 6

data s ;
name='hfisfae afwa ffjeudad judsh sewla' ;

run;

 

I want to find all positions of the letter 'a' in the above string.

RW9
Diamond | Level 26

Please post full and complete questions in future, as requested a few times. Show what you want the output to look like: do you want a variable for each position, a row for each position, only the last one, or only the first one?

novinosrin
Tourmaline | Level 20
data s ;
name='hfisfae afwa ffjeudad judsh sewla' ;

run;

%let l=a;
data want;
set s;
do position=1 by 1 until(position=length(name));
l=char(name,position);
if l="&l" then output;
end;
run;

hashman
Ammonite | Level 13

@novinosrin; You could code the loop a bit simpler:

 

 

do position = 1 to length (name) ;

But it's a minor thing ;). I've been thinking more in terms of whether it's necessary to scan through the whole target string to locate a few relatively rare characters. Obviously, if the string contains mostly "a"s (~60 per cent, per my testing), it's faster just to scan all the way, like you do, rather than resort to something else.

 

But suppose that the search-for character is in the minority, as is the case with "a" in 'hfisfae afwa ffjeudad judsh sewla'. Since the algorithms behind the string search functions are way faster than the linear scan, it may be expected that they could offer some advantage. Here's the basic idea:

 

1. Search for the given character.

2. If found, search again from the next higher position.    

3. Otherwise, stop.

 

This way, we progress to the next lookup stage without having to examine every position one at a time. What makes it possible is the FINDC function's ability to begin search from a given position (by contrast, INDEXC cannot do that). Thus:

 

data _null_ ;                                            
  retain ch "a" str "hfisfae afwa ffjeudad judsh sewla" ;
  do pos = findc (str, ch) by 0 while (pos) ;            
    put pos @ ;                                          
    pos = findc (str, ch, pos + 1) ;                     
  end ;                                                  
run ;                                                    

Which duly prints:

6 9 12 20 33

We can now compare the two algorithms against our test string by running each, say, 10m times:

data _null_ ;                                              
  retain c "a" s "hfisfae afwa ffjeudad judsh sewla" N 1E7 ;
  t = time() ;                                             
  do k = 1 to N ;                                          
    link lscan ;                                           
  end ;                                                    
  t1 = time() - t ;                                        
  t = time() ;                                             
  do k = 1 to N ;                                          
    link findc ;                                           
  end ;                                                    
  t2 = time() - t ;                                        
  put t1= t2= ;                                            
  stop ;                                                   
  findc: do p = findc (s, c) by 0 until (p=0) ;            
           p = findc (s, c, p + 1) ;                       
         end ;                                             
  return ;                                                 
  lscan: do p = 1 to length (s) ;                          
           if char (s, p) = c then ;                       
         end ;                                             
run ;                                                      

As a result, I get time(findc):time(lscan)~1:3.5. It is then reduced to 1:1 when about 20-21 out of 33 characters are "a"s; and after that, your linear search becomes progressively faster. Quod erat inveniendum.
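
To look for the break-even point yourself, a rough sketch along the following lines could be used. It is only an assumption about how such a test might be set up: the 33-character test strings are synthetic (k leading "a"s padded with "b"s), and the repetition count N and the step of 3 are arbitrary choices, not the settings behind the figures quoted above.

```sas
data _null_ ;
  length s $ 33 ;
  retain c "a" N 1E6 ;                  /* N: assumed repetition count        */
  do k = 3 to 30 by 3 ;                 /* k: number of "a"s in the string    */
    /* build k "a"s followed by 33-k "b"s (REPEAT returns n+1 copies)        */
    s = cats (repeat ("a", k - 1), repeat ("b", 32 - k)) ;
    t = time() ;
    do j = 1 to N ;                     /* linear scan over every position    */
      do p = 1 to length (s) ;
        if char (s, p) = c then ;
      end ;
    end ;
    t1 = time() - t ;
    t = time() ;
    do j = 1 to N ;                     /* repeated FINDC from the last hit   */
      do p = findc (s, c) by 0 until (p = 0) ;
        p = findc (s, c, p + 1) ;
      end ;
    end ;
    t2 = time() - t ;
    put k= t1= t2= ;                    /* timings are machine-dependent      */
  end ;
run ;
```

The log then shows one line per value of k, so the point where t1 and t2 cross can be read off directly.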

 

Paul D.  

 

 

 

novinosrin
Tourmaline | Level 20

Boss, hmm, brilliant intuition. Also, I see some similarities between this and the golf fun post that all of us were part of.

 

To be honest, I thought of that idea earlier, and I do not know why I didn't attempt it. Well, perhaps I was meant to learn from the boss 🙂

My initial idea was something like

 

 loop :  1 to countc(var,'a') /* looping only the count of a and not having to go through all length(letters) */

 and findc or indexc 

end

 

I just didn't feel confident enough to suggest it to the OP. Of course, I am excited to see you being more active here these days; your extensive, diligent posts help me get up to speed.

hashman
Ammonite | Level 13

@novinosrin:

 

Actually, using COUNTC is a great idea. Though it won't make it any faster, it makes it more straightforward to code:

 

data _null_ ;                                            
  retain ch "a" str "hfisfae afwa ffjeudad judsh sewla" ;
  pos = 0 ;                                              
  do _n_ = 1 to countc (str, ch) ;
    pos = findc (str, ch, pos + 1) ;                     
    put pos @ ;                                          
  end ;                                                  
run ;                                                    

 

Best

Paul D.

thanikondharish
Fluorite | Level 6
It finally gives results in the log or output window, but I want the observations to be saved in a data set.
novinosrin
Tourmaline | Level 20

I can help you with petty things and you rather bother boss for harder stuff lol

so change 

data _null_ ;   

to 

data want ;   

 to save observations in a dataset.

 

Ordinary blokes like me can do easy stuff. Let's bother boss for harder stuff 🙂

hashman
Ammonite | Level 13

It's easy.

Just change _null_ to the name of the data set you want and replace the entire PUT statement with the OUTPUT statement.

Voila.
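
Applied to the COUNTC version above, that change might look like the sketch below. The data set name WANT and the KEEP= option are assumptions added for illustration; they are not part of the original reply.

```sas
data want (keep = pos) ;                 /* WANT is an assumed data set name */
  retain ch "a" str "hfisfae afwa ffjeudad judsh sewla" ;
  pos = 0 ;
  do _n_ = 1 to countc (str, ch) ;
    pos = findc (str, ch, pos + 1) ;     /* next hit after the previous one  */
    output ;                             /* one observation per occurrence   */
  end ;
run ;
```

For the test string this writes five observations, with POS equal to 6, 9, 12, 20 and 33.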

 

Paul D.

ballardw
Super User

@thanikondharish wrote:
It finally gives results in the log or output window, but I want the observations to be saved in a data set.

You have been asked to show what the desired output should look like. Please do not be surprised if the result of the process does not match a NOT STATED output format.

 

Still not clear. Do you want one observation for each occurrence of the found letter? Do you want one observation with multiple variables holding the positions?

 

Show what the output should look like.
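
For comparison with the one-row-per-position solutions already posted, here is a hedged sketch of the second layout: one observation per input row, with the positions spread across variables. The data set name WANT_WIDE, the variable names pos1-pos40 and the upper bound of 40 are all assumptions made for illustration.

```sas
data want_wide (drop = i p) ;            /* WANT_WIDE is an assumed name      */
  set s ;                                /* S and NAME come from the question */
  array pos [40] ;                       /* pos1-pos40; 40 is an assumed cap  */
  p = 0 ;
  do i = 1 to min (countc (name, "a"), dim (pos)) ;
    p = findc (name, "a", p + 1) ;       /* next "a" after the previous hit   */
    pos [i] = p ;
  end ;
run ;
```

With the string from the question, pos1-pos5 would hold 6, 9, 12, 20 and 33, and the remaining variables would be missing.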
