BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SAS_New_User1
Obsidian | Level 7

I have a string 

example 1: abcdefgs:456:fdkweu/:START:41A:45583993:START:60A:6483829992:STOP:59:hagdsu2381ygu:73:

example 2:

hsdgsdgcj:64:54:adcg6edd:START:71A:274765765:START:84A:26357457454:STOP:59:2237fdguef

 

I want to extract any string between the 1st START and STOP strings i.e. the output should look like 

Output 1:

41A:45583993:START:60A:6483829992

Output 2:

71A:274765765:START:84A:26357457454

 

How do I resolve this? I tried using SCAN function but with no luck. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

You can use the INDEX function to find the first position of one string within another.

 

data have;
   input string :$100.;
datalines;
abcdefgs:456:fdkweu/:START:41A:45583993:START:60A:6483829992:STOP:59:hagdsu2381ygu:73:
hsdgsdgcj:64:54:adcg6edd:START:71A:274765765:START:84A:26357457454:STOP:59:2237fdguef
;

data want;
  set have;
  startpos=index(string,'START:');
  endpos  =index(string,':STOP:');
  length newstring $ 100;
  newstring = substr(string,startpos+6,endpos-(startpos+7));
run;

I left the variables startpos and endpos as examples. The startpos has to be shifted because is the first position of the "START:" so you need to add 6 characters, the length of "START:" to get the first character following the first occurence. The endpos needs to be adjusted as it is the original position in the string and you need to remove the characters that come before the START:.

The INDEX function will return 0 for the position if the string is not found. Best practice would be to do the SUBSTR only when both Statpos and Endpos are greater than 0.

You don't say what to do if you have a Start without a Stop. That should be considered.

 

View solution in original post

5 REPLIES 5
ballardw
Super User

You can use the INDEX function to find the first position of one string within another.

 

data have;
   input string :$100.;
datalines;
abcdefgs:456:fdkweu/:START:41A:45583993:START:60A:6483829992:STOP:59:hagdsu2381ygu:73:
hsdgsdgcj:64:54:adcg6edd:START:71A:274765765:START:84A:26357457454:STOP:59:2237fdguef
;

data want;
  set have;
  startpos=index(string,'START:');
  endpos  =index(string,':STOP:');
  length newstring $ 100;
  newstring = substr(string,startpos+6,endpos-(startpos+7));
run;

I left the variables startpos and endpos as examples. The startpos has to be shifted because is the first position of the "START:" so you need to add 6 characters, the length of "START:" to get the first character following the first occurence. The endpos needs to be adjusted as it is the original position in the string and you need to remove the characters that come before the START:.

The INDEX function will return 0 for the position if the string is not found. Best practice would be to do the SUBSTR only when both Statpos and Endpos are greater than 0.

You don't say what to do if you have a Start without a Stop. That should be considered.

 

novinosrin
Tourmaline | Level 20

data have;
length eg $500;
eg='abcdefgs:456:fdkweu/:START:41A:45583993:START:60A:6483829992:STOP:59:hagdsu2381ygu:73';
output;
eg='hsdgsdgcj:64:54:adcg6edd:START:71A:274765765:START:84A:26357457454:STOP:59:2237fdguef';
output;
run;

data want;
 set have;
 length want $200;
 n=index(eg,'START')+6;
 want=substr(eg,n,index(eg,'STOP')-n-1);
 drop n;
run;
Ksharp
Super User
data have;
   input string :$100.;
datalines;
abcdefgs:456:fdkweu/:START:41A:45583993:START:60A:6483829992:STOP:59:hagdsu2381ygu:73:
hsdgsdgcj:64:54:adcg6edd:START:71A:274765765:START:84A:26357457454:STOP:59:2237fdguef
;
data want;
 set have;
 pid=prxparse('/start.+stop/i');
 if prxmatch(pid,string) then do;
   call prxsubstr(pid,string,p,l);
   want=substr(string,p+6,l-11);
 end;
 drop pid p l;
run;
SAS_New_User1
Obsidian | Level 7

Thank you Members, All the solutions worked 

Kurt_Bremser
Super User

Or you simply loop through the string (the method for using SCAN):

data want;
set have;
length newstring $100;
_flag = 0;
do _i = 1 to countw(string,":");
  _word = scan(string,_i,":");
  if _word = "START" then _flag = 1;
  else if _word = "STOP" then _flag = 0;
  else if _flag then newstring = catx(":",newstring,_word);
end;
drop _:;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 996 views
  • 3 likes
  • 5 in conversation