hhchenfx
Barite | Level 11

Hi Everyone,

 

I would like to find the starting position of every occurrence of a word in my text variable.

For example, in the text below, the first myword starts at position 3 and the second myword starts at position 10.

 

Can you please help me with that?

 

Thank you.

 

HHCFX

<"myword myword: []/'\":+!@#$%& -0*&^% myword><W15ySpnsrCW1sA5ZZ0</W1:SurvA5ySpnsrCW1sA5ZZD><W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><W1:MA5oZozA5m></W1:MA5ozW1DW1ozW1Mrup><W1:MA5ozW1DW1ozW1Mrup MA5ozW1DW1ozW1MrupNW1mA5="A5CW1SA5_CS_MW1"><W1:DW1ozW1ZZ A5lA5mA5nozNW1mA5="WRKLW1D_OZYPA5_CDA5">myword</W1:DW1ozW1ZZoA5m><W1:DW1ozW1ZZozA5m A5lA5mA5nozNW1mA5="WRKLW1D_P_OZYPA5_CDA5"></1:DW1ozW1ZZozA5m></W1:MA5ozW1DW1ozW1Mrup></W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><WMW1zzlzznmW1ddrAW1ddr1>20OZ 11W1</W1:W1ddr1><W1:W1ddr3>NW1RRW1MW1NSA5OZOZ, RZZ -</W1:W1ddr3>myword

 

data hh;
    infile datalines dlm="|";

length var $ 30000;
input var $;
datalines;
<"myword myword: []/'\":+!@#$%& -0*&^% myword><W15ySpnsrCW1sA5ZZ0</W1urvA5ySpnsrCW1sA5ZZD><W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><W1:MA5oZozA5m></W1:MA5ozW1DW1ozW1Mrup><W1:MA5ozW1DW1ozW1Mrup MA5ozW1DW1ozW1MrupNW1mA5="A5CW1SA5_CS_MW1"><W1W1ozW1ZZ A5lA5mA5nozNW1mA5="WRKLW1D_OZYPA5_CDA5">myword</W1W1ozW1ZZoA5m><W1W1ozW1ZZozA5m A5lA5mA5nozNW1mA5="WRKLW1D_P_OZYPA5_CDA5"></1W1ozW1ZZozA5m></W1:MA5ozW1DW1ozW1Mrup></W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><WMW1zzlzznmW1ddrAW1ddr1>20OZ 11W1</W1:W1ddr1><W1:W1ddr3>NW1RRW1MW1NSA5OZOZ, RZZ -</W1:W1ddr3>myword
;
run;

 

1 ACCEPTED SOLUTION

data_null__
Jade | Level 19
data want;
   set hh;
   i = 1;
   p = find(var,'myword',i,'I');
   do while(p gt 0);
      output;
      i = p+1;
      p = find(var,'myword',i,'I');
      end;
   run;
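
To see how the loop above advances through the text, here is a minimal sketch on a short made-up string. The dataset name demo and the test string are assumptions for illustration and are not from the original post. Each FIND call starts searching at position i, and setting i = p+1 moves the start just past the beginning of the previous match, so the same occurrence is not returned twice.

```sas
data demo;
   length var $ 40;
   var = '<"myword myword: x myword>';   /* hypothetical test string */
   i = 1;                                /* position where the search starts */
   p = find(var, 'myword', i, 'I');      /* 'I' modifier: ignore case */
   do while (p gt 0);
      output;                            /* one row per occurrence found */
      i = p + 1;                         /* resume just after the start of this match */
      p = find(var, 'myword', i, 'I');
   end;
   keep p;
run;

proc print data=demo noobs;
run;
```

For this string the three output rows should hold p = 3, 10 and 20.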


6 REPLIES
art297
Opal | Level 21

Here is one way using a regular expression:

 

data hh;
    infile datalines4 dlm="|";

length var $ 30000;
input var :;
datalines;
<"myword myword: []/'\":+!@#$%& -0*&^% myword><W15ySpnsrCW1sA5ZZ0</W1urvA5ySpnsrCW1sA5ZZD><W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><W1:MA5oZozA5m></W1:MA5ozW1DW1ozW1Mrup><W1:MA5ozW1DW1ozW1Mrup MA5ozW1DW1ozW1MrupNW1mA5="A5CW1SA5_CS_MW1"><W1W1ozW1ZZ A5lA5mA5nozNW1mA5="WRKLW1D_OZYPA5_CDA5">myword</W1W1ozW1ZZoA5m><W1W1ozW1ZZozA5m A5lA5mA5nozNW1mA5="WRKLW1D_P_OZYPA5_CDA5"></1W1ozW1ZZozA5m></W1:MA5ozW1DW1ozW1Mrup></W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><WMW1zzlzznmW1ddrAW1ddr1>20OZ 11W1</W1:W1ddr1><W1:W1ddr3>NW1RRW1MW1NSA5OZOZ, RZZ -</W1:W1ddr3>myword
;;;;
run;

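data want;
  set hh;
  length positions $200;
  prxData=prxParse('/myword/i');   /* compile a case-insensitive pattern */
  start=1;
  call missing(positions);
  do _N_=1 to 30000;               /* upper bound on matches; _N_ is reused as the loop counter */
    call prxNext(prxData,start,-1,var,pos,len);   /* stop=-1: search through the last non-blank character */
    if len=0 then leave;           /* no further match */
    else positions=catx(',',positions,pos);       /* append to a comma-separated list */
  end;
  drop start pos len;
run;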

Art, CEO, AnalystFinder.com

 

 

hhchenfx
Barite | Level 11

Thank you for your solution.

 

I tried the code below, and it kind of works.

There are two problems:

- It misses the last "myword".

- I tried passing i as the start-position argument to make the search jump ahead, but it doesn't work.

 

Can anyone fix it for me?

 

Thanks a lot.

 

HHCFX

 

data hh;
    infile datalines4 dlm="|";
length var $ 30000;
input var :;
datalines;
<"myword myword: []/'\":+!@#$%& -0*&^% myword><W15ySpnsrCW1sA5ZZ0</W1urvA5ySpnsrCW1sA5ZZD><W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><W1:MA5oZozA5m></W1:MA5ozW1DW1ozW1Mrup><W1:MA5ozW1DW1ozW1Mrup MA5ozW1DW1ozW1MrupNW1mA5="A5CW1SA5_CS_MW1"><W1W1ozW1ZZ A5lA5mA5nozNW1mA5="WRKLW1D_OZYPA5_CDA5">myword</W1W1ozW1ZZoA5m><W1W1ozW1ZZozA5m A5lA5mA5nozNW1mA5="WRKLW1D_P_OZYPA5_CDA5"></1W1ozW1ZZozA5m></W1:MA5ozW1DW1ozW1Mrup></W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><WMW1zzlzznmW1ddrAW1ddr1>20OZ 11W1</W1:W1ddr1><W1:W1ddr3>NW1RRW1MW1NSA5OZOZ, RZZ -</W1:W1ddr3>myword
;;;;
run;

data want;
set hh;
keep position;
do i=1 to 200;
	if findc(var,'myword')>0 then do;
			position=find(var,'myword',i); 
			output; 
			end;
end;
run;

PROC SORT nodupkey data= want out=want2;
by position;
run;

 

art297
Opal | Level 21

You're not searching the entire string! Try:

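data want;
  set hh;
  keep position;
  do i=1 to length(var);
    /* FIND (not FINDC) locates the whole substring, searching from position i */
    position=find(var,'myword',i);
    /* the same position is output many times; PROC SORT NODUPKEY removes the repeats */
    if position>0 then output;
  end;
run;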

Art, CEO, AnalystFinder.com
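
A side note on the guard findc(var,'myword')>0 used in the attempt above: FINDC looks for any single character from its second argument, while FIND looks for the whole substring. The following is a minimal sketch with a made-up string (not from the thread) that writes both results to the log; the variable names are chosen only for illustration.

```sas
data _null_;
   text = 'abc xyz';                    /* hypothetical test string without the word */
   any_char  = findc(text, 'myword');   /* first character that is one of m, y, w, o, r, d */
   whole_str = find(text, 'myword');    /* position of the full substring, 0 if absent */
   put any_char= whole_str=;
run;
```

Here FINDC returns 6 (the letter y) while FIND returns 0, so the FINDC test is true for almost any text and does not tell you whether the word is present.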

 

 

 

Ksharp
Super User

Arthur.T pointed in the right direction.

 

data hh;
    infile datalines dlm="|";

length var $ 30000;
input var $;
datalines4;
<"myword myword: []/'\":+!@#$%& -0*&^% myword><W15ySpnsrCW1sA5ZZ0</W1urvA5ySpnsrCW1sA5ZZD><W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><W1:MA5oZozA5m></W1:MA5ozW1DW1ozW1Mrup><W1:MA5ozW1DW1ozW1Mrup MA5ozW1DW1ozW1MrupNW1mA5="A5CW1SA5_CS_MW1"><W1W1ozW1ZZ A5lA5mA5nozNW1mA5="WRKLW1D_OZYPA5_CDA5">myword</W1W1ozW1ZZoA5m><W1W1ozW1ZZozA5m A5lA5mA5nozNW1mA5="WRKLW1D_P_OZYPA5_CDA5"></1W1ozW1ZZozA5m></W1:MA5ozW1DW1ozW1Mrup></W1:RA5prozzznmUnzzozMA5ozW1dW1ozW1><WMW1zzlzznmW1ddrAW1ddr1>20OZ 11W1</W1:W1ddr1><W1:W1ddr3>NW1RRW1MW1NSA5OZOZ, RZZ -</W1:W1ddr3>myword;
;;;;
run;

data _null_;
 set hh;
 pid=prxparse('/myword/');
 s=1;
 e=length(var);
 call prxnext(pid,s,e,var,p,l);
 do i=1 by 1 while(p>0);
   put i= p=;
   call prxnext(pid,s,e,var,p,l);
 end;
run;
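
The solutions in this thread treat letter case differently: the FIND calls pass the 'I' modifier and the pattern '/myword/i' ignores case, while '/myword/' as used above is case-sensitive. A small sketch with a made-up string (an assumption for illustration, not data from the thread) shows the effect in the log.

```sas
data _null_;
   text = 'MyWord myword';                    /* hypothetical test string */
   exact     = find(text, 'myword');          /* case-sensitive search */
   nocase    = find(text, 'myword', 'i');     /* 'i' modifier ignores case */
   rx_exact  = prxmatch('/myword/',  text);   /* case-sensitive regular expression */
   rx_nocase = prxmatch('/myword/i', text);   /* /i makes the pattern case-insensitive */
   put exact= nocase= rx_exact= rx_nocase=;
run;
```

The case-sensitive searches should report position 8 and the case-insensitive ones position 1, so pick the variant that matches how the word can appear in your data.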
hhchenfx
Barite | Level 11
Thank you everyone for your help.
HHCFX

