DATA Step, Macro, Functions and more

How to find the positions of the same substring in a long string

Reply
Contributor
Posts: 50

How to find the positions of the same substring in a long string

Dear all, I want to substr substrings from very long strings into different variables, but for each substring,they appear several times. For example: "price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496" . I want to substr the first price to be variable price1, the second price to be price2, .....the 10th price to be price10. But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks
Contributor RM6
Contributor
Posts: 24

Re: How to find the positions of the same substring in a long string

can you be more clear
Super User
Posts: 10,574

Re: How to find the positions of the same substring in a long string

[ Edited ]

So you have name/value pairs separated by commas, and those come in groups separated by semicolons.

Use countw() and scan() functions to dissect the string. Instead of creating a wide dataset with a number of variables where you need to know the number beforehand, create a long dataset where that number does not need to be known:

 

data want (keep=group price volume);
instring = 'price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496';
do group = 1 to countw(instring,';');
  substring = scan(instring,group,';');
  price = input(scan(scan(substring,1,','),2,'='),best.);
  volume = input(scan(scan(substring,2,','),2,'='),best.);
  output;
end;
run;

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 50

Re: How to find the positions of the same substring in a long string

Posted in reply to KurtBremser
@KurtBremser Thank you so much KurtBremser, my problem solved with your code! Cheers, Owen
Super User
Super User
Posts: 9,840

Re: How to find the positions of the same substring in a long string

Post test data in the form of a datastep using the code window {i}!

As such untested:

data want (drop=p_i v_i wrd);
  set have;
  length wrd $200;
  array price {10};
  array value {10};
  p_i=1;
  v_i=1;
  do i=1 to countw(str,",");
    wrd=scan(str,i,",");
    if index(wrd,"price") > 0 then do;
      price{p_i}=input(scan(wrd,2,"="),best.);
      p_i=p_i+1;
    end;
    else do;
      value{v_i}=input(scan(wrd,2,"="),best.);
      v_i=v_i+1;
    end;
  end;
run;
      

Note I assumed no more than 10 of each, and all in one string delimited by comma, and individually delimited by =.

Is this coming from a text file, as you can read named variables direct from the text file, then just transpose them up - again examples are extremely important to get good answers!

Super User
Posts: 13,941

Re: How to find the positions of the same substring in a long string

Is this an external file that you are attempting to read?

It may help to post a few lines of the file if so.

Super User
Posts: 24,010

Re: How to find the positions of the same substring in a long string

[ Edited ]

owenwqp1 wrote:
But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks

That's incorrect. Note the third parameter to those functions is the Start Position. So once you've found it once, you can set the start position to the last found value and it searches from there on.

 

 

 

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p16rdsa30vmm43n1ej4936nwa01t.htm&...

 


owenwqp1 wrote:
Dear all, I want to substr substrings from very long strings into different variables, but for each substring,they appear several times. For example: "price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496" . I want to substr the first price to be variable price1, the second price to be price2, .....the 10th price to be price10. But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks

 

Contributor
Posts: 50

Re: How to find the positions of the same substring in a long string

Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. My data is like the following: firmid date transactions 1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018" 1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700 number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1.And the second and other transctions to variable price2 and varible volume2 and so on. Thanks a lot
Contributor
Posts: 50

Re: How to find the positions of the same substring in a long string

Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. I'm really sorry to make it worse in the last reply.My sample excel data is attached: number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1.And the second and other transctions to variable price2 and varible volume2 and so on. By the way, anyone could tell me how to directly paste excel data in table form in my posts? Thanks a lot
Super User
Super User
Posts: 9,840

Re: How to find the positions of the same substring in a long string

Please see my initial post:

Post test data in the form of a datastep in the post using a code window!

 

 

What does your data look like, are you doing a proc import from that Excel file?  Is it all in one line?  Did you try my code, as adjusting the number of arrays it should be quite close.  

 

Example of how to provide test data:

data have;
  length longstr $2000;
  input no dt $ longstr $
datalines4;
1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 
1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018
;;;;
run;

Basically you just need to loop over longstr scanning each section delimited by ;, then scan that part by , - and then assign each to array.

Contributor
Posts: 50

Re: How to find the positions of the same substring in a long string

Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. I'm really sorry to make it worse in my previous reply. My sample data is as follows: data sample; length transactions $ 1000; input firmid date transactions $ datalines4; 1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018 1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700 ......; run; The number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1. And the second and other transctions to variable price2 and varible volume2 and so on. Thanks a lot
Super User
Posts: 10,574

Re: How to find the positions of the same substring in a long string

Based on your sample data (with some corrections to the code:

data have;
length firmid date 4 transactions $ 1000;
format date mmddyy10.;
input firmid date :mmddyy10. transactions;
datalines4;
1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496
1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018
1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700
;;;;
run;

data want (keep=firmid date group price volume);
set have;
do group = 1 to countw(transactions,';');
  substring = scan(transactions,group,';');
  price = input(scan(scan(substring,1,','),2,'='),best.);
  volume = input(scan(scan(substring,2,','),2,'='),best.);
  output;
end;
run;

Gives you a nice longitudinal dataset to work with.

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
PROC Star
Posts: 275

Re: How to find the positions of the same substring in a long string

One possibility is to use PRX (Pearl Regular Expressions). It looks like you always have first the price, then the volume, separated by a comma:

 

Data want;
  set have;
  prxData=prxParse('/price=([0-9\.]+),volume=([0-9\.]+)/i');
  array prices(*) 8 price1-price10;
  array volumes(*) 8 volume1-volume10;
  start=1;
  do _N_=1 to 10;
    call prxNext(prxData,start,-1,transactions,pos,len);
    if len=0 then leave;
    prices(_N_)=input(prxposn(prxData,1,transactions),best32.);
    volumes(_N_)=input(prxposn(prxData,2,transactions),best32.);
    end;
drop start pos len; run;

A short explanation of the string in PRXPARSE: it first looks for "price=". The string that follows is put in a paranthesis, so that it is stored in a capture buffer, the contents of the buffer must be digits or decimal point. Then follows the text ",volume=" and a new capture buffer with a number. The final "i" tells the function not to care about upper/lower case.

 

The PRXNEXT call routine automatically updates the START variable, so that the next call will look for the next occurence of the pattern. When no more patterns are found, the POS variable (position of pattern) is set to 0 when no more occurrences are found.

The PRXPOSN function retrieves the content of the capture buffers.

Ask a Question
Discussion stats
  • 12 replies
  • 258 views
  • 1 like
  • 7 in conversation