BookmarkSubscribeRSS Feed
owenwqp1
Obsidian | Level 7
Dear all, I want to substr substrings from very long strings into different variables, but for each substring,they appear several times. For example: "price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496" . I want to substr the first price to be variable price1, the second price to be price2, .....the 10th price to be price10. But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks
12 REPLIES 12
RM6
Obsidian | Level 7 RM6
Obsidian | Level 7
can you be more clear
Kurt_Bremser
Super User

So you have name/value pairs separated by commas, and those come in groups separated by semicolons.

Use countw() and scan() functions to dissect the string. Instead of creating a wide dataset with a number of variables where you need to know the number beforehand, create a long dataset where that number does not need to be known:

 

data want (keep=group price volume);
instring = 'price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496';
do group = 1 to countw(instring,';');
  substring = scan(instring,group,';');
  price = input(scan(scan(substring,1,','),2,'='),best.);
  volume = input(scan(scan(substring,2,','),2,'='),best.);
  output;
end;
run;

 

owenwqp1
Obsidian | Level 7
@Kurt_Bremser Thank you so much KurtBremser, my problem solved with your code! Cheers, Owen
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep using the code window {i}!

As such untested:

data want (drop=p_i v_i wrd);
  set have;
  length wrd $200;
  array price {10};
  array value {10};
  p_i=1;
  v_i=1;
  do i=1 to countw(str,",");
    wrd=scan(str,i,",");
    if index(wrd,"price") > 0 then do;
      price{p_i}=input(scan(wrd,2,"="),best.);
      p_i=p_i+1;
    end;
    else do;
      value{v_i}=input(scan(wrd,2,"="),best.);
      v_i=v_i+1;
    end;
  end;
run;
      

Note I assumed no more than 10 of each, and all in one string delimited by comma, and individually delimited by =.

Is this coming from a text file, as you can read named variables direct from the text file, then just transpose them up - again examples are extremely important to get good answers!

ballardw
Super User

Is this an external file that you are attempting to read?

It may help to post a few lines of the file if so.

Reeza
Super User

@owenwqp1 wrote:
But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks

That's incorrect. Note the third parameter to those functions is the Start Position. So once you've found it once, you can set the start position to the last found value and it searches from there on.

 

 

 

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p16rdsa30vmm43n1ej4936nwa01t.htm&...

 


@owenwqp1 wrote:
Dear all, I want to substr substrings from very long strings into different variables, but for each substring,they appear several times. For example: "price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496" . I want to substr the first price to be variable price1, the second price to be price2, .....the 10th price to be price10. But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks

 

owenwqp1
Obsidian | Level 7
Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. My data is like the following: firmid date transactions 1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018" 1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700 number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1.And the second and other transctions to variable price2 and varible volume2 and so on. Thanks a lot
owenwqp1
Obsidian | Level 7
Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. I'm really sorry to make it worse in the last reply.My sample excel data is attached: number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1.And the second and other transctions to variable price2 and varible volume2 and so on. By the way, anyone could tell me how to directly paste excel data in table form in my posts? Thanks a lot
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please see my initial post:

Post test data in the form of a datastep in the post using a code window!

 

 

What does your data look like, are you doing a proc import from that Excel file?  Is it all in one line?  Did you try my code, as adjusting the number of arrays it should be quite close.  

 

Example of how to provide test data:

data have;
  length longstr $2000;
  input no dt $ longstr $
datalines4;
1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 
1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018
;;;;
run;

Basically you just need to loop over longstr scanning each section delimited by ;, then scan that part by , - and then assign each to array.

owenwqp1
Obsidian | Level 7
Dear all, Thank you for your replies. Because I did not put my question very clearly, your answers could not completely solve my problem. I'm really sorry to make it worse in my previous reply. My sample data is as follows: data sample; length transactions $ 1000; input firmid date transactions $ datalines4; 1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018 1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700 ......; run; The number of transactions everyday is not the same, and the position of each transaction price and volume is not regular. I want to substring the first transction price to variable price1, volume to variable volume1. And the second and other transctions to variable price2 and varible volume2 and so on. Thanks a lot
Kurt_Bremser
Super User

Based on your sample data (with some corrections to the code:

data have;
length firmid date 4 transactions $ 1000;
format date mmddyy10.;
input firmid date :mmddyy10. transactions;
datalines4;
1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496
1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018
1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700
;;;;
run;

data want (keep=firmid date group price volume);
set have;
do group = 1 to countw(transactions,';');
  substring = scan(transactions,group,';');
  price = input(scan(scan(substring,1,','),2,'='),best.);
  volume = input(scan(scan(substring,2,','),2,'='),best.);
  output;
end;
run;

Gives you a nice longitudinal dataset to work with.

 

s_lassen
Meteorite | Level 14

One possibility is to use PRX (Pearl Regular Expressions). It looks like you always have first the price, then the volume, separated by a comma:

 

Data want;
  set have;
  prxData=prxParse('/price=([0-9\.]+),volume=([0-9\.]+)/i');
  array prices(*) 8 price1-price10;
  array volumes(*) 8 volume1-volume10;
  start=1;
  do _N_=1 to 10;
    call prxNext(prxData,start,-1,transactions,pos,len);
    if len=0 then leave;
    prices(_N_)=input(prxposn(prxData,1,transactions),best32.);
    volumes(_N_)=input(prxposn(prxData,2,transactions),best32.);
    end;
drop start pos len; run;

A short explanation of the string in PRXPARSE: it first looks for "price=". The string that follows is put in a paranthesis, so that it is stored in a capture buffer, the contents of the buffer must be digits or decimal point. Then follows the text ",volume=" and a new capture buffer with a number. The final "i" tells the function not to care about upper/lower case.

 

The PRXNEXT call routine automatically updates the START variable, so that the next call will look for the next occurence of the pattern. When no more patterns are found, the POS variable (position of pattern) is set to 0 when no more occurrences are found.

The PRXPOSN function retrieves the content of the capture buffers.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4902 views
  • 1 like
  • 7 in conversation