So you have name/value pairs separated by commas, and those come in groups separated by semicolons.
Use countw() and scan() functions to dissect the string. Instead of creating a wide dataset with a number of variables where you need to know the number beforehand, create a long dataset where that number does not need to be known:
data want (keep=group price volume);
instring = 'price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496';
do group = 1 to countw(instring,';');
substring = scan(instring,group,';');
price = input(scan(scan(substring,1,','),2,'='),best.);
volume = input(scan(scan(substring,2,','),2,'='),best.);
output;
end;
run;
Post test data in the form of a datastep using the code window {i}!
As such untested:
data want (drop=p_i v_i wrd); set have; length wrd $200; array price {10}; array value {10}; p_i=1; v_i=1; do i=1 to countw(str,","); wrd=scan(str,i,","); if index(wrd,"price") > 0 then do; price{p_i}=input(scan(wrd,2,"="),best.); p_i=p_i+1; end; else do; value{v_i}=input(scan(wrd,2,"="),best.); v_i=v_i+1; end; end; run;
Note I assumed no more than 10 of each, and all in one string delimited by comma, and individually delimited by =.
Is this coming from a text file, as you can read named variables direct from the text file, then just transpose them up - again examples are extremely important to get good answers!
Is this an external file that you are attempting to read?
It may help to post a few lines of the file if so.
@owenwqp1 wrote:
But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks
That's incorrect. Note the third parameter to those functions is the Start Position. So once you've found it once, you can set the start position to the last found value and it searches from there on.
@owenwqp1 wrote:
Dear all, I want to substr substrings from very long strings into different variables, but for each substring,they appear several times. For example: "price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496" . I want to substr the first price to be variable price1, the second price to be price2, .....the 10th price to be price10. But sas functions like find, index can only return the position of the first "price", so anyone could tell me how to find the positions of other substring "price"? Thanks
Please see my initial post:
Post test data in the form of a datastep in the post using a code window!
What does your data look like, are you doing a proc import from that Excel file? Is it all in one line? Did you try my code, as adjusting the number of arrays it should be quite close.
Example of how to provide test data:
data have; length longstr $2000; input no dt $ longstr $ datalines4; 1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496 1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018 ;;;; run;
Basically you just need to loop over longstr scanning each section delimited by ;, then scan that part by , - and then assign each to array.
Based on your sample data (with some corrections to the code:
data have;
length firmid date 4 transactions $ 1000;
format date mmddyy10.;
input firmid date :mmddyy10. transactions;
datalines4;
1001 1/27/2017 price=20.2,volume=1200;price=21,volume=1555;price=23,volume=1187;price=21,volume=1496
1002 1/28/2017 price=20.9,volume=1300;price=20.9,volume=1000;price=21,volume=2018
1003 1/29/2017 price=21.9,volume=2200;price=22.5,volume=1700
;;;;
run;
data want (keep=firmid date group price volume);
set have;
do group = 1 to countw(transactions,';');
substring = scan(transactions,group,';');
price = input(scan(scan(substring,1,','),2,'='),best.);
volume = input(scan(scan(substring,2,','),2,'='),best.);
output;
end;
run;
Gives you a nice longitudinal dataset to work with.
One possibility is to use PRX (Pearl Regular Expressions). It looks like you always have first the price, then the volume, separated by a comma:
Data want; set have; prxData=prxParse('/price=([0-9\.]+),volume=([0-9\.]+)/i'); array prices(*) 8 price1-price10; array volumes(*) 8 volume1-volume10; start=1; do _N_=1 to 10; call prxNext(prxData,start,-1,transactions,pos,len); if len=0 then leave; prices(_N_)=input(prxposn(prxData,1,transactions),best32.); volumes(_N_)=input(prxposn(prxData,2,transactions),best32.); end;
drop start pos len; run;
A short explanation of the string in PRXPARSE: it first looks for "price=". The string that follows is put in a paranthesis, so that it is stored in a capture buffer, the contents of the buffer must be digits or decimal point. Then follows the text ",volume=" and a new capture buffer with a number. The final "i" tells the function not to care about upper/lower case.
The PRXNEXT call routine automatically updates the START variable, so that the next call will look for the next occurence of the pattern. When no more patterns are found, the POS variable (position of pattern) is set to 0 when no more occurrences are found.
The PRXPOSN function retrieves the content of the capture buffers.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.