Hello. I seek advice on how to parse a character string with multiple "observations" into 5 separate columns. For example, I am trying to parse the following string:
{'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]}
In this column there are 5 "observations" that I want to parse into 5 separate columns. The next observation (not shown) has 2 "observations" that I want to parse into 5 columns, and the next has 1, and so on.
I have been able to parse the first clause, but I need help on how to continue to parse the 2nd clause, then the 3rd, etc. Also, if later observations have fewer than 5 as stated above, I need the loop to stop. In this case &MaxSeller=5 and the column I am parsing is called OfferList.
Here is my code so far:
data want;
set work.have;
Sellers=compress(OfferList,"'.","kads"); *removes punctuation, keeps single quotes, alpha, digits, and decimals;
/* create the 5 columns */
array names[&MaxSeller] $ 64;
array prices[&MaxSeller];
array ship[&MaxSeller];
array sellerid(&MaxSeller) $ 14;
array prime(&MaxSeller);
do i = 1 to &MaxSeller;
names[i]=scan(Sellers,i,"'",);
_NameLength=length(names[i]);
_Restofstring=strip(substr(Sellers,_NameLength+3));
prices[i]=scan(_Restofstring,i," ");
ship[i]=scan(_Restofstring,i+1," ");
_ID='/.\w{14}./o';
_ID_loc=prxmatch(_ID,_Restofstring);
sellerid[i]=strip(substr(_Restofstring,_ID_Loc+1,14));
_P='/ [01] /o';
_P_loc=prxmatch(_P,_Restofstring);
prime[i]=substr(_Restofstring,_P_loc+1,1);
end;
drop _:;
run;
Thank you! I use SAS Enterprise Guide 8.1.
Next is a tested code:
%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
str = "&str";
str1 = compress(str,"'{[}");
str1 = translate(str1,',',':');
length strx $60 name $15 ;
do i=1 to 10;
strx = scan(str1,i,']');
if strx ne ' ' then do;
name = scan(strx,1,',');
price = scan(strx,2,',');
ship = scan(strx,3,',');
sellerid = scan(strx,4,',');
prime = scan(strx,5,',');
output;
end; end;
keep name price ship sellerid prime strx;
run;
Please post the rules and the delimiters between the observations.
You can use translate() function to replace the different delimiters into one specific delimiter then
use scan() function in a loop to assign each value to appropriate variable.
Yes, I think I've done that. I started with a scan function and a single quote as the delimiter, because I want names1 to include all words and spaces between the first pair of single quotes. Then I switch to scan functions using space as the delimiter because the remaining 4 have no spaces. In the 2nd loop, what is the best way to begin scanning at the 6th clause, and then the 11th clause on the 3rd loop, etc? The scan function does not have an option to start in the middle of a column.
I'm not sure you understood what I meant. I can't guess the expected output.
Maybe next code will give a hint to what I meant:
%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
str = "&str";
str1 = compress(str,'{[}]');
length strx $40;
do i=1 to 10;
strx = scan(str1,i,':');
if strx ne ' ' then output;
end;
run;
What are the expected variables and observations from this given string?
Next is a tested code:
%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
str = "&str";
str1 = compress(str,"'{[}");
str1 = translate(str1,',',':');
length strx $60 name $15 ;
do i=1 to 10;
strx = scan(str1,i,']');
if strx ne ' ' then do;
name = scan(strx,1,',');
price = scan(strx,2,',');
ship = scan(strx,3,',');
sellerid = scan(strx,4,',');
prime = scan(strx,5,',');
output;
end; end;
keep name price ship sellerid prime strx;
run;
Thank you very much!!!
Is it originally JSON/XML file? If so, you could try the libname approach to handling those files.
If it's coming from a DB it's more complicated....
I think it does come from one of those file types, but I am reading from a CSV file. I haven't used LIBNAME with JSON or XML. I can try that.
I would start like this:
data want;
txt = "{'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]}";
id = prxParse("/'([^']+)': \[([^\]]+)\]/");
length name $24 values $64;
start = 1;
stop = length(txt);
call prxnext(id, start, stop, txt, pos, len);
do while (pos > 0);
name = prxPosn(id, 1, txt);
values = prxPosn(id, 2, txt);
/* Parse values here */
output;
call prxnext(id, start, stop, txt, pos, len);
end;
keep name values;
run;
.... and then use SCAN or the same technique again on variable values.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.