BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
benjamin_2018
Fluorite | Level 6

Hello. I seek advice on how to parse a character string with multiple "observations" into 5 separate columns. For example, I am trying to parse the following string: 

 

{'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]}

 

In this column there are 5 "observations" that I want to parse into 5 separate columns. The next observation (not shown) has 2 "observations" that I want to parse into 5 columns, and the next has 1, and so on. 

I have been able to parse the first clause, but I need help on how to continue to parse the 2nd clause, then the 3rd, etc. Also, if later observations have fewer than 5 as stated above, I need the loop to stop. In this case &MaxSeller=5 and the column I am parsing is called OfferList. 

 

Here is my code so far:

 

data want;
	set work.have;
	Sellers=compress(OfferList,"'.","kads"); *removes punctuation, keeps single quotes, alpha, digits, and decimals;

	/* create the 5 columns */
array names[&MaxSeller] $ 64; array prices[&MaxSeller]; array ship[&MaxSeller]; array sellerid(&MaxSeller) $ 14; array prime(&MaxSeller); do i = 1 to &MaxSeller; names[i]=scan(Sellers,i,"'",); _NameLength=length(names[i]); _Restofstring=strip(substr(Sellers,_NameLength+3)); prices[i]=scan(_Restofstring,i," "); ship[i]=scan(_Restofstring,i+1," "); _ID='/.\w{14}./o'; _ID_loc=prxmatch(_ID,_Restofstring); sellerid[i]=strip(substr(_Restofstring,_ID_Loc+1,14)); _P='/ [01] /o'; _P_loc=prxmatch(_P,_Restofstring); prime[i]=substr(_Restofstring,_P_loc+1,1); end; drop _:; run;

Thank you! I use SAS Enterprise Guide 8.1.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Next is a tested code:

%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
  str = "&str";
  str1 = compress(str,"'{[}");
  str1 = translate(str1,',',':');
  length strx $60 name $15 ;
  do i=1 to 10;
     strx = scan(str1,i,']'); 
    if strx ne ' ' then do; 
     name = scan(strx,1,',');
     price = scan(strx,2,',');
     ship  = scan(strx,3,',');
     sellerid = scan(strx,4,',');
     prime = scan(strx,5,',');
     output;
  end; end;
  keep name price ship sellerid prime strx;
run;

View solution in original post

10 REPLIES 10
Shmuel
Garnet | Level 18

Please post the rules and the delimiters between the observations.

You can use translate() function to replace the different delimiters into one specific delimiter then

use scan() function in a loop to assign each value to appropriate variable.

 

 

benjamin_2018
Fluorite | Level 6

Yes, I think I've done that. I started with a scan function and a single quote as the delimiter, because I want names1 to include all words and spaces between the first pair of single quotes. Then I switch to scan functions using space as the delimiter because the remaining 4 have no spaces. In the 2nd loop, what is the best way to begin scanning at the 6th clause, and then the 11th clause on the 3rd loop, etc? The scan function does not have an option to start in the middle of a column. 

Shmuel
Garnet | Level 18

I'm not sure you understood what I meant. I can't guess the expected output.

Maybe next code will give a hint to what I meant:

%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
  str = "&str";
  str1 = compress(str,'{[}]');
  length strx $40;
  do i=1 to 10;
     strx = scan(str1,i,':'); 
if strx ne ' ' then output; end; run;

What are the expected variables and observations from this given string?

benjamin_2018
Fluorite | Level 6
I see. Yes sorry I realize I did not post the expected output. The variables I want to create are the variables created in the array statements.
names[i] should be the string between the first pair of quotes.
prices[i] should be the first number in the brackets
ship[i] should be the 2nd number in the brackets.
sellerid[i] should be the quoted string within the brackets.
prime[i] should be the last number in the brackets.
After that I want to go back through the loop to begin again with names[i], etc.

I will look into your solution next week. Thank you.
Shmuel
Garnet | Level 18

Next is a tested code:

%let str = {'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]};
data a;
  str = "&str";
  str1 = compress(str,"'{[}");
  str1 = translate(str1,',',':');
  length strx $60 name $15 ;
  do i=1 to 10;
     strx = scan(str1,i,']'); 
    if strx ne ' ' then do; 
     name = scan(strx,1,',');
     price = scan(strx,2,',');
     ship  = scan(strx,3,',');
     sellerid = scan(strx,4,',');
     prime = scan(strx,5,',');
     output;
  end; end;
  keep name price ship sellerid prime strx;
run;
benjamin_2018
Fluorite | Level 6

Thank you very much!!!

Reeza
Super User

Is it originally JSON/XML file? If so, you could try the libname approach to handling those files.

 

If it's coming from a DB it's more complicated....

benjamin_2018
Fluorite | Level 6

I think it does come from one of those file types, but I am reading from a CSV file. I haven't used LIBNAME with JSON or XML. I can try that. 

PGStats
Opal | Level 21

I would start like this:

 

data want;
txt = "{'Orva Stores': [36.99, 0.0, 'A2NEM58BFPMEIL', 1], 'J.WALKER LLC': [36.99, 0.0, 'A1B3KT3F9BMSN1', 1], 'Justin Smiles': [36.99, 0.0, 'A3SP9XX1M7ZN73', 1], 'Closeout Pro': [36.99, 0.0, 'A3IRLODW57QOBV', 1], 'Flow Dealz': [44.95, 0.0, 'A2GP4414DQTJ29', 1]}";
id = prxParse("/'([^']+)': \[([^\]]+)\]/");
length name $24 values $64;
start = 1;
stop = length(txt);
call prxnext(id, start, stop, txt, pos, len);
do while (pos > 0);
    name = prxPosn(id, 1, txt);
    values = prxPosn(id, 2, txt);
    /* Parse values here */
    output;
    call prxnext(id, start, stop, txt, pos, len);
    end;
keep name values;
run;

PGStats_0-1614380062931.png

 

 .... and then use SCAN or the same technique again on variable values

PG
benjamin_2018
Fluorite | Level 6
THANK YOU! I will try this solution next week. I’m not familiar with prxnext but I’ll give it a shot!!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2397 views
  • 1 like
  • 4 in conversation