I am trying to parse a string and look up the alpha part in a look up table. My string follows the format A0B0C0D0...X0, where the letters represent a product and the numbers represent a frequency of that product, so for example a string starting A0B2 could represent a mix of 0 apples and 2 loaves of bread. I would like to parse this string out and hit it against the look up table basically to translate the results back to English.
My first problem is how to loop through the strings (I have about 300k records) and then also how to eliminate results with 0 frequencies and pull the product names from the look up table (for the products with frequency >0) and output the product names and counts, i.e the end result would say "this customer bought 5 loaves of bread and 2 chocolate bars). Any help is appreciated.
A couple of examples of the input data set records and the desired output would be helpful. Also since you say you are comparing to another look up table, some corresponding examples from that table are a good idea.
Are all of the character codes going to be a single letter? Is the code string in a fixed order will all codes always present or can the order or number of codes change?
So the table below shows some possible values and the results I would like to get. The order of the string is always fixed and the frequencies will always be less than 9 (so the string length will always be the same). The character codes will always be a single letter, that pull from a look up table of fixed vales (A for Apple, B for Bread etc).
String | Result |
A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0 | 5 Apples |
A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0 | 2 Bread |
A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0 | 3 Cookies |
A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0 | 1 Waffle |
A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0 | 1 Apple, 1 Bread |
A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0 | 4 Eggs, 1 Waffle |
A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0 | 1 Apple, 1 Bread, 1 Cookie |
This is a perfect task for PRX functions :
data strings;
length string $100;
input custId string;
datalines;
1 A0B2C0D0
2 A0B0C5D2
;
data products;
length prodCode $1 prodName $40;
input prodCode prodName &;
datalines;
A screwdriver
B pound of tomato
C loaf of bread
D chocolate bar
;
data basket(keep=custId prodCode prodNumber);
if prx1 = 0 then prx1 + prxparse("/(\D+)(\d+)/io");
set strings;
start = 1;
do until(pos=0);
call prxnext(prx1, start, -1, string, pos, len);
if pos > 0 then do;
prodCode = prxposn(prx1, 1, string);
prodNumber = input(prxposn(prx1, 2, string), best.);
if prodNumber > 0 then output;
end;
end;
run;
proc sql;
create table shopping as
select custId, prodName, prodNumber
from basket natural join products;
select * from shopping;
quit;
PG
I love prx but since the structure is static and always fully defined, I would opt for substrings and hash tables.
data lookuptable;
input id $1. desc $20.;
datalines;
A Apples
B Bread
C Cookie
D Danish
;
run;
data have;
input string $8.;
datalines;
A1B2C0D4
;
run;
data want;
length id $1. count $1. desc $20. result $500.;
if _N_=1 then do;
declare hash myhash(dataset: 'lookuptable');
myhash.defineKey('id');
myhash.defineData('desc');
myhash.defineDone();
end;
set have;
do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */
id = substr(string, i*2-1, 1);
count = substr(string, i*2, 1);
if count NE "0" then do;
myhash.find();
result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */
end;
end;
run;
Vincent
Rather than a Hash Solution, a format may also work:
data have;
input var1 $50.;
cards;
A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0
A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0
A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0
A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0
A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0
A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0
A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X1
;
run;
proc format;
value $ letter_code
'A' = 'Apple'
'B' = 'Bread'
'C' = 'Cookies'
'W' = 'Waffles'
'E' = 'Eggs'
'X' = 'Other';
run;
Data want;
set have;
length string_out $100. temp1 temp2 $1. new_string $50.;
format temp1 $letter_code.;
i=1;
do while (substr(var1, i, 2) ne "");
temp1=substr(var1, i, 1);
temp2=substr(var1, i+1, 1);
new_string="";
if temp2 ne "0" then new_string=temp2 || " " || put(temp1, $letter_code.);
string_out=catx(",",string_out,new_string);
i+2;
end;
run;
result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */
CATX(',',result,CATX(' ',id,desc));
Vincent, I am trying the hash solution, but doesn't this just grab the last non-zero product? In the code you posted, I get a null result field and a desc field of danish. I would like to output all non-zero products, so from the code you posted above, the result should be 1 Apple, 2 Bread, 4 Danish.
Hi,
I'm sorry I guess I pasted the wrong version. "desc" only contains the name of the last product but your desired output is supposed to be in "result". The reason why it is empty/buggy is because there is no handling of blank padding thus at every loop iteration, the concatenation was trying to concatenate 500 blanks with whatever additionnal info and truncating to the 500 blanks.
Here's the fixed version without removal of the ending comma atm although that isn't overly complicated to add.
data want;
length id $1. count $1. desc $20. result $500.;
if _N_=1 then do;
declare hash myhash(dataset: 'lookuptable');
myhash.defineKey('id');
myhash.defineData('desc');
myhash.defineDone();
end;
set have;
do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */
id = substr(string, i*2-1, 1);
count = substr(string, i*2, 1);
if count NE "0" then do;
myhash.find();
result=cat(trim(result), trim(desc), " ", trim(count), ",");
end;
end;
drop desc count i id;
run;
Sorry about that.
Vincent
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.