DATA Step, Macro, Functions and more

Parse a Alphanumeric String against a Table

Reply
Frequent Contributor
Posts: 84

Parse a Alphanumeric String against a Table

I am trying to parse a string and look up the alpha part in a look up table.  My string follows the format A0B0C0D0...X0, where the letters represent a product and the numbers represent a frequency of that product, so for example a string starting A0B2 could represent a mix of 0 apples and 2 loaves of bread.  I would like to parse this string out and hit it against the look up table basically to translate the results back to English. 

My first problem is how to loop through the strings (I have about 300k records) and then also how to eliminate results with 0 frequencies and pull the product names from the look up table (for the products with frequency >0) and output the product names and counts, i.e the end result would say "this customer bought 5 loaves of bread and 2 chocolate bars).  Any help is appreciated.

Super User
Posts: 10,543

Re: Parse a Alphanumeric String against a Table

A couple of examples of the input data set records and the desired output would be helpful. Also since you say you are comparing to another look up table, some corresponding examples from that table are a good idea.

Are all of the character codes going to be a single letter? Is the code string in a fixed order will all codes always present or can the order or number of codes change?

Frequent Contributor
Posts: 84

Re: Parse a Alphanumeric String against a Table

So the table below shows some possible values and the results I would like to get.  The order of the string is always fixed and the frequencies will always be less than 9 (so the string length will always be the same).  The character codes will always be a single letter, that pull from a look up table of fixed vales (A for Apple, B for Bread etc).

StringResult
A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X05 Apples
A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X02 Bread
A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X03 Cookies
A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X01 Waffle
A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X01 Apple, 1 Bread
A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X04 Eggs, 1 Waffle
A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X01 Apple, 1 Bread, 1 Cookie
Respected Advisor
Posts: 4,659

Re: Parse a Alphanumeric String against a Table

This is a perfect task for PRX functions :

data strings;
length string $100;
input custId string;
datalines;
1 A0B2C0D0
2 A0B0C5D2
;

data products;
length prodCode $1 prodName $40;
input prodCode prodName &;
datalines;
A screwdriver
B pound of tomato
C loaf of bread
D chocolate bar
;

data basket(keep=custId prodCode prodNumber);
if prx1 = 0 then prx1 + prxparse("/(\D+)(\d+)/io");
set strings;
start = 1;
do until(pos=0);
     call prxnext(prx1, start, -1, string, pos, len);
     if pos > 0 then do;
          prodCode = prxposn(prx1, 1, string);
          prodNumber = input(prxposn(prx1, 2, string), best.);
          if prodNumber > 0 then output;
          end;
     end;
run;

proc sql;
create table shopping as
select custId, prodName, prodNumber
from basket natural join products;

select * from shopping;
quit;

PG

PG
Super Contributor
Posts: 339

Re: Parse a Alphanumeric String against a Table

I love prx but since the structure is static and always fully defined, I would opt for substrings and hash tables.

data lookuptable;

     input id $1. desc $20.;

     datalines;

A Apples

B Bread

C Cookie

D Danish

;

run;

data have;

     input string $8.;

     datalines;

A1B2C0D4

;

run;

data want;

     length id $1. count $1. desc $20. result $500.;

     if _N_=1 then do;

     declare hash myhash(dataset: 'lookuptable');

     myhash.defineKey('id');

     myhash.defineData('desc');

     myhash.defineDone();

     end;

  

      set have;

      do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */

          id = substr(string, i*2-1, 1);

          count = substr(string, i*2, 1);

          if count NE "0" then do;

               myhash.find();

               result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */

          end;

     end;

run;

Vincent

Super User
Posts: 17,930

Re: Parse a Alphanumeric String against a Table

Rather than a Hash Solution, a format may also work:

data have;

input var1 $50.;

cards;

A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0

A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0

A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X1

;

run;

proc format;

    value $ letter_code

        'A' = 'Apple'

        'B' = 'Bread'

        'C' = 'Cookies'

        'W' = 'Waffles'

        'E' = 'Eggs'

        'X' = 'Other';

run;

Data want;

set have;

length string_out $100. temp1 temp2 $1. new_string $50.;

format temp1 $letter_code.;

i=1;

do while (substr(var1, i, 2) ne "");

    temp1=substr(var1, i, 1);

    temp2=substr(var1, i+1, 1);

    new_string="";

    if temp2 ne "0" then  new_string=temp2 || " " || put(temp1, $letter_code.);

    string_out=catx(",",string_out,new_string);

    i+2;

end;

run;

Respected Advisor
Posts: 3,777

Re: Parse a Alphanumeric String against a Table

               result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */

CATX(',',result,CATX(' ',id,desc));

Frequent Contributor
Posts: 84

Re: Parse a Alphanumeric String against a Table

Vincent, I am trying the hash solution, but doesn't this just grab the last non-zero product?  In the code you posted, I get a null result field and a desc field of danish.  I would like to output all non-zero products, so from the code you posted above, the result should be 1 Apple, 2 Bread, 4 Danish.

Super Contributor
Posts: 339

Re: Parse a Alphanumeric String against a Table

Hi,

I'm sorry I guess I pasted the wrong version. "desc" only contains the name of the last product but your desired output is supposed to be in "result". The reason why it is empty/buggy is because there is no handling of blank padding thus at every loop iteration, the concatenation was trying to concatenate 500 blanks with whatever additionnal info and truncating to the 500 blanks.

Here's the fixed version without removal of the ending comma atm although that isn't overly complicated to add.

data want;

     length id $1. count $1. desc $20. result $500.;

     if _N_=1 then do;

     declare hash myhash(dataset: 'lookuptable');

     myhash.defineKey('id');

     myhash.defineData('desc');

     myhash.defineDone();

     end;

      set have;

      do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */

          id = substr(string, i*2-1, 1);

          count = substr(string, i*2, 1);

          if count NE "0" then do;

               myhash.find();

               result=cat(trim(result), trim(desc), " ", trim(count), ",");

          end;

     end;

     drop desc count i id;

run;

Sorry about that.

Vincent

Ask a Question
Discussion stats
  • 8 replies
  • 417 views
  • 0 likes
  • 6 in conversation