BookmarkSubscribeRSS Feed
P5C768
Obsidian | Level 7

I am trying to parse a string and look up the alpha part in a look up table.  My string follows the format A0B0C0D0...X0, where the letters represent a product and the numbers represent a frequency of that product, so for example a string starting A0B2 could represent a mix of 0 apples and 2 loaves of bread.  I would like to parse this string out and hit it against the look up table basically to translate the results back to English. 

My first problem is how to loop through the strings (I have about 300k records) and then also how to eliminate results with 0 frequencies and pull the product names from the look up table (for the products with frequency >0) and output the product names and counts, i.e the end result would say "this customer bought 5 loaves of bread and 2 chocolate bars).  Any help is appreciated.

8 REPLIES 8
ballardw
Super User

A couple of examples of the input data set records and the desired output would be helpful. Also since you say you are comparing to another look up table, some corresponding examples from that table are a good idea.

Are all of the character codes going to be a single letter? Is the code string in a fixed order will all codes always present or can the order or number of codes change?

P5C768
Obsidian | Level 7

So the table below shows some possible values and the results I would like to get.  The order of the string is always fixed and the frequencies will always be less than 9 (so the string length will always be the same).  The character codes will always be a single letter, that pull from a look up table of fixed vales (A for Apple, B for Bread etc).

StringResult
A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X05 Apples
A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X02 Bread
A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X03 Cookies
A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X01 Waffle
A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X01 Apple, 1 Bread
A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X04 Eggs, 1 Waffle
A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X01 Apple, 1 Bread, 1 Cookie
PGStats
Opal | Level 21

This is a perfect task for PRX functions :

data strings;
length string $100;
input custId string;
datalines;
1 A0B2C0D0
2 A0B0C5D2
;

data products;
length prodCode $1 prodName $40;
input prodCode prodName &;
datalines;
A screwdriver
B pound of tomato
C loaf of bread
D chocolate bar
;

data basket(keep=custId prodCode prodNumber);
if prx1 = 0 then prx1 + prxparse("/(\D+)(\d+)/io");
set strings;
start = 1;
do until(pos=0);
     call prxnext(prx1, start, -1, string, pos, len);
     if pos > 0 then do;
          prodCode = prxposn(prx1, 1, string);
          prodNumber = input(prxposn(prx1, 2, string), best.);
          if prodNumber > 0 then output;
          end;
     end;
run;

proc sql;
create table shopping as
select custId, prodName, prodNumber
from basket natural join products;

select * from shopping;
quit;

PG

PG
Vince28_Statcan
Quartz | Level 8

I love prx but since the structure is static and always fully defined, I would opt for substrings and hash tables.

data lookuptable;

     input id $1. desc $20.;

     datalines;

A Apples

B Bread

C Cookie

D Danish

;

run;

data have;

     input string $8.;

     datalines;

A1B2C0D4

;

run;

data want;

     length id $1. count $1. desc $20. result $500.;

     if _N_=1 then do;

     declare hash myhash(dataset: 'lookuptable');

     myhash.defineKey('id');

     myhash.defineData('desc');

     myhash.defineDone();

     end;

  

      set have;

      do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */

          id = substr(string, i*2-1, 1);

          count = substr(string, i*2, 1);

          if count NE "0" then do;

               myhash.find();

               result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */

          end;

     end;

run;

Vincent

Reeza
Super User

Rather than a Hash Solution, a format may also work:

data have;

input var1 $50.;

cards;

A5B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B2C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C3D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0

A1B1C0D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X0

A0B0C0D0E4F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W1X0

A1B1C1D0E0F0G0H0I0J0K0L0M0N0O0P0Q0R0S0T0U0V0W0X1

;

run;

proc format;

    value $ letter_code

        'A' = 'Apple'

        'B' = 'Bread'

        'C' = 'Cookies'

        'W' = 'Waffles'

        'E' = 'Eggs'

        'X' = 'Other';

run;

Data want;

set have;

length string_out $100. temp1 temp2 $1. new_string $50.;

format temp1 $letter_code.;

i=1;

do while (substr(var1, i, 2) ne "");

    temp1=substr(var1, i, 1);

    temp2=substr(var1, i+1, 1);

    new_string="";

    if temp2 ne "0" then  new_string=temp2 || " " || put(temp1, $letter_code.);

    string_out=catx(",",string_out,new_string);

    i+2;

end;

run;

data_null__
Jade | Level 19

               result=result||id||" "||desc||","; /*would need to add some if conditionings to remove the , at the end of the string if that annoys you */

CATX(',',result,CATX(' ',id,desc));

P5C768
Obsidian | Level 7

Vincent, I am trying the hash solution, but doesn't this just grab the last non-zero product?  In the code you posted, I get a null result field and a desc field of danish.  I would like to output all non-zero products, so from the code you posted above, the result should be 1 Apple, 2 Bread, 4 Danish.

Vince28_Statcan
Quartz | Level 8

Hi,

I'm sorry I guess I pasted the wrong version. "desc" only contains the name of the last product but your desired output is supposed to be in "result". The reason why it is empty/buggy is because there is no handling of blank padding thus at every loop iteration, the concatenation was trying to concatenate 500 blanks with whatever additionnal info and truncating to the 500 blanks.

Here's the fixed version without removal of the ending comma atm although that isn't overly complicated to add.

data want;

     length id $1. count $1. desc $20. result $500.;

     if _N_=1 then do;

     declare hash myhash(dataset: 'lookuptable');

     myhash.defineKey('id');

     myhash.defineData('desc');

     myhash.defineDone();

     end;

      set have;

      do i=1 to 4; /* taking advantage of fixed length and 24 different codes - here only 4 for the example data */

          id = substr(string, i*2-1, 1);

          count = substr(string, i*2, 1);

          if count NE "0" then do;

               myhash.find();

               result=cat(trim(result), trim(desc), " ", trim(count), ",");

          end;

     end;

     drop desc count i id;

run;

Sorry about that.

Vincent

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1160 views
  • 0 likes
  • 6 in conversation