Dataset term flagging

Reply
Occasional Contributor
Posts: 6

Dataset term flagging

Hi,

I am looking for the best way to do the following:

I have 2 datasets:

Dataset A contains a list of words e.g.

Product
Sony
Panasonic
JVC
Toshiba

Dataset B contains a list of items purchased (for this example the date and cost variables are irrelevant

DateProductCost
01JAN2013Toshiba100
01JAN2013JVC200
01JAN2013Thomson300
01JAN2013Apple / Sony400

I need an efficient way to flag the products purchased if they are in the dataset A i.e. My Dataset C should look like:

DateProductCostOur Products
01JAN2013Toshiba100Y
01JAN2013JVC200Y
01JAN2013Thomson300
01JAN2013Apple / Sony400Y

Essentially it should look at dataset A and if any of those terms appear in the product variable of Dataset B, even if there are multiple terms, then flag it as Y

Any help?

Super User
Posts: 5,254

Re: Dataset term flagging

proc sql;

     create table want as

          select b.*,

                    case when a.product <> ' ' then 'Y'

                              else ' ' end as our_products

               from a, b

               where b.product contains a.product

     ;

quit;

Data never sleeps
Super Contributor
Posts: 578

Re: Dataset term flagging

This might produce a 1-many join if there happens to be products that are similar.  For example...JV1 and JV11.  Not sure if that's a possibility or not based on the data given.

Super Contributor
Posts: 578

Re: Dataset term flagging

I'm sure there are other ways..but this might work.

proc sql;

alter table ds_b add Our_Products char(1);

update ds_b t1

set Our_Products='Y'

where exists (select * from ds_a da where index(t1.product,da,product)>0);

quit;

PROC Star
Posts: 7,356

Re: Dataset term flagging

Here is one way you could do it with a datastep:

data a;

  input Product $9.;

  cards;

Sony

Panasonic

JVC

Toshiba

;

data b;

  informat date date9.;

  format date date9.;

  informat product $25.;

  input Date Product & Cost;

  cards;

01JAN2013 Toshiba  100

01JAN2013 JVC  200

01JAN2013 Thomson  300

01JAN2013 Apple / Sony  400

;

data _null_;

if 0 then set a nobs=nobs;

CALL SYMPUT('NUMREC',nobs);

stop;

run;

data want (drop=i products: a_product);

  array products(&numrec.) $9.;

  i=0;

  do until (eof1);

    set a (rename=(product=a_product)) end=eof1;

    i=i+1;

    products(i)=a_product;

  end;

  do until (eof2);

    set b end=eof2;

    i=1;

    format our_products $1.;

    call missing(our_products);

    do while (scan(product,i) ne "");

      if scan(product,i) in products then our_products='Y';

      i+1;

    end;

    output;

  end;

run;

Ask a Question
Discussion stats
  • 4 replies
  • 196 views
  • 0 likes
  • 4 in conversation