DATA Step, Macro, Functions and more

find non-matching data

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

find non-matching data

Hello Everyone I have two table master (master_grossery) and child (grossery). I want to find the items from grosssery table which are not matcing with master_grossery table

The problem is that all items in child table separated by exclamatory mark.

How can I take single item from it and then match with master table to see whether present or not.

data grossery;

input name $ item $;

cards;

fruit_name   apple

fruit-veg orange!onion!potato

vegetable tomato!cashew!drumstick

name_of_veg garlic!ginger!sapota

;

run;

data master_grossery;

input id mitem $ price;

cards;

101 orange 23

102 apple 34

103 garlic 21

104 sapota 45

105 custardapple 32

106 bittergourd 12

107 sugar 45

108 onion 43

109 sapota 102

;

run;

proc sql;

create table item_not_found as

select * from grossery

where item not in (select mitem from master_grossery)

;

quit;

--> The output should be displayed like

Potato under column fruit-veg is not available

Tomato under column vegetable is not available

Cashew under column vegetable is not available

Drumstick under column vegetable is not available

Ginger under column name_of_veg is not available


Accepted Solutions
Solution
‎02-09-2015 01:40 AM
Super Contributor
Posts: 298

Re: find non-matching data

How about using Proc Format?

Create a Format for the Master data set. The read your CHILD data set and while you read, parse it to get the individual name and apply the format. If it does not give 'Y' the output the obsevation with the flag having your label.

data master_grossery;

input id mitem $12. price;

cards;

101 orange       23

102 apple        34

103 garlic       21

104 sapota       45

105 custardapple 32

106 bittergourd  12

107 sugar        45

108 onion        43

109 sapota      102

;

run;

data cnt;

   set master_grossery(keep = mitem rename = (mitem=start)) end = eof;

   retain fmtname 'mitem' type 'C' Label 'Y';

   output;

   if eof then do;

      HLO = 'O';

      start = ' ';

      label = 'N';

      output;

   end;

run;

proc sort data = cnt nodupkey;

by start;

run;

proc format cntlin = cnt;

run;

data grossery(keep=i_name name flag);

length i_name $10;

input name $11.  item $25.;

nw = countw(item, '!');

do i = 1 to nw;

   i_name = scan(item, i, '!');

   if put(i_name, mitem.) ^= 'Y' then do;

      flag = 'Not available';

      output;

   end;

end;

cards;

fruit_name   apple

fruit-veg    orange!onion!potato

vegetable    tomato!cashew!drumstick

name_of_veg  garlic!ginger!sapota

;

run;

Instead of Proc Format, we can use Hash Table to search for the item and when

the name is not in the hash table, the

flag will be "Not available".

data master_grossery;

input id mitem $12. price;

cards;

101 orange 23

102 apple 34

103 garlic 21

104 sapota 45

105 custardapple 32

106 bittergourd 12

107 sugar 45

108 onion 43

109 sapota 102

;

run;

data want(keep=i_name name flag);

if _n_ = 1 then do;

if 0 then set master_grossery(keep=mitem);

declare hash h();

h.definekey('mitem');

h.definedone();

do until(eof);

set master_grossery(keep=mitem) end = eof;

if h.find() ^= 0 then h.add();

end;

end;

length i_name $10;

input name $11. item $25.;

nw = countw(item, '!');

do i = 1 to nw;

i_name = scan(item, i, '!');

if h.find(key:i_name) ^= 0 then do;

flag = 'Not available';

output;

end;

end;

cards;

fruit_name apple

fruit-veg orange!onion!potato

vegetable tomato!cashew!drumstick

name_of_veg garlic!ginger!sapota

;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,474

Re: find non-matching data

: I think it would be a lot easier if you first include a datastep to expand the grossery dataset. e.g.:

data grossery (keep=name item);

  informat name $50.;

  informat items $100.;

  length item $20.;

  input name $ items $;

  i=1;

  do while (scan(items,i,'!') ne '');

    item=scan(items,i,'!');

    output;

    i+1;

  end;

cards;

fruit_name   apple

fruit-veg orange!onion!potato

vegetable tomato!cashew!drumstick

name_of_veg garlic!ginger!sapota

;

run;

data master_grossery;

  length mitem $20.;

  input id mitem price;

  cards;

101 orange 23

102 apple 34

103 garlic 21

104 sapota 45

105 custardapple 32

106 bittergourd 12

107 sugar 45

108 onion 43

109 sapota 102

;

run;

proc sql;

select catx(' ',item,'from column',name,'is not available')

   from grossery

     where item not in (select mitem from master_grossery)

;

quit;

Contributor
Posts: 63

Re: find non-matching data

Thank you Arthur for your help.

Suppose I change the data slightly. then the word is not splitted properly.

Could you please help whether I made any mistake in the program as the dataset contain exclamatory mark for 4 columns. Is it possible that a single loop can split the word in a single dataset.

Thank you very much in advance

data grossery (keep=id name item);

  informat name $100.;

  informat items $200.;

  length item $20.;

  input id name $ items $;

  i=1;

  do while (scan(items,i,'!') ne '');

    name=scan(name,i,'!');

    item=scan(items,i,'!');

    output;

    i+1;

  end;

cards;

101 fruit_name!fruit_na1   apple

102 fruit-veg!fruit1!fruit2 orange!onion!potato

103 vegetable tomato!cashew!drumstick

104 name_of_veg garlic!ginger!sapota

;

run;

PROC Star
Posts: 7,474

Re: find non-matching data

: I would have used:

data grossery (keep=id name item);

  length id $8.;

  length name $50.;

  length item $20.;

  input;

  id=scan(_infile_,1,'! ');

  name=scan(_infile_,2,'! ');

  i=3;

  do while (scan(_infile_,i,'! ') ne '');

    item=scan(_infile_,i,'! ');

    output;

    i+1;

  end;

cards;

101 fruit_name!fruit_na1   apple

102 fruit-veg!fruit1!fruit2 orange!onion!potato

103 vegetable tomato!cashew!drumstick

104 name_of_veg garlic!ginger!sapota

;

run;

Contributor
Posts: 63

Re: find non-matching data

Thank you Arthur once again. But I am not getting correct output.

The value "fruit_na1" is wrongly populated in variable 'name'. In stead, it is captured by other variable 'Item' like below

id              name                     item

101           fruit_name            fruit_na1

101           fruit_name            apple

The output for that row should be

id              name                     item

101           fruit_name            apple

101           fruit_na                 apple

PROC Star
Posts: 7,474

Re: find non-matching data

: Then I don't understand what you are trying to extract. Given your example dataset, please show what the desired output file should look like.

Solution
‎02-09-2015 01:40 AM
Super Contributor
Posts: 298

Re: find non-matching data

How about using Proc Format?

Create a Format for the Master data set. The read your CHILD data set and while you read, parse it to get the individual name and apply the format. If it does not give 'Y' the output the obsevation with the flag having your label.

data master_grossery;

input id mitem $12. price;

cards;

101 orange       23

102 apple        34

103 garlic       21

104 sapota       45

105 custardapple 32

106 bittergourd  12

107 sugar        45

108 onion        43

109 sapota      102

;

run;

data cnt;

   set master_grossery(keep = mitem rename = (mitem=start)) end = eof;

   retain fmtname 'mitem' type 'C' Label 'Y';

   output;

   if eof then do;

      HLO = 'O';

      start = ' ';

      label = 'N';

      output;

   end;

run;

proc sort data = cnt nodupkey;

by start;

run;

proc format cntlin = cnt;

run;

data grossery(keep=i_name name flag);

length i_name $10;

input name $11.  item $25.;

nw = countw(item, '!');

do i = 1 to nw;

   i_name = scan(item, i, '!');

   if put(i_name, mitem.) ^= 'Y' then do;

      flag = 'Not available';

      output;

   end;

end;

cards;

fruit_name   apple

fruit-veg    orange!onion!potato

vegetable    tomato!cashew!drumstick

name_of_veg  garlic!ginger!sapota

;

run;

Instead of Proc Format, we can use Hash Table to search for the item and when

the name is not in the hash table, the

flag will be "Not available".

data master_grossery;

input id mitem $12. price;

cards;

101 orange 23

102 apple 34

103 garlic 21

104 sapota 45

105 custardapple 32

106 bittergourd 12

107 sugar 45

108 onion 43

109 sapota 102

;

run;

data want(keep=i_name name flag);

if _n_ = 1 then do;

if 0 then set master_grossery(keep=mitem);

declare hash h();

h.definekey('mitem');

h.definedone();

do until(eof);

set master_grossery(keep=mitem) end = eof;

if h.find() ^= 0 then h.add();

end;

end;

length i_name $10;

input name $11. item $25.;

nw = countw(item, '!');

do i = 1 to nw;

i_name = scan(item, i, '!');

if h.find(key:i_name) ^= 0 then do;

flag = 'Not available';

output;

end;

end;

cards;

fruit_name apple

fruit-veg orange!onion!potato

vegetable tomato!cashew!drumstick

name_of_veg garlic!ginger!sapota

;

run;

PROC Star
Posts: 7,474

Re: find non-matching data

: Is the following what you are trying to get?:

data grossery (keep=id name item);

  length id $8.;

  length name $50.;

  length item $20.;

  length name_part $80.;

  length item_part $80.;

  input;

  id=scan(_infile_,1,' ');

  name_part=scan(_infile_,2,' ');

  item_part=scan(_infile_,3,' ');

  i=1;

  do while (scan(name_part,i,'!') ne '');

    name=scan(name_part,i,'!');

    i+1;

    j=1;

    do while (scan(item_part,j,'!') ne '');

      item=scan(item_part,j,'!');

      output;

      j+1;

    end;

  end;

cards;

101 fruit_name!fruit_na1   apple

102 fruit-veg!fruit1!fruit2 orange!onion!potato

103 vegetable tomato!cashew!drumstick

104 name_of_veg garlic!ginger!sapota

;

run;

Contributor
Posts: 63

Re: find non-matching data

Thank you datasp for showing two diff methods to resolve. I learn new concept of Hash.

Thank you Arthur very much. Nicely explained.

My issue resolved.

Thanks once again.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 408 views
  • 6 likes
  • 3 in conversation