Hello Everyone I have two table master (master_grossery) and child (grossery). I want to find the items from grosssery table which are not matcing with master_grossery table
The problem is that all items in child table separated by exclamatory mark.
How can I take single item from it and then match with master table to see whether present or not.
data grossery;
input name $ item $;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
data master_grossery;
input id mitem $ price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
proc sql;
create table item_not_found as
select * from grossery
where item not in (select mitem from master_grossery)
;
quit;
--> The output should be displayed like
Potato under column fruit-veg is not available
Tomato under column vegetable is not available
Cashew under column vegetable is not available
Drumstick under column vegetable is not available
Ginger under column name_of_veg is not available
How about using Proc Format?
Create a Format for the Master data set. The read your CHILD data set and while you read, parse it to get the individual name and apply the format. If it does not give 'Y' the output the obsevation with the flag having your label.
data master_grossery;
input id mitem $12. price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
data cnt;
set master_grossery(keep = mitem rename = (mitem=start)) end = eof;
retain fmtname 'mitem' type 'C' Label 'Y';
output;
if eof then do;
HLO = 'O';
start = ' ';
label = 'N';
output;
end;
run;
proc sort data = cnt nodupkey;
by start;
run;
proc format cntlin = cnt;
run;
data grossery(keep=i_name name flag);
length i_name $10;
input name $11. item $25.;
nw = countw(item, '!');
do i = 1 to nw;
i_name = scan(item, i, '!');
if put(i_name, mitem.) ^= 'Y' then do;
flag = 'Not available';
output;
end;
end;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
Instead of Proc Format, we can use Hash Table to search for the item and when
the name is not in the hash table, the
flag will be "Not available".
data master_grossery;
input id mitem $12. price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
data want(keep=i_name name flag);
if _n_ = 1 then do;
if 0 then set master_grossery(keep=mitem);
declare hash h();
h.definekey('mitem');
h.definedone();
do until(eof);
set master_grossery(keep=mitem) end = eof;
if h.find() ^= 0 then h.add();
end;
end;
length i_name $10;
input name $11. item $25.;
nw = countw(item, '!');
do i = 1 to nw;
i_name = scan(item, i, '!');
if h.find(key:i_name) ^= 0 then do;
flag = 'Not available';
output;
end;
end;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
: I think it would be a lot easier if you first include a datastep to expand the grossery dataset. e.g.:
data grossery (keep=name item);
informat name $50.;
informat items $100.;
length item $20.;
input name $ items $;
i=1;
do while (scan(items,i,'!') ne '');
item=scan(items,i,'!');
output;
i+1;
end;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
data master_grossery;
length mitem $20.;
input id mitem price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
proc sql;
select catx(' ',item,'from column',name,'is not available')
from grossery
where item not in (select mitem from master_grossery)
;
quit;
Thank you Arthur for your help.
Suppose I change the data slightly. then the word is not splitted properly.
Could you please help whether I made any mistake in the program as the dataset contain exclamatory mark for 4 columns. Is it possible that a single loop can split the word in a single dataset.
Thank you very much in advance
data grossery (keep=id name item);
informat name $100.;
informat items $200.;
length item $20.;
input id name $ items $;
i=1;
do while (scan(items,i,'!') ne '');
name=scan(name,i,'!');
item=scan(items,i,'!');
output;
i+1;
end;
cards;
101 fruit_name!fruit_na1 apple
102 fruit-veg!fruit1!fruit2 orange!onion!potato
103 vegetable tomato!cashew!drumstick
104 name_of_veg garlic!ginger!sapota
;
run;
data grossery (keep=id name item);
length id $8.;
length name $50.;
length item $20.;
input;
id=scan(_infile_,1,'! ');
name=scan(_infile_,2,'! ');
i=3;
do while (scan(_infile_,i,'! ') ne '');
item=scan(_infile_,i,'! ');
output;
i+1;
end;
cards;
101 fruit_name!fruit_na1 apple
102 fruit-veg!fruit1!fruit2 orange!onion!potato
103 vegetable tomato!cashew!drumstick
104 name_of_veg garlic!ginger!sapota
;
run;
Thank you Arthur once again. But I am not getting correct output.
The value "fruit_na1" is wrongly populated in variable 'name'. In stead, it is captured by other variable 'Item' like below
id name item
101 fruit_name fruit_na1
101 fruit_name apple
The output for that row should be
id name item
101 fruit_name apple
101 fruit_na apple
How about using Proc Format?
Create a Format for the Master data set. The read your CHILD data set and while you read, parse it to get the individual name and apply the format. If it does not give 'Y' the output the obsevation with the flag having your label.
data master_grossery;
input id mitem $12. price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
data cnt;
set master_grossery(keep = mitem rename = (mitem=start)) end = eof;
retain fmtname 'mitem' type 'C' Label 'Y';
output;
if eof then do;
HLO = 'O';
start = ' ';
label = 'N';
output;
end;
run;
proc sort data = cnt nodupkey;
by start;
run;
proc format cntlin = cnt;
run;
data grossery(keep=i_name name flag);
length i_name $10;
input name $11. item $25.;
nw = countw(item, '!');
do i = 1 to nw;
i_name = scan(item, i, '!');
if put(i_name, mitem.) ^= 'Y' then do;
flag = 'Not available';
output;
end;
end;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
Instead of Proc Format, we can use Hash Table to search for the item and when
the name is not in the hash table, the
flag will be "Not available".
data master_grossery;
input id mitem $12. price;
cards;
101 orange 23
102 apple 34
103 garlic 21
104 sapota 45
105 custardapple 32
106 bittergourd 12
107 sugar 45
108 onion 43
109 sapota 102
;
run;
data want(keep=i_name name flag);
if _n_ = 1 then do;
if 0 then set master_grossery(keep=mitem);
declare hash h();
h.definekey('mitem');
h.definedone();
do until(eof);
set master_grossery(keep=mitem) end = eof;
if h.find() ^= 0 then h.add();
end;
end;
length i_name $10;
input name $11. item $25.;
nw = countw(item, '!');
do i = 1 to nw;
i_name = scan(item, i, '!');
if h.find(key:i_name) ^= 0 then do;
flag = 'Not available';
output;
end;
end;
cards;
fruit_name apple
fruit-veg orange!onion!potato
vegetable tomato!cashew!drumstick
name_of_veg garlic!ginger!sapota
;
run;
: Is the following what you are trying to get?:
data grossery (keep=id name item);
length id $8.;
length name $50.;
length item $20.;
length name_part $80.;
length item_part $80.;
input;
id=scan(_infile_,1,' ');
name_part=scan(_infile_,2,' ');
item_part=scan(_infile_,3,' ');
i=1;
do while (scan(name_part,i,'!') ne '');
name=scan(name_part,i,'!');
i+1;
j=1;
do while (scan(item_part,j,'!') ne '');
item=scan(item_part,j,'!');
output;
j+1;
end;
end;
cards;
101 fruit_name!fruit_na1 apple
102 fruit-veg!fruit1!fruit2 orange!onion!potato
103 vegetable tomato!cashew!drumstick
104 name_of_veg garlic!ginger!sapota
;
run;
Thank you datasp for showing two diff methods to resolve. I learn new concept of Hash.
Thank you Arthur very much. Nicely explained.
My issue resolved.
Thanks once again.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.