I have a dataset in which I am trying to find the Outlets which are selling only two category i.e 'CSP' and 'Stills' and not other categories
for eg:-
24034743 | CSP |
24034743 | Stills |
27344795 | Water |
27344795 | Stills |
27344795 | CSP |
24660690 | CSP |
In this outlet 24034743 sells CSP and Stills, outlet 27344795 sells CSP,Water and Stills and outlet 24660690 sells only CSP in this case I need only outlet which sells CSP and Stills.
I tried few methods not got success.
Please help me on this.
option compress=yes;
data test;
input Outlet $ Category $;
Cards;
27344795 CSP
29306680 Water
29106861 Coffee
24660690 CSP
25298585 CSP
27852713 CSP
27344795 Water
24034743 CSP
24792773 Coffee
2991031 CSP
21362260 CSP
29003700 CSP
27314730 CSP
26678079 CSP
22625435 CSP
29702708 CSP
25459917 Stills
24010565 CSP
28131642 CSP
22078107 CSP
21501108 CSP
28393563 CSP
25784580 CSP
24976068 CSP
29306680 CSP
25784580 Stills
21038003 CSP
22145937 CSP
21906503 CSP
28506284 Stills
29993536 CSP
23915168 CSP
21153504 Coffee
24981847 CSP
21977529 Water
22684050 Water
29291387 Stills
25298585 Coffee
29106861 CSP
24976068 Stills
28393563 Coffee
28506284 CSP
22078107 Stills
29106861 Stills
21906503 Stills
21501108 Coffee
29003700 Stills
29306680 Stills
28393563 Stills
27314730 Stills
27344795 Stills
28506284 Water
27933637 CSP
2991031 Stills
28131642 Water
24034743 Stills
22078107 Water
24976068 Coffee
29306680 Coffee
26857155 CSP
21362260 Water
;
run;
proc sort data =test out=abc nodupkey;
by Outlet Category;
where Category in ('CSP','Stills') and Category not in ('Water', 'Coffee');
run;
proc sort data =test out=abc nodupkey;
by Outlet Category;
where Category in ('CSP','Stills') and Category not in ('Water', 'Coffee');
run;
data abc;
set test;
where Category not in ('Water', 'Coffee') and Category in ('CSP','Stills');
run;
proc sql;
select count(distinct outlet) from test
where Category not in ('Water', 'Coffee') and Category in ('CSP','Stills');
run;
proc sql;
select count (distinct c.Outlet) as o_count
from test as c, test as s
where c.Outlet = s.Outlet
and s.Category like '%CSP%'
and c.Category like '%Stills%';
quit;
run;
proc sql;
create table abc as select distinct b.Outlet as outlet
from test a left join test b
on a.Outlet=b.Outlet
and a.Category=b.Category
where b.Category in ('CSP','Stills') and a.Category not in ('Water', 'Coffee');
quit;
run;
Hi @umeshgiri48
To handle this use case, you can simply add the NODUPKEY option in the proc sort as below:
proc sort data=test out=test_sorted nodupkey;
by Outlet Category;
run;
proc transpose data=test_sorted out=test_tr;
by Outlet;
var Category;
run;
data want (keep=outlet);
set test_tr;
Category = catx(" ",of col:);
if find(Category,"CSP") and find(Category,"Stills") and countw(Category)=2 then output;
run;
Hi @umeshgiri48
Here is one approach to do that:
proc sort data=test out=test_sorted;
by Outlet Category;
run;
proc transpose data=test_sorted out=test_tr;
by Outlet;
var Category;
run;
data want (keep=outlet);
set test_tr;
Category = catx(" ",of col:);
if find(Category,"CSP") and find(Category,"Stills") and countw(Category)=2 then output;
run;
@ed_sas_member this code is fine when there are no duplicate OUTLET/CATEGORY records
For instance, if an OUTLET has, say, 4 CSP's and 3 Still (and no other records), it won't be captured and we will get 0 output
Hi @umeshgiri48
To handle this use case, you can simply add the NODUPKEY option in the proc sort as below:
proc sort data=test out=test_sorted nodupkey;
by Outlet Category;
run;
proc transpose data=test_sorted out=test_tr;
by Outlet;
var Category;
run;
data want (keep=outlet);
set test_tr;
Category = catx(" ",of col:);
if find(Category,"CSP") and find(Category,"Stills") and countw(Category)=2 then output;
run;
Hi @umeshgiri48
You can use this for example :
proc sql;
select outlet
from (select distinct *
from test
group by Outlet)
group by outlet
having count(distinct Category)=2 and sum(Category in ('CSP','Stills'))=2;
quit;
HI @umeshgiri48 This is a natural case for PROC SQL and just one step if i understand your requirement correctly
data test;
input Outlet $ Category $;
Cards;
24034743 CSP
24034743 Stills
27344795 Water
27344795 Stills
27344795 CSP
24660690 CSP
;
run;
proc sql;
create table want as
select *
from test
group by outlet
having count(distinct Category)=2 and sum(Category in ('CSP','Stills'))=2;
quit;
/*Or I am also assuming the instead of the previous HAVING filter, the one below could be what you are after*/
having count(distinct Category)>1 and count(*)= sum(Category in ('CSP','Stills'));
@novinosrin I agree that this is a ready-made task for PROC SQL.
BUT … there is an embedded assumption in your code, namely that there are no duplicate OUTLET/CATEGORY records.
For instance, if an OUTLET has, say, 2 CSP's and 1 Still (and no other records), it won't be captured - presumably a false negative.
Of course, the assumption may very well be the case. If so, the OP would be safe.
Hi @mkeintz You are right. I had that doubt too. I often prefer any OP's to offer us a feedback and confirm if the communication is right for us to make edit/changes if needed. So yes, some OPs are interactive while others not so much. Let's see.
@novinosrin this query will only provide an outlet which is having CSP and Stills but I need only those Outlets which are selling only CSP and Stills not other categories like water and etc
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.