Hi Community,
I have a dataset looks like below:
apple | fruits |
apple | fruits |
apple | fruits |
apple | fruits |
apple | vegetables |
apple | fruits |
artichoke | vegetables |
artichoke | vegetables |
asparagus | vegetables |
asparagus | vegetables |
broccoli | vegetables |
broccoli | vegetables |
carrots | vegetables |
eggplant | vegetables |
eggplant | fruits |
orange | fruits |
orange | fruits |
orange | fruits |
orange | fruits |
orange | fruits |
orange | fruits |
pear | vegetables |
pear | fruits |
pear | fruits |
pear | fruits |
pear | fruits |
pear | fruits |
pepper | vegetables |
pepper | fruits |
I want to fish out the items which have been assigned as the wrong category as below
apple | vegetables |
apple | fruits |
eggplant | vegetables |
eggplant | fruits |
pear | vegetables |
pear | fruits |
pepper | vegetables |
pepper | fruits |
Is there any effective way to do this?
Thanks you!
Here's a proc sql method. The order of the final table is different, but the result is the same:
data temp;
length one two $32;
input one two;
datalines;
apple fruits
apple fruits
apple fruits
apple fruits
apple vegetables
apple fruits
artichoke vegetables
artichoke vegetables
asparagus vegetables
asparagus vegetables
broccoli vegetables
broccoli vegetables
carrots vegetables
eggplant vegetables
eggplant fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
pear vegetables
pear fruits
pear fruits
pear fruits
pear fruits
pear fruits
pepper vegetables
pepper fruits
;;;
run;
proc sql noprint;
create table want as
select two_count.one, two_count.two
from (select one, count(*) as one_count
from temp
group by one) as one_count,
( select one, two, count(*) as count_both
from temp
group by one, two) as two_count
where one_count.one = two_count.one
and one_count.one_count ne two_count.count_both;
quit;
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
Hi zimcom
The question is, how will you determine what is correct/incorrect? In your example, you have 2 observations for Eggplant, 1 categorized as Fruit, 1 as Vegetable and is a great example of a fruit that people generally think of as a vegetable (Tomato would be another).
Here's an idea you could use a PROC SUMMARY to view counts, which can be pushed to a SAS Dataset. You could then review the output and determine what is correct/incorrect.
data input ;
input item : $16. category : $16. ;
cards ;
apple fruits
apple fruits
apple fruits
apple fruits
apple vegetables
apple fruits
artichoke vegetables
artichoke vegetables
asparagus vegetables
asparagus vegetables
broccoli vegetables
broccoli vegetables
carrots vegetables
eggplant vegetables
eggplant fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
pear vegetables
pear fruits
pear fruits
pear fruits
pear fruits
pear fruits
pepper vegetables
pepper fruits
;
run ;
proc summary print;
class item category ;
run ;
how about something like this:
data foodtypes;
length food type $ 12;
input food $ type $;
datalines;
apple fruits
apple fruits
apple fruits
apple fruits
apple vegetables
apple fruits
artichoke vegetables
artichoke vegetables
asparagus vegetables
asparagus vegetables
broccoli vegetables
broccoli vegetables
carrots vegetables
eggplant vegetables
eggplant fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
pear vegetables
pear fruits
pear fruits
pear fruits
pear fruits
pear fruits
pepper vegetables
pepper fruits
;
run;
proc freq data=foodtypes noprint;
tables food*type / out=crosstab;
run;
proc freq data=crosstab noprint;
tables food / out=foodfreq;
run;
data discrepant;
set foodfreq;
if count=2;
run;
proc sql noprint;
select "'"||strip(food)||"'" into :FoodList separated by ','
from discrepant;
quit;
proc print data=crosstab noobs;
var food type;
where food in (&foodlist);
run;
@cminard @AMSAS @noling @Astounding
They all worked, one way or the other, that is the power of SAS and that is the power of programming!
THANK YOU ALL!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.