BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zimcom
Pyrite | Level 9

 

Hi Community,

 

I have a dataset looks like below:

apple fruits
apple fruits
apple fruits
apple fruits
apple vegetables
apple fruits
artichokevegetables
artichokevegetables
asparagusvegetables
asparagusvegetables
broccoli vegetables
broccoli vegetables
carrotsvegetables
eggplantvegetables
eggplantfruits
orangefruits
orangefruits
orangefruits
orangefruits
orangefruits
orangefruits
pearvegetables
pearfruits
pearfruits
pearfruits
pearfruits
pearfruits
peppervegetables
pepperfruits

 

I want to fish out the items which have been assigned as the wrong category as below

apple vegetables
apple fruits
eggplantvegetables
eggplantfruits
pearvegetables
pearfruits
peppervegetables
pepperfruits

 

Is there any effective way to do this?

 

Thanks you!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star
Plenty of ways... here is one:

proc freq data=have noprint;
tables food * category / out=counts;
run;

data want;
set counts;
by item;
if first.item = 0 or last.item = 0;
run;

View solution in original post

5 REPLIES 5
Astounding
PROC Star
Plenty of ways... here is one:

proc freq data=have noprint;
tables food * category / out=counts;
run;

data want;
set counts;
by item;
if first.item = 0 or last.item = 0;
run;
noling
SAS Employee

Here's a proc sql method. The order of the final table is different, but the result is the same:

data temp;
length one two $32;
input one two;

datalines;
apple 	fruits
apple 	fruits
apple 	fruits
apple 	fruits
apple 	vegetables
apple 	fruits
artichoke vegetables
artichoke vegetables
asparagus vegetables
asparagus vegetables
broccoli vegetables
broccoli vegetables
carrots vegetables
eggplant vegetables
eggplant fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
pear vegetables
pear fruits
pear fruits
pear fruits
pear fruits
pear fruits
pepper vegetables
pepper fruits
;;;
run;

proc sql noprint;
	create table want as
	select two_count.one, two_count.two 
	from (select one, count(*) as one_count
		  from temp
		  group by one) as one_count, 
		  (	select one, two, count(*) as count_both
	from temp
	group by one, two) as two_count
	where one_count.one = two_count.one
		and one_count.one_count ne two_count.count_both;
quit;

Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users

AMSAS
SAS Super FREQ

Hi 

 

The question is, how will you determine what is correct/incorrect? In your example, you have 2 observations for Eggplant, 1 categorized as Fruit, 1 as Vegetable and is a great example of a fruit that people generally think of as a vegetable (Tomato would be another).

Here's an idea you could use a PROC SUMMARY to view counts, which can be pushed to a SAS Dataset. You could then review the output and determine what is correct/incorrect.

data input ;
	input item : $16. category : $16. ;
cards ;

apple 	fruits
apple 	fruits
apple 	fruits
apple 	fruits
apple 	vegetables
apple 	fruits
artichoke	vegetables
artichoke	vegetables
asparagus	vegetables
asparagus	vegetables
broccoli 	vegetables
broccoli 	vegetables
carrots	vegetables
eggplant	vegetables
eggplant	fruits
orange	fruits
orange	fruits
orange	fruits
orange	fruits
orange	fruits
orange	fruits
pear	vegetables
pear	fruits
pear	fruits
pear	fruits
pear	fruits
pear	fruits
pepper	vegetables
pepper	fruits
;
run ;

proc summary print;
	class item category ;
run ;

 

 

 

 

cminard
Obsidian | Level 7

how about something like this:

 

data foodtypes;
length food type $ 12;
input food $ type $;
datalines;
apple fruits
apple fruits
apple fruits
apple fruits
apple vegetables
apple fruits
artichoke vegetables
artichoke vegetables
asparagus vegetables
asparagus vegetables
broccoli vegetables
broccoli vegetables
carrots vegetables
eggplant vegetables
eggplant fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
orange fruits
pear vegetables
pear fruits
pear fruits
pear fruits
pear fruits
pear fruits
pepper vegetables
pepper fruits
;
run;

proc freq data=foodtypes noprint;
tables food*type / out=crosstab;
run;

proc freq data=crosstab noprint;
tables food / out=foodfreq;
run;

data discrepant;
set foodfreq;
if count=2;
run;

proc sql noprint;
select "'"||strip(food)||"'" into :FoodList separated by ','
from discrepant;
quit;

proc print data=crosstab noobs;
var food type;
where food in (&foodlist);
run;

zimcom
Pyrite | Level 9

@cminard @AMSAS @noling @Astounding 

 

They all worked, one way or the other, that is the power of SAS and that is the power of programming!

 

THANK YOU ALL! 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 733 views
  • 0 likes
  • 5 in conversation