Hi
I have a large dataset with patients and their consumed fruits. And I would like to create a new dataset with only the lines where a patient has consumed any fruit twice or more.
This is my starting point:
data exercise;
input patient fruit$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;
And this is how it should look after:
1 Apple
1 Apple
1 Berry
1 Berry
3 Pear
3 Pear
4 Berry
4 Berry
4 Berry
Thank you for your help!
Take a look at First. Last. and BY group processing
data exercise;
input patient fruit$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;
proc sort data=exercise out=srtd_exercise ;
by patient fruit ;
run ;
data want ;
set srtd_exercise ;
by patient fruit ;
put _all_ ;
if first.fruit=1 and last.fruit=1 then
delete ;
run ;
Take a look at First. Last. and BY group processing
data exercise;
input patient fruit$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;
proc sort data=exercise out=srtd_exercise ;
by patient fruit ;
run ;
data want ;
set srtd_exercise ;
by patient fruit ;
put _all_ ;
if first.fruit=1 and last.fruit=1 then
delete ;
run ;
Try this.
proc sql;
create table want as
select
a.*
from
exercise a
join
(select patient, fruit, count(*) as cnt
from exercise
group by 1,2
having count(*) > 1
) b
on a.patient = b.patient
and a.fruit = b.fruit
;
quit;
Thank you both!
Both of your solution work great 😊
data exercise; input patient fruit$; cards; 1 Apple 1 Pear 1 Apple 1 Berry 1 Berry 2 Apple 2 Pear 3 Pear 3 Pear 3 Berry 4 Pear 4 Berry 4 Berry 4 Berry ; run; proc sort data=exercise out=want nouniquekey; by patient fruit; run;
If you have a limited number of values for the variable FRUIT, then consider using a SET statement with one argument per fruit (see the "WHERE=" parameters).
That is, the SET statement will read a collection of subsets (one per fruit). When used with a BY statement, each subset will be processed as a consecutive sequence of obsrvatinos (i.e. all the "Apple", then all the "Berr", etc.) which provides an easy way to detect the presence of more than one obs for a given PATIENT/FRUIT combination:
data exercise;
input patient fruit$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
run;
data want;
set exercise (where=(fruit='Apple'))
exercise (where=(fruit='Pear'))
exercise (where=(fruit='Berry')) ;
by patient fruit;
if not (first.fruit=1 and last.fruit=1);
run;
If first.fruit=1 and last.fruit=1 simultaneously then the record-in-hand is the only one for that patient/fruit, and is not wanted.
But if you have too many distinct FRUIT values, then another approach is needed - one where you can dynamically get the frequency of each fruit.
For what it's worth-
data exercise ;
input patient fruit$ ;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run ;
proc sql ;
create table want as
select a.*
from exercise a
inner join
(select patient, fruit, count(*) as c from exercise group by patient, fruit) b
on a.patient=b.patient and a.fruit=b.fruit and c > 1 ;
quit ;
data exercise ;
input patient fruit$ ;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run ;
data want ;
if _n_ = 1 then do ;
dcl hash h ( ) ;
h.definekey ('fruit') ;
h.definedata ('_iorc_') ;
h.definedone ( ) ;
end ;
do _n_ = 1 by 1 until (last.patient) ;
set exercise ;
by patient ;
if h.find( ) = 0 then _iorc_ = sum (_iorc_ , 1) ;
else _iorc_ = 1 ;
h.replace( ) ;
end ;
do _n_ = 1 to _n_ ;
set exercise ;
h.find( ) ;
if _iorc_ > 1 then output ;
end ;
h.clear( ) ;
run ;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.