BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

## Keep lines where the values are the same

Hi

I have a large dataset with patients and their consumed fruits. And I would like to create a new dataset with only the lines where a patient has consumed any fruit twice or more.

This is my starting point:

```data exercise;input patient fruit\$;cards;1 Apple1 Pear1 Apple1 Berry1 Berry2 Apple2 Pear3 Pear3 Pear3 Berry4 Pear4 Berry4 Berry4 Berry;run;
```

And this is how it should look after:

1 Apple
1 Apple
1 Berry
1 Berry
3 Pear
3 Pear
4 Berry
4 Berry
4 Berry

Thank you for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

## Re: Keep lines where the values are the same

Take a look at First. Last. and BY group processing

Spoiler
``````data exercise;
input patient fruit\$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;

proc sort data=exercise out=srtd_exercise ;
by patient fruit ;
run ;

data want ;
set srtd_exercise ;
by patient fruit ;
put _all_ ;
if first.fruit=1 and last.fruit=1 then
delete ;
run ;``````

9 REPLIES 9
SAS Super FREQ

## Re: Keep lines where the values are the same

Take a look at First. Last. and BY group processing

Spoiler
``````data exercise;
input patient fruit\$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;

proc sort data=exercise out=srtd_exercise ;
by patient fruit ;
run ;

data want ;
set srtd_exercise ;
by patient fruit ;
put _all_ ;
if first.fruit=1 and last.fruit=1 then
delete ;
run ;``````

Obsidian | Level 7

## Re: Keep lines where the values are the same

Try this.

``````proc sql;
create table want as
select
a.*
from
exercise  a
join
(select patient, fruit, count(*) as cnt
from exercise
group by 1,2
having count(*) > 1
)  b
on a.patient = b.patient
and a.fruit = b.fruit
;
quit;

``````
Obsidian | Level 7

## Re: Keep lines where the values are the same

Thank you both!
Both of your solution work great 😊

Super User

## Re: Keep lines where the values are the same

```data exercise;
input patient fruit\$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run;

proc sort data=exercise out=want nouniquekey;
by patient fruit;
run;```
Obsidian | Level 7

## Re: Keep lines where the values are the same

Very nice. Didn't know that option existed until now.
PROC Star

## Re: Keep lines where the values are the same

If you have a limited number of values for the variable FRUIT, then consider using a SET statement with one argument per fruit (see the "WHERE=" parameters).

That is, the SET statement will read a collection of subsets (one per fruit).  When used with a BY statement, each subset will be processed as a consecutive sequence of obsrvatinos (i.e. all the "Apple", then all the "Berr", etc.) which provides an easy way to detect the presence of more than one obs for a given PATIENT/FRUIT combination:

``````data exercise;
input patient fruit\$;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
run;

data want;
set exercise (where=(fruit='Apple'))
exercise (where=(fruit='Pear'))
exercise (where=(fruit='Berry')) ;
by patient fruit;
if not (first.fruit=1 and last.fruit=1);
run;``````

If first.fruit=1 and last.fruit=1 simultaneously then the record-in-hand is the only one for that patient/fruit, and is not wanted.

But if you have too many distinct FRUIT values, then another approach is needed - one where you can dynamically get the frequency of each fruit.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: Keep lines where the values are the same

This approach is not feasible for me, as I have thousend of distinct fruit variables.
Still thank you for your option!
Tourmaline | Level 20

## Re: Keep lines where the values are the same

For what it's worth-

``````
data exercise ;
input patient fruit\$ ;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run ;

proc sql ;
create table want as
select a.*
from exercise a
inner join
(select patient, fruit, count(*) as c from exercise group by patient, fruit) b
on a.patient=b.patient and a.fruit=b.fruit and c > 1 ;
quit ;``````
Tourmaline | Level 20

## Re: Keep lines where the values are the same

``````

data exercise ;
input patient fruit\$ ;
cards;
1 Apple
1 Pear
1 Apple
1 Berry
1 Berry
2 Apple
2 Pear
3 Pear
3 Pear
3 Berry
4 Pear
4 Berry
4 Berry
4 Berry
;
run ;

data want ;
if _n_ = 1 then do ;
dcl hash h ( ) ;
h.definekey ('fruit') ;
h.definedata ('_iorc_') ;
h.definedone ( ) ;
end ;
do _n_ = 1 by 1 until (last.patient) ;
set exercise ;
by patient ;
if h.find( ) = 0 then _iorc_ = sum (_iorc_ , 1) ;
else _iorc_ = 1 ;
h.replace( ) ;
end ;
do _n_ = 1 to _n_ ;
set exercise ;
h.find( ) ;
if _iorc_ > 1 then output ;
end ;
h.clear( ) ;
run ;``````
Discussion stats
• 9 replies
• 1156 views
• 4 likes
• 6 in conversation