DATA Step, Macro, Functions and more

find subjects with more than 3 duplicate values

Reply
Frequent Contributor
Frequent Contributor
Posts: 101

find subjects with more than 3 duplicate values

[ Edited ]

Have:

 

idvar1var2
150001
220151
220151
37001
37001
3700

1

220151
3700

1

 

want:

idvar1var2
37001
37001
37001

 

I think there should be an easy way to this in SAS. I need to select the id's where 3 or more observations of the same  ID have var2=1.

 

Any ideas?

 

Thanks

Super User
Posts: 6,933

Re: find subjects with more than 3 duplicate values

proc sql;
create table want as
select id from have
where var2 = 1
group by id
having count(*) >= 3
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Frequent Contributor
Posts: 101

Re: find subjects with more than 3 duplicate values

[ Edited ]

a

Super User
Posts: 6,933

Re: find subjects with more than 3 duplicate values

Just add the other variables in the select list of the SQL statement:

data have;
input id var1 var2;
cards;
1 5000 1
2 2015 1
2 2015 1
3 700 1
3 700 1
3 700 1
;
run;

proc sql;
create table want as
select id, var1, var2 from have
where var2 = 1
group by id
having count(*) >= 3
;
quit;

You can also see the best way to provide test data (data step with inline data in cardsSmiley Wink

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Frequent Contributor
Posts: 101

Re: find subjects with more than 3 duplicate values

[ Edited ]

Hi Kurt,

 this program does not preserve the order of the id's. I need to look at the subsequent observation of the same id's. 

 

for instance,

cards;1 5000 1
2 2015 1
2 2015 1
3 700 1
3 700 1
3 700 1
2 2015 1
3 700 1
3 700 1
;
run;

I want to select those id's separately rather than combining them together.

 

Super User
Posts: 5,081

Re: find subjects with more than 3 duplicate values

ari,

 

Kurt's program still gets you most of the way there.  If you want to extract all records for those IDs, use Kurt's results to extract:

 

proc sql;

create table want_list as

select id from have

where var2=1

group by id

having count(*) >= 3;

create table want as

select * from have where id in (select id from want_list);

quit;

 

You may need to correct the syntax on the final SELECT ... I'm much more at home with a DATA step than SQL.

 

Even in this sample code, SQL does not promise to maintain the order of the incoming records.  It's just a characteristic of SQL.  If you want to guarantee that order, you will need to use a DATA step instead.

Super User
Posts: 6,933

Re: find subjects with more than 3 duplicate values

Do you only need to preserve the original order, or do you need to select ID's that have three successive 1's in the original dataset?

 

Stating your complete requirements shortens the process of finding a working solution considerably Smiley Wink

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Frequent Contributor
Posts: 101

Re: find subjects with more than 3 duplicate values

Hi Kurt,

 

Yes, I need to preserve the original order and at the same time I need to select ID's with 3 successive 1's. I think sequence no as suggested by Linush works. I will try that option.

 

Thanks,

 

Super User
Posts: 5,256

Re: find subjects with more than 3 duplicate values

If the original sort order is that important you need a variable that holds that information. Like a datatime. If that's doesn't exist have a sequence no added in a preceeding step, then use that in an order by clause in @KurtBremser SQL.
Data never sleeps
Frequent Contributor
Frequent Contributor
Posts: 101

Re: find subjects with more than 3 duplicate values

Sounds good idea. I will check this. Thanks

Super User
Posts: 9,676

Re: find subjects with more than 3 duplicate values


data have;
input id var1 var2;
cards;
1 5000 1
2 2015 1
2 2015 1
3 700 1
3 700 1
3 700 1
;
run;
data want;
 do i=1 by 1 until(last.id);
  set have;
  by id;
 end;
 do until(last.id);
  set have;
  by id;
  if i gt 2 then output;
 end;
 drop i;
run;

Ask a Question
Discussion stats
  • 10 replies
  • 386 views
  • 1 like
  • 5 in conversation