BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello, 

 

I need figuring out the code to stratify my data. 

For instance, take this sample data: 

ID              Index

1                  A

1                  B

1                  C

2                  A

2                  B

3                  C

3                  B

3                  A

4                  A

 

So I am trying to make two output data sets  - 

  • One table where it only contains the rows where the ID has both A and B Index
  • One table where it only contains the rows where the ID has both A and C Index

Does anyone have any ideas? 

 

Thank you!

2 REPLIES 2
PaigeMiller
Diamond | Level 26

This creates flags that indicate if an ID has both A and B, or both A and C.

 

proc sql;
	create table want as select *,
		max(index='A')*max(index='B') as flag_ab,
		max(index='A')*max(index='C') as flag_ac
	from have
	group by id;
quit;

I advise against splitting these up into separate data sets, as that is usually unnecessary. If you want to do an analysis of just the IDs that have both A and B, you can use (in a PROC or DATA step)

 

where flag_ab=1;

 

--
Paige Miller
Ksharp
Super User
data have;
input ID index $;
cards;
1                  A
1                  B
1                  C
2                  A
2                  B
3                  C
3                  B
3                  A
4                  A
;
proc sql;
create table want1 as
select * from have group by id 
having sum(index not in ('A' 'B'))=0 and count(distinct index)=2; 

create table want2 as
select * from have group by id 
having sum(index not in ('A' 'C'))=0 and count(distinct index)=2; 
quit;