Help with data stratification

marleeakerson · Posted 07-14-2020 03:01 PM

Hello,

I need figuring out the code to stratify my data.

For instance, take this sample data:

ID Index

1 A

1 B

1 C

2 A

2 B

3 C

3 B

3 A

4 A

So I am trying to make two output data sets -

One table where it only contains the rows where the ID has both A and B Index
One table where it only contains the rows where the ID has both A and C Index

Does anyone have any ideas?

Thank you!

PaigeMiller · Posted 07-14-2020 03:08 PM

This creates flags that indicate if an ID has both A and B, or both A and C.

proc sql;
	create table want as select *,
		max(index='A')*max(index='B') as flag_ab,
		max(index='A')*max(index='C') as flag_ac
	from have
	group by id;
quit;

I advise against splitting these up into separate data sets, as that is usually unnecessary. If you want to do an analysis of just the IDs that have both A and B, you can use (in a PROC or DATA step)

where flag_ab=1;

--
Paige Miller

Ksharp · Posted 07-15-2020 08:57 AM

data have;
input ID index $;
cards;
1                  A
1                  B
1                  C
2                  A
2                  B
3                  C
3                  B
3                  A
4                  A
;
proc sql;
create table want1 as
select * from have group by id 
having sum(index not in ('A' 'B'))=0 and count(distinct index)=2; 

create table want2 as
select * from have group by id 
having sum(index not in ('A' 'C'))=0 and count(distinct index)=2; 
quit;

Help with data stratification

Re: Help with data stratification

Re: Help with data stratification

SAS Innovate 2026 Registration is Open