Solved: Flagging out observations based on multiple by group values

Nikos · Posted 05-03-2014 11:05 AM

Dear all,

I have a huge dataset with the following info

ID_A	ID_B	EXP_A	EXP_B	CREATION_DATE	RAT_X
1	4797	6667	KMH1	24OCT2011:13:52:22	4
1	4797	6667		06AUG2012:22:44:30	17
55	11236	6667	ADN1	03JUN2012:08:05:29	3
1	4797	6667	MR01	27MAR2012:08:56:08	17
1	4797	6667	JHM1	24OCT2011:13:51:46	4
1	4797	6667	JHM2	24OCT2011:13:51:43	4
55	11236	6667	ADN1	03JUN2013:10:54:19	3

I need to flag or save in a different dataset all the observations that for variable EXP_A have different EXP_B and RAT_X

Please note that generally the EXP_A is grouped under ID_A & ID_B, although it seems that the same EXP_A can be found under two or more different groupings by ID_A & ID_B values (I do not know whether this is an error or not )

Thank you in advance.

Best regards

Nikos

PGStats · Posted 05-03-2014 02:41 PM

If I understand the problem statement, you may want something like this:

data have;

input ID_A ID_B EXP_A $ EXP_B $ CREATION_DATE $ RAT_X;

drop creation_date;

datalines;

1 4797 6667 KMH1 24OCT2011:13:52:22 4

1 4797 6667 . 06AUG2012:22:44:30 17

55 11236 6667 ADN1 03JUN2012:08:05:29 3

1 4797 6667 MR01 27MAR2012:08:56:08 17

1 4797 6667 JHM1 24OCT2011:13:51:46 4

1 4797 6667 JHM2 24OCT2011:13:51:43 4

55 11236 6667 ADN1 03JUN2013:10:54:19 3

;

proc sql;

create table flagged as

select ID_A, ID_B, EXP_A

from have

group by ID_A, ID_B, EXP_A

having count(distinct catx("-", EXP_B, RAT_X)) > 1;

select * from flagged;

quit;

PG

View solution in original post

PGStats · Posted 05-03-2014 02:41 PM

If I understand the problem statement, you may want something like this:

data have;

input ID_A ID_B EXP_A $ EXP_B $ CREATION_DATE $ RAT_X;

drop creation_date;

datalines;

1 4797 6667 KMH1 24OCT2011:13:52:22 4

1 4797 6667 . 06AUG2012:22:44:30 17

55 11236 6667 ADN1 03JUN2012:08:05:29 3

1 4797 6667 MR01 27MAR2012:08:56:08 17

1 4797 6667 JHM1 24OCT2011:13:51:46 4

1 4797 6667 JHM2 24OCT2011:13:51:43 4

55 11236 6667 ADN1 03JUN2013:10:54:19 3

;

proc sql;

create table flagged as

select ID_A, ID_B, EXP_A

from have

group by ID_A, ID_B, EXP_A

having count(distinct catx("-", EXP_B, RAT_X)) > 1;

select * from flagged;

quit;

PG

Flagging out observations based on multiple by group values

Re: Flagging out observations based on multiple by group values

Re: Flagging out observations based on multiple by group values

Flagging out observations based on multiple by group values

Re: Flagging out observations based on multiple by group values

Re: Flagging out observations based on multiple by group values

Click image to register for webinar

Classroom Training Available!