Re: Select groupings if contains an intermittent field

podarum · Posted 10-17-2013 03:38 PM

I have a large dataset grouped by Field_A and dates for each rows within the groups. There is also another Field_B that may or may not be intermittent within each group.. I need to pick only the groupings that have Field_B intermittent.. here's what I mean:

HAVE:

Field_A Date Field_B
A 2001.01 T

A 2001.02 C

A 2001.03 C

A 2001.04 T

----------------------------------------------------

B 2001.01 T

B 2001.02 T

B 2001.03 C

B 2001.04 C

-----------------------------------------------------

C 2001.01 C

C 2001.02 T

C 2001.03 C

C 2001.04 T

-----------------------------------------------------

D 2001.01 C

D 2001.02 C

D 2001.03 T

D 2001.04 T

WANT:

Field_A Date Field_B
A 2001.01 T

A 2001.02 C

A 2001.03 C

A 2001.04 T

-----------------------------------------------------

C 2001.01 T

C 2001.02 C

C 2001.03 T

C 2001.04 C

WANT tables shows that Field_B has values that are mixed within it, not arranged as seen in Field_A = B or D in the HAVE table.

art297 · Posted 10-17-2013 03:50 PM

I think you have to specify exactly what you mean by "intermittent".

podarum · Posted 10-17-2013 04:10 PM

The options could be either Option 1 : Field_B is 1 value and then the other (eg. T's and then C's or vice versa) OR

Option 2 : T's then C's and then T's again or C's then T's and then C's again... so if there is only one switch in Feild_B then don't keep . if there is more than one switch in Field_B then keep (sorted by date and groupped by Field_A)

Reeza · Posted 10-17-2013 04:29 PM

Here's a very basic way. Count the number of switches in a dataset and then select the ones that meet your criteria.

data have;
input Field_A $        Date   $       Field_B $;
cards;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;
run;

data have2;
set have;
by field_a field_b notsorted;

retain switch;

if first.field_a then switch=0;

if first.field_b and not first.field_a then switch+1;

run;

proc sql;
create table want as
select * from have2
group by field_a
having max(switch)>1;
quit;

art297 · Posted 10-17-2013 04:30 PM

: As usual, you probably have a number of options. The following probably isn't the most efficient, but easy to follow what the code is doing:

data want (drop=test1-test3);

set have (in=a) have (in=b);;

by field_a;

retain test1-test3;

length test1 test2 $1;

if a then do;

if first.field_a then do;

test1=field_b;

call missing(test2);

call missing(test3);

end;

else if missing(test2) then do;

if field_b ne test1 then test2=field_b;

end;

else if field_b ne test2 then test3=1;

end;

else if test3 then output;

run;

PGStats · Posted 10-17-2013 09:18 PM

I would count on the BY mechanism to count the number of switches in Field_B, like this :

data have;
input Field_A $ Date $ Field_B $;
datalines;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;

data want;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     c = sum(c, first.Field_b);
     end;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     if c <= 2 then output;
     end;
drop c;
run;

proc print data=want noobs; run;

PG

Haikuo · Posted 10-17-2013 09:37 PM

Liked!

Select groupings if contains an intermittent field