I have a large dataset grouped by Field_A and dates for each rows within the groups. There is also another Field_B that may or may not be intermittent within each group.. I need to pick only the groupings that have Field_B intermittent.. here's what I mean:
HAVE:
Field_A Date Field_B
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
----------------------------------------------------
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
-----------------------------------------------------
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
-----------------------------------------------------
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
WANT:
Field_A Date Field_B
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
-----------------------------------------------------
C 2001.01 T
C 2001.02 C
C 2001.03 T
C 2001.04 C
WANT tables shows that Field_B has values that are mixed within it, not arranged as seen in Field_A = B or D in the HAVE table.
I think you have to specify exactly what you mean by "intermittent".
The options could be either Option 1 : Field_B is 1 value and then the other (eg. T's and then C's or vice versa) OR
Option 2 : T's then C's and then T's again or C's then T's and then C's again... so if there is only one switch in Feild_B then don't keep . if there is more than one switch in Field_B then keep (sorted by date and groupped by Field_A)
Here's a very basic way. Count the number of switches in a dataset and then select the ones that meet your criteria.
data have;
input Field_A $ Date $ Field_B $;
cards;
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
;
run;
data have2;
set have;
by field_a field_b notsorted;
retain switch;
if first.field_a then switch=0;
if first.field_b and not first.field_a then switch+1;
run;
proc sql;
create table want as
select * from have2
group by field_a
having max(switch)>1;
quit;
: As usual, you probably have a number of options. The following probably isn't the most efficient, but easy to follow what the code is doing:
data want (drop=test1-test3);
set have (in=a) have (in=b);;
by field_a;
retain test1-test3;
length test1 test2 $1;
if a then do;
if first.field_a then do;
test1=field_b;
call missing(test2);
call missing(test3);
end;
else if missing(test2) then do;
if field_b ne test1 then test2=field_b;
end;
else if field_b ne test2 then test3=1;
end;
else if test3 then output;
run;
I would count on the BY mechanism to count the number of switches in Field_B, like this :
data have;
input Field_A $ Date $ Field_B $;
datalines;
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
;
data want;
do until(last.Field_A);
set have; by Field_A Field_B notsorted;
c = sum(c, first.Field_b);
end;
do until(last.Field_A);
set have; by Field_A Field_B notsorted;
if c <= 2 then output;
end;
drop c;
run;
proc print data=want noobs; run;
PG
Liked!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.