I have a large dataset grouped by Field_A and dates for each rows within the groups. There is also another Field_B that may or may not be intermittent within each group.. I need to pick only the groupings that have Field_B intermittent.. here's what I mean:
HAVE:
Field_A Date Field_B
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
----------------------------------------------------
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
-----------------------------------------------------
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
-----------------------------------------------------
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
WANT:
Field_A Date Field_B
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
-----------------------------------------------------
C 2001.01 T
C 2001.02 C
C 2001.03 T
C 2001.04 C
WANT tables shows that Field_B has values that are mixed within it, not arranged as seen in Field_A = B or D in the HAVE table.
I think you have to specify exactly what you mean by "intermittent".
The options could be either Option 1 : Field_B is 1 value and then the other (eg. T's and then C's or vice versa) OR
Option 2 : T's then C's and then T's again or C's then T's and then C's again... so if there is only one switch in Feild_B then don't keep . if there is more than one switch in Field_B then keep (sorted by date and groupped by Field_A)
Here's a very basic way. Count the number of switches in a dataset and then select the ones that meet your criteria.
data have;
input Field_A $ Date $ Field_B $;
cards;
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
;
run;
data have2;
set have;
by field_a field_b notsorted;
retain switch;
if first.field_a then switch=0;
if first.field_b and not first.field_a then switch+1;
run;
proc sql;
create table want as
select * from have2
group by field_a
having max(switch)>1;
quit;
: As usual, you probably have a number of options. The following probably isn't the most efficient, but easy to follow what the code is doing:
data want (drop=test1-test3);
set have (in=a) have (in=b);;
by field_a;
retain test1-test3;
length test1 test2 $1;
if a then do;
if first.field_a then do;
test1=field_b;
call missing(test2);
call missing(test3);
end;
else if missing(test2) then do;
if field_b ne test1 then test2=field_b;
end;
else if field_b ne test2 then test3=1;
end;
else if test3 then output;
run;
I would count on the BY mechanism to count the number of switches in Field_B, like this :
data have;
input Field_A $ Date $ Field_B $;
datalines;
A 2001.01 T
A 2001.02 C
A 2001.03 C
A 2001.04 T
B 2001.01 T
B 2001.02 T
B 2001.03 C
B 2001.04 C
C 2001.01 C
C 2001.02 T
C 2001.03 C
C 2001.04 T
D 2001.01 C
D 2001.02 C
D 2001.03 T
D 2001.04 T
;
data want;
do until(last.Field_A);
set have; by Field_A Field_B notsorted;
c = sum(c, first.Field_b);
end;
do until(last.Field_A);
set have; by Field_A Field_B notsorted;
if c <= 2 then output;
end;
drop c;
run;
proc print data=want noobs; run;
PG
Liked!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.