BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

I have a large dataset grouped by Field_A and dates for each rows within the groups.  There is also another Field_B that may or may not be intermittent within each group.. I need to pick only the groupings that have Field_B intermittent..  here's what I mean:

HAVE:

Field_A          Date          Field_B
A                 2001.01             T

A                 2001.02             C

A                 2001.03             C

A                 2001.04             T

----------------------------------------------------

B                 2001.01             T 

B                 2001.02             T

B                 2001.03             C

B                 2001.04             C

-----------------------------------------------------

C                 2001.01             C 

C                 2001.02             T

C                 2001.03             C

C                 2001.04             T

-----------------------------------------------------

D                 2001.01             C 

D                 2001.02             C

D                 2001.03             T

D                 2001.04             T

WANT:

Field_A          Date          Field_B
A                 2001.01             T

A                 2001.02             C

A                 2001.03             C

A                 2001.04             T

-----------------------------------------------------

C                 2001.01             T 

C                 2001.02             C

C                 2001.03             T

C                 2001.04             C

           

WANT tables shows that Field_B has values that are mixed within it, not arranged as seen in Field_A = B or D in the HAVE table.

6 REPLIES 6
art297
Opal | Level 21

I think you have to specify exactly what you mean by "intermittent".

podarum
Quartz | Level 8

The options could be either Option 1 : Field_B is 1 value and then the other (eg. T's and then C's or vice versa) OR

Option 2 : T's then C's and then T's again or C's then T's and then C's again...  so if there is only one switch in Feild_B then don't keep . if there is more than one switch in Field_B then keep (sorted by date and groupped by Field_A)

Reeza
Super User

Here's a very basic way. Count the number of switches in a dataset and then select the ones that meet your criteria.

data have;
input Field_A  $        Date   $       Field_B $;
cards;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;
run;

data have2;
set have;
by field_a field_b notsorted;

retain switch;

if first.field_a then switch=0;

if first.field_b and not first.field_a then switch+1;

run;

proc sql;
create table want as
select * from have2
group by field_a
having max(switch)>1;
quit;

art297
Opal | Level 21

: As usual, you probably have a number of options.  The following probably isn't the most efficient, but easy to follow what the code is doing:

data want (drop=test1-test3);

  set have (in=a) have (in=b);;

  by field_a;

  retain test1-test3;

  length test1 test2 $1;

  if a then do;

    if first.field_a then do;

      test1=field_b;

      call missing(test2);

      call missing(test3);

    end;

    else if missing(test2) then do;

      if field_b ne test1 then test2=field_b;

    end;

    else if field_b ne test2 then test3=1;

  end;

  else if test3 then output;

run;

PGStats
Opal | Level 21

I would count on the BY mechanism to count the number of switches in Field_B, like this :

data have;
input Field_A $ Date $ Field_B $;
datalines;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T 
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C 
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C 
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;

data want;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     c = sum(c, first.Field_b);
     end;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     if c <= 2 then output;
     end;
drop c;
run;

proc print data=want noobs; run;

PG

PG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1982 views
  • 2 likes
  • 5 in conversation