BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

I have a large dataset grouped by Field_A and dates for each rows within the groups.  There is also another Field_B that may or may not be intermittent within each group.. I need to pick only the groupings that have Field_B intermittent..  here's what I mean:

HAVE:

Field_A          Date          Field_B
A                 2001.01             T

A                 2001.02             C

A                 2001.03             C

A                 2001.04             T

----------------------------------------------------

B                 2001.01             T 

B                 2001.02             T

B                 2001.03             C

B                 2001.04             C

-----------------------------------------------------

C                 2001.01             C 

C                 2001.02             T

C                 2001.03             C

C                 2001.04             T

-----------------------------------------------------

D                 2001.01             C 

D                 2001.02             C

D                 2001.03             T

D                 2001.04             T

WANT:

Field_A          Date          Field_B
A                 2001.01             T

A                 2001.02             C

A                 2001.03             C

A                 2001.04             T

-----------------------------------------------------

C                 2001.01             T 

C                 2001.02             C

C                 2001.03             T

C                 2001.04             C

           

WANT tables shows that Field_B has values that are mixed within it, not arranged as seen in Field_A = B or D in the HAVE table.

6 REPLIES 6
art297
Opal | Level 21

I think you have to specify exactly what you mean by "intermittent".

podarum
Quartz | Level 8

The options could be either Option 1 : Field_B is 1 value and then the other (eg. T's and then C's or vice versa) OR

Option 2 : T's then C's and then T's again or C's then T's and then C's again...  so if there is only one switch in Feild_B then don't keep . if there is more than one switch in Field_B then keep (sorted by date and groupped by Field_A)

Reeza
Super User

Here's a very basic way. Count the number of switches in a dataset and then select the ones that meet your criteria.

data have;
input Field_A  $        Date   $       Field_B $;
cards;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;
run;

data have2;
set have;
by field_a field_b notsorted;

retain switch;

if first.field_a then switch=0;

if first.field_b and not first.field_a then switch+1;

run;

proc sql;
create table want as
select * from have2
group by field_a
having max(switch)>1;
quit;

art297
Opal | Level 21

: As usual, you probably have a number of options.  The following probably isn't the most efficient, but easy to follow what the code is doing:

data want (drop=test1-test3);

  set have (in=a) have (in=b);;

  by field_a;

  retain test1-test3;

  length test1 test2 $1;

  if a then do;

    if first.field_a then do;

      test1=field_b;

      call missing(test2);

      call missing(test3);

    end;

    else if missing(test2) then do;

      if field_b ne test1 then test2=field_b;

    end;

    else if field_b ne test2 then test3=1;

  end;

  else if test3 then output;

run;

PGStats
Opal | Level 21

I would count on the BY mechanism to count the number of switches in Field_B, like this :

data have;
input Field_A $ Date $ Field_B $;
datalines;
A                 2001.01             T
A                 2001.02             C
A                 2001.03             C
A                 2001.04             T
B                 2001.01             T 
B                 2001.02             T
B                 2001.03             C
B                 2001.04             C
C                 2001.01             C 
C                 2001.02             T
C                 2001.03             C
C                 2001.04             T
D                 2001.01             C 
D                 2001.02             C
D                 2001.03             T
D                 2001.04             T
;

data want;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     c = sum(c, first.Field_b);
     end;
do until(last.Field_A);
     set have; by Field_A Field_B notsorted;
     if c <= 2 then output;
     end;
drop c;
run;

proc print data=want noobs; run;

PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 925 views
  • 2 likes
  • 5 in conversation