Help using Base SAS procedures

selecting a specific sequence of records

Reply
Occasional Contributor
Posts: 12

selecting a specific sequence of records

I am trying to select a specific sequence of records in a SAS dataset. For example, I have repeated measures data with multiple records per case ID. I am trying to select 3 records where variable 2 is flanked by 2 instances of variable 1. So for record 1, variable 1 would be true, then at record 2 variable 2 would be true, and at record 3 variable 1 would again be true. The variable in question is a lab value where I am looking for instances where the lab value is low, then changes to high, then changes back to low. Each case ID could have more than 3 records, but I only want to pick out the records where this low-high-low instance occurs. Anything before or after that instance doesn't matter. If multiple instances occur for the same ID I would like to keep those as well. Is this possible?

PROC Star
Posts: 7,492

selecting a specific sequence of records

The method described at http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back may be perfect for what you are trying to do.

Occasional Contributor
Posts: 12

selecting a specific sequence of records

This solution seems to have worked for picking out the records with the high (middle) value. I am now trying to keep the records that flank this middle value (the lows). Any idea how I could do this? Thanks.

PROC Star
Posts: 7,492

Re: selecting a specific sequence of records

I think it would help if you provide some example data.  If the look-ahead-look-back method worked you could just output all three records at the same time since you have them in memory already.  The only issue I can forsee is that flanking records might be part of yet another set and I don't know if you would only want them represented once.

Occasional Contributor
Posts: 12

Re: selecting a specific sequence of records

Bear with me as I try to create some sample data:

Case ID   X        High        Low

0000001: 50       .             1

0000001: 672     1            .

0000001: 50       .             1

0000002: 50       .             1

0000002: 100     1            .

0000002: 50       .             1

0000002: 50       .             1

0000002: 50       .             1

In the example above, I have successfully assigned a value to the Records where X is > 50 and <=50. This is indicated by the newly created High and Low fields. While it is easy to select the High values, what I want is to select the High as well as the two Lows that flank it (e.g. the first three records of each Case ID). This is tricky for me since there are obviously records that have a Low value w/o a High, and records with High values and no Lows. Does this make sense?

Valued Guide
Posts: 2,177

Re: selecting a specific sequence of records

for Case_id=2 do you want all rows (rows 4-8), or only the three (rows 4-6 in that example) surrounding the "High"?

Occasional Contributor
Posts: 12

Re: selecting a specific sequence of records

Just records 4-6.

Valued Guide
Posts: 2,177

selecting a specific sequence of records

when the data are like

0000001: 50 . 1

0000001: 672 1 .

0000001: 50 . 1

0000001: 672 1 .

0000001: 50 . 1

should row 3 appear twice (once for each of the overlapping trios) or just once?

Occasional Contributor
Posts: 12

selecting a specific sequence of records

In my example above, this is data I am left with after I assign values to an existing data set. As I mentioned, I am left with Case IDs that have a High flanked by two lows. I am also left with Case IDs that have Highs and Lows but no actual Highs that have the flanking.

I want to create a new data set that shows only the records where the flanking occurs. To be more clear, maybe I should have written the sample data as follows:

Case ID     Date                       X        High        Low

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

0000002    04/01/2011              50       .             1

0000002    05/01/2011              50       .             1

In this example, I want to keep all Case 0000001 records, and only the Case 0000002 records dated 01/01/2011, 02/01/2011, and 03/01/2011 because the remaining records (04/01/2011 & 05/01/2011) do not have the High flanked by two Lows.

So the new dataset would look like:

Case ID     Date                       X        High        Low

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

PROC Star
Posts: 7,492

selecting a specific sequence of records

You didn't indicate whether you had tried my suggested code.  Given your new additional requirements, I would suggest that you try the following and see if it does what you want:

data have;

  input Case_ID  Date mmddyy10.  X;

  cards;

0000001 01/01/2010   50

0000001 02/01/2010  672

0000001 03/01/2010   50

0000002 01/01/2011   50

0000002 02/01/2011  100

0000002 03/01/2011   50

0000002 04/01/2011   50

0000002 05/01/2011   50

;

data want;

  set have;

  by Case_ID;

  set have ( firstobs = 2 keep = x date  rename =

           (x = Next_x date = Next_date) )

      have (      obs = 1 drop = _all_                                     );

  Prev_x = ifn( first.Case_ID, (.), lag(x) );

  Prev_date = ifn( first.Case_ID, (.), lag(date) );

  Next_x = ifn(  last.Case_ID, (.), Next_x );

  Next_date = ifn(  last.Case_ID, (.), Next_date );

  if x gt 50 and not (first.Case_ID or last.Case_ID)

  then do;

    output;

    x=Prev_x;

    date=Prev_date;

    output;

    x=Next_x;

    date=Next_date;

    output;

  end;

run;

proc sort data=want;

  by case_ID date;

run;

Valued Guide
Posts: 765

Re: selecting a specific sequence of records

Hi ... another approach with your data that  gives your desired result ...

data x;

input id dt : mmddyy. x high low;

format id z7. dt mmddyy10.;

datalines;

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

0000002    04/01/2011              50       .             1

0000002    05/01/2011              50       .             1

;

run;

data q;

set x;

by id;

if not (first.id or last.id);

pre  = _n_ - 1;

post = _n_ + 1;

set x (keep=x rename=(x=x0)) point=pre;

set x (keep=x rename=(x=x1)) point=post;

if x gt x0 and x gt x1 then do;

   set x point=pre;  output;

   set x point=_n_;  output;

   set x point=post; output;

end;

drop x0 x1;

run;

Occasional Contributor
Posts: 12

Re: selecting a specific sequence of records

With some tweaking I think I have made this work. Thanks all for the input.

PROC Star
Posts: 7,492

Re: selecting a specific sequence of records

I'm still not sure if I understand what you are trying to accomplish.  The following finds a record that has a value greater than 50 and outputs it.  If there is a record that precedes it, it then outputs it and then outputs the record (if there is one) that comes aftre it.  Is it at least close to what you are trying to do?

data have;

  input Case_ID $  X;

  cards;

0000001 50

0000001 672

0000001 49

0000002 42

0000002 100

0000002 37

0000002 50

0000002 50

;

data want;

  set have;

  if x gt 50 then highlow=1;

  else highlow=0;

run;

data want;

  set have;

  by Case_ID;

  set have ( firstobs = 2 keep = x  rename = (x = Next_x) )

      have (      obs = 1 drop = _all_                                     );

  Prev_x = ifn( first.Case_ID, (.), lag(x) );

  Next_x = ifn(  last.Case_ID, (.), Next_x );

  if x gt 50 then do;

    output;

    if not first.Case_ID then do;

      x=Prev_x;

      output;

    end;

    if not last.Case_ID then do;

      x=Next_x;

      output;

    end;

  end;

run;

Valued Guide
Posts: 2,177

Re: selecting a specific sequence of records

double pass within Case_ID

  1. collecting info on the condition
  2. extracting if rules require it.

there will be no measurable performance penalty when your data are in case_ID order

For example

data wanted ;

   set labValues( in= pass1) labValues(in=pass2) ;

   by case_ID ;

   if pass2 then do ;

          if wanted then output ;

          delete ;    

   end ;

   if first.case_ID then do ;

      wanted = 0 ;

      abc = '000' ;

   end ;

   if wanted then delete ;

   if 1<= value <= 2 then abc = cats( value, abc ) ;

         else abc= cats( '0', abc ) ;

   if abc = '121' then wanted = 1 ;

   drop   abc wanted ;

   retain abc wanted ;

run ;

Message was edited by: Peter Crawford

also needed the RETAIN statement

Super User
Posts: 10,044

Re: selecting a specific sequence of records

data x;
input id dt : mmddyy. x high low;
format id z7. dt mmddyy10.;
datalines;
0000001    01/01/2010              50       .             1
0000001    02/01/2010              672     1            .
0000001    03/01/2010              50       .             1
0000002    01/01/2011              50       .             1
0000002    02/01/2011              100     1            .
0000002    03/01/2011              50       .             1
0000002    04/01/2011              50       .             1
0000002    05/01/2011              50       .             1
;
run;
data want(drop=count);
 set x;
 count+1;
 if low=1 and (id=lag(id) and lag(high)=1) and (id=lag2(id) and lag2(low)=1) then do;
  output;
  _count=count-1;set x point=_count;output;
  _count=count-2;set x point=_count;output;
  end;
run;

Ksharp

Ask a Question
Discussion stats
  • 22 replies
  • 244 views
  • 0 likes
  • 6 in conversation