BookmarkSubscribeRSS Feed
lnb001
Calcite | Level 5

I am trying to select a specific sequence of records in a SAS dataset. For example, I have repeated measures data with multiple records per case ID. I am trying to select 3 records where variable 2 is flanked by 2 instances of variable 1. So for record 1, variable 1 would be true, then at record 2 variable 2 would be true, and at record 3 variable 1 would again be true. The variable in question is a lab value where I am looking for instances where the lab value is low, then changes to high, then changes back to low. Each case ID could have more than 3 records, but I only want to pick out the records where this low-high-low instance occurs. Anything before or after that instance doesn't matter. If multiple instances occur for the same ID I would like to keep those as well. Is this possible?

22 REPLIES 22
art297
Opal | Level 21

The method described at http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back may be perfect for what you are trying to do.

lnb001
Calcite | Level 5

This solution seems to have worked for picking out the records with the high (middle) value. I am now trying to keep the records that flank this middle value (the lows). Any idea how I could do this? Thanks.

art297
Opal | Level 21

I think it would help if you provide some example data.  If the look-ahead-look-back method worked you could just output all three records at the same time since you have them in memory already.  The only issue I can forsee is that flanking records might be part of yet another set and I don't know if you would only want them represented once.

lnb001
Calcite | Level 5

Bear with me as I try to create some sample data:

Case ID   X        High        Low

0000001: 50       .             1

0000001: 672     1            .

0000001: 50       .             1

0000002: 50       .             1

0000002: 100     1            .

0000002: 50       .             1

0000002: 50       .             1

0000002: 50       .             1

In the example above, I have successfully assigned a value to the Records where X is > 50 and <=50. This is indicated by the newly created High and Low fields. While it is easy to select the High values, what I want is to select the High as well as the two Lows that flank it (e.g. the first three records of each Case ID). This is tricky for me since there are obviously records that have a Low value w/o a High, and records with High values and no Lows. Does this make sense?

Peter_C
Rhodochrosite | Level 12

for Case_id=2 do you want all rows (rows 4-8), or only the three (rows 4-6 in that example) surrounding the "High"?

lnb001
Calcite | Level 5

Just records 4-6.

Peter_C
Rhodochrosite | Level 12

when the data are like

0000001: 50 . 1

0000001: 672 1 .

0000001: 50 . 1

0000001: 672 1 .

0000001: 50 . 1

should row 3 appear twice (once for each of the overlapping trios) or just once?

lnb001
Calcite | Level 5

In my example above, this is data I am left with after I assign values to an existing data set. As I mentioned, I am left with Case IDs that have a High flanked by two lows. I am also left with Case IDs that have Highs and Lows but no actual Highs that have the flanking.

I want to create a new data set that shows only the records where the flanking occurs. To be more clear, maybe I should have written the sample data as follows:

Case ID     Date                       X        High        Low

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

0000002    04/01/2011              50       .             1

0000002    05/01/2011              50       .             1

In this example, I want to keep all Case 0000001 records, and only the Case 0000002 records dated 01/01/2011, 02/01/2011, and 03/01/2011 because the remaining records (04/01/2011 & 05/01/2011) do not have the High flanked by two Lows.

So the new dataset would look like:

Case ID     Date                       X        High        Low

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

art297
Opal | Level 21

You didn't indicate whether you had tried my suggested code.  Given your new additional requirements, I would suggest that you try the following and see if it does what you want:

data have;

  input Case_ID  Date mmddyy10.  X;

  cards;

0000001 01/01/2010   50

0000001 02/01/2010  672

0000001 03/01/2010   50

0000002 01/01/2011   50

0000002 02/01/2011  100

0000002 03/01/2011   50

0000002 04/01/2011   50

0000002 05/01/2011   50

;

data want;

  set have;

  by Case_ID;

  set have ( firstobs = 2 keep = x date  rename =

           (x = Next_x date = Next_date) )

      have (      obs = 1 drop = _all_                                     );

  Prev_x = ifn( first.Case_ID, (.), lag(x) );

  Prev_date = ifn( first.Case_ID, (.), lag(date) );

  Next_x = ifn(  last.Case_ID, (.), Next_x );

  Next_date = ifn(  last.Case_ID, (.), Next_date );

  if x gt 50 and not (first.Case_ID or last.Case_ID)

  then do;

    output;

    x=Prev_x;

    date=Prev_date;

    output;

    x=Next_x;

    date=Next_date;

    output;

  end;

run;

proc sort data=want;

  by case_ID date;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi ... another approach with your data that  gives your desired result ...

data x;

input id dt : mmddyy. x high low;

format id z7. dt mmddyy10.;

datalines;

0000001    01/01/2010              50       .             1

0000001    02/01/2010              672     1            .

0000001    03/01/2010              50       .             1

0000002    01/01/2011              50       .             1

0000002    02/01/2011              100     1            .

0000002    03/01/2011              50       .             1

0000002    04/01/2011              50       .             1

0000002    05/01/2011              50       .             1

;

run;

data q;

set x;

by id;

if not (first.id or last.id);

pre  = _n_ - 1;

post = _n_ + 1;

set x (keep=x rename=(x=x0)) point=pre;

set x (keep=x rename=(x=x1)) point=post;

if x gt x0 and x gt x1 then do;

   set x point=pre;  output;

   set x point=_n_;  output;

   set x point=post; output;

end;

drop x0 x1;

run;

lnb001
Calcite | Level 5

With some tweaking I think I have made this work. Thanks all for the input.

art297
Opal | Level 21

I'm still not sure if I understand what you are trying to accomplish.  The following finds a record that has a value greater than 50 and outputs it.  If there is a record that precedes it, it then outputs it and then outputs the record (if there is one) that comes aftre it.  Is it at least close to what you are trying to do?

data have;

  input Case_ID $  X;

  cards;

0000001 50

0000001 672

0000001 49

0000002 42

0000002 100

0000002 37

0000002 50

0000002 50

;

data want;

  set have;

  if x gt 50 then highlow=1;

  else highlow=0;

run;

data want;

  set have;

  by Case_ID;

  set have ( firstobs = 2 keep = x  rename = (x = Next_x) )

      have (      obs = 1 drop = _all_                                     );

  Prev_x = ifn( first.Case_ID, (.), lag(x) );

  Next_x = ifn(  last.Case_ID, (.), Next_x );

  if x gt 50 then do;

    output;

    if not first.Case_ID then do;

      x=Prev_x;

      output;

    end;

    if not last.Case_ID then do;

      x=Next_x;

      output;

    end;

  end;

run;

Peter_C
Rhodochrosite | Level 12

double pass within Case_ID

  1. collecting info on the condition
  2. extracting if rules require it.

there will be no measurable performance penalty when your data are in case_ID order

For example

data wanted ;

   set labValues( in= pass1) labValues(in=pass2) ;

   by case_ID ;

   if pass2 then do ;

          if wanted then output ;

          delete ;    

   end ;

   if first.case_ID then do ;

      wanted = 0 ;

      abc = '000' ;

   end ;

   if wanted then delete ;

   if 1<= value <= 2 then abc = cats( value, abc ) ;

         else abc= cats( '0', abc ) ;

   if abc = '121' then wanted = 1 ;

   drop   abc wanted ;

   retain abc wanted ;

run ;

Message was edited by: Peter Crawford

also needed the RETAIN statement

Ksharp
Super User
data x;
input id dt : mmddyy. x high low;
format id z7. dt mmddyy10.;
datalines;
0000001    01/01/2010              50       .             1
0000001    02/01/2010              672     1            .
0000001    03/01/2010              50       .             1
0000002    01/01/2011              50       .             1
0000002    02/01/2011              100     1            .
0000002    03/01/2011              50       .             1
0000002    04/01/2011              50       .             1
0000002    05/01/2011              50       .             1
;
run;
data want(drop=count);
 set x;
 count+1;
 if low=1 and (id=lag(id) and lag(high)=1) and (id=lag2(id) and lag2(low)=1) then do;
  output;
  _count=count-1;set x point=_count;output;
  _count=count-2;set x point=_count;output;
  end;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1440 views
  • 0 likes
  • 6 in conversation