Selecting rows satisfying condition and keeping also row preceding each of them

Accepted Solution Solved
Reply
Contributor
Posts: 35
Accepted Solution

Selecting rows satisfying condition and keeping also row preceding each of them

Dear All,

I would like to select all rows of a dataset satisfying a certain condition (when the variable V1 = 0) AND also keep the observation/row preceding each of them. That is, given database db1

data db1;

     input id V1 V2;

datalines;

1     2     2.1

1     0     2.9

1     2     3.1

2     4     1.7

2     0     4.5

2     0     0.5.

2     5     2.8

;

I would like to obtain the following

data want;

     input id V1 V2;

datalines;

1     2     2.1

1     0     2.9

2     4     1.7

2     0     4.5

2     0     0.5.

;


Notice that I would like to avoid duplicates. Otherwise, I would have the 5th observations twice in the database want.


I'm able to select rows matching my condition by using the following code:


data want;

if V1=0;

run;


Any help would be highly appreciated.


Accepted Solutions
Solution
‎09-04-2014 11:17 AM
Super User
Posts: 5,509

Re: Selecting rows satisfying condition and keeping also row preceding each of them

How about extracting the row numbers, then extracting the full observations afterwards?

data selected_rows;

   set have;

   if V=0;   /* or whatever conditions */

   rownum = _n_;

   output;

   rownum = _n_ - 1;

   if rownum > 0 then output;

   keep rownum;

run;

proc sort data=selected_rows NODUPKEY;

   by rownum;

run;

That gives you a list of the unique row numbers you need.  To retrieve them:

data want;

   set selected_rows;

   set have point=rownum;

run;

Good luck.

View solution in original post


All Replies
Super User
Super User
Posts: 7,955

Re: Selecting rows satisfying condition and keeping also row preceding each of them

You could use the lag() function.  I would question the logic though as take last record seems a bit ad-hoc.  Why the previous row?  If I sort the data the previous row changes, new data, maybe the rows change again.  Identify specific grouping variables and orders first then work out previous data based on that.

Contributor
Posts: 35

Re: Selecting rows satisfying condition and keeping also row preceding each of them

I'm trying to investigate the possibility that my dataset is affected by recording errors.

The database is already sorted according to some specific criteria, so I just need to select the observations satisfying my condition (V1=0) and the previous one of each of them. I suspect that one of the two is a duplicate of the other one.

Super User
Super User
Posts: 7,955

Re: Selecting rows satisfying condition and keeping also row preceding each of them

So the key here is "specific criteria".
As a quick lag example:

data db1;
     input id V1 V2;
datalines;
1     2     2.1
1     0     2.9
1     2     3.1
2     4     1.7
2     0     4.5
2     0     0.5
2     5     2.8
;
run;

data want;
  set db1;
  prev_v1=lag(v1);
  prev_v2=lag(v2);
run;

You can manipulate it with output statements if you want to get the two rows, or just compare the two.

Contributor
Posts: 35

Re: Selecting rows satisfying condition and keeping also row preceding each of them

Thank you @RW9. The point is that I have 30-40 columns in my dataset, hence comparing original values of the variables with lagged ones horizontally is not handy. That's why I wanted to do it vertically.

I know how to do it with other programming languages (for instance, by using a for loop), but I cannot use them because of the size of my dataset (140M records).

Super User
Super User
Posts: 7,955

Re: Selecting rows satisfying condition and keeping also row preceding each of them

That's what I meant by using the output wisely:

data want (drop=prev_v1 prev_v2);

  set db1;

  prev_v1=lag(v1);

  prev_v2=lag(v2);

     if v1=0 then do;

          output;

           v1=prev_v1;

          v2=prev_v2;

          output;

     end;

run;

This would output the row which falls under your criteria, and then output the previous row underneath.  You could, by having a third set of variables swap the current data into new vars, set them current data to previous, output and then set the current data back from the newvars and output to get previous then criteria breaking row.

You may also still want to look into grouping your data, 140m records is a lot and any suggestion is going to tak e a long time, grouping will help there.

Solution
‎09-04-2014 11:17 AM
Super User
Posts: 5,509

Re: Selecting rows satisfying condition and keeping also row preceding each of them

How about extracting the row numbers, then extracting the full observations afterwards?

data selected_rows;

   set have;

   if V=0;   /* or whatever conditions */

   rownum = _n_;

   output;

   rownum = _n_ - 1;

   if rownum > 0 then output;

   keep rownum;

run;

proc sort data=selected_rows NODUPKEY;

   by rownum;

run;

That gives you a list of the unique row numbers you need.  To retrieve them:

data want;

   set selected_rows;

   set have point=rownum;

run;

Good luck.

Occasional Contributor
Posts: 10

Re: Selecting rows satisfying condition and keeping also row preceding each of them

data db2;

  merge db1 db1 (firstobs=2 keep=v1 rename=(v1=next_v1));

  if (v1=0 or next_v1=0);

  drop next_v1;

run;

Occasional Contributor
Posts: 9

Re: Selecting rows satisfying condition and keeping also row preceding each of them

Maybe the long way is the easiest and safest, it includes all ideas mentioned above.

Good luck.

data db0 ;
n = _n_;
     input id V1 V2;
datalines;

1     2     2.1
1     0     2.9
1     2     3.1
2     4     1.7
2     0     4.5
2     0     0.5
2     5     2.8
;
run;


data db1;
set db0;
by n;
where V1 = 0;
j = n-1;
run;


proc sql;
create table db2 as
select a.*, b.j from db0 a inner join  db1 b on
a.id = b.id and a.n = b.j
where a.v1 ne 0
order by n;
quit;

data db3(drop = j  n);
set db1 db2;
by n;
run;

Message was edited by: Anna Kovacs Your key is the ID that must be considered, not yet mentioned. Read in the database BY sort order, assign N =_N_, and work it. The database needs no sorting when using _N_. You may eliminate the J  by simply using N-1, but I put it there for clarity.

Trusted Advisor
Posts: 1,228

Re: Selecting rows satisfying condition and keeping also row preceding each of them

data want;
set have;
if v1>0 then cnt+1;
run;

proc sql;
select * from want
group by cnt
having count(cnt)>1 or cnt=0;
quit;

N/A
Posts: 1

Re: Selecting rows satisfying condition and keeping also row preceding each of them

Well this may not be the fastest way but it will solve ur problem. Create a new variable in have Dataset.

data have;

n = _n_;  /* If obs number is present then rename it to n else create a new variable */

     input id V1 V2;

datalines;

1     2     2.1

1     0     2.9

1     2     3.1

2     4     1.7

2     0     4.5

2     0     0.5

2     5     2.8

;

data want2(keep = n);

set have;

n = _n_ ;

if v1 = 0 then do;

        output ;

        n = n - 1 ;

        output;

     end;

run;

proc sort data = want2 nodupkey;

by n;

run;

data want3;

merge have(in= x) want2(in= y);

by n;

if x and y;

drop n;

run;

Super User
Posts: 10,028

Re: Selecting rows satisfying condition and keeping also row preceding each of them

How about :


data db1;
     input id V1 V2;
datalines;
1     2     2.1
1     0     2.9
1     2     3.1
2     4     1.7
2     0     4.5
2     0     0.5
2     5     2.8
;
run;

data want;
 do until(v1=0 or last.id);
  set db1;
  by id;
  if v1=0 then found=1;
 end;
 do until(v1=0 or last.id);
  set db1;
  by id;
  if  found then output;
 end;
run;

Xia Keshan

N/A
Posts: 1

Re: Selecting rows satisfying condition and keeping also row preceding each of them

data want;

   set db1;

   if v1=0 then do;

       output;

     if lag(v1) <> 0 then output;

end;

run;

(caution:  lag can give false results if called from a conditional dataset)

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 2003 views
  • 4 likes
  • 9 in conversation