Solved: Re: Compare the variable in first row to another variable in secon...

ttqkroe · Posted 08-04-2022 01:38 PM

Hi, everyone.

This is what my dataset looks like. I am trying to compare 'start' and 'end' to delete unnecessary rows.

If 'start' differs from the previous 'end', then we keep the row. If 'start' is the same as the previous 'end', then we need to delete that row and set the 'end' in the first row to the 'end' of the second row.

The result should be like

Thanks!

mkeintz · Posted 08-04-2022 04:05 PM

If the data are already sorted by ID/START, then you could:

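data want (drop=i nxt_:);
  /* Look ahead: pair each observation with the ID and START of the next one */
  merge have
        have (firstobs=2 keep=id start rename=(id=nxt_id start=nxt_start));

  /* At the end of a chain, reread its observations; DIF(_N_) is the chain length */
  if end^=nxt_start or id^=nxt_id then do i=1 to coalesce(dif(_n_),_n_);
    set have (drop=end);   /* END keeps the value from the last obs of the chain */
    if i=1 then output;    /* START, a, b, c, d come from the first obs of the chain */
  end;
run;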

The MERGE statement reads observations, one per iteration, until the current END doesn't match the next START (or the ID changes).

The DO loop rereads all those observations (except the variable END). It outputs only the first of them (to get the initial START as well as the initial a, b, c, d), but it keeps the END value from the last connected observation, as determined via the MERGE statement.
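To see the mechanics on something concrete, here is a small made-up input. The values are purely hypothetical (the original data was posted as an image), and only the variable names id, start, end, a, b, c, d are taken from the thread.

```sas
/* Hypothetical sample data, already sorted by ID/START.           */
/* Rows 1-2 chain together (END 5 = next START 5); rows 4-6 too.   */
data have;
  input id start end a b c d;
  datalines;
1 1 5 10 20 30 40
1 5 9 10 20 30 40
1 12 15 10 20 30 40
2 3 7 11 21 31 41
2 7 8 11 21 31 41
2 8 10 11 21 31 41
;
run;
```

Running the DATA step above against this input should give three observations: id 1 with start=1 and end=9, id 1 with start=12 and end=15, and id 2 with start=3 and end=10. The condition is first true at _N_=2, where DIF(_N_) is still missing, so COALESCE falls back to _N_ and the loop rereads 2 observations. It is next true at _N_=3 (DIF gives 3-2=1) and at _N_=6 (DIF gives 6-3=3).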

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PaigeMiller · Posted 08-04-2022 01:59 PM

I don't understand the logic. It seems as if you want to either keep a row, or delete a row, but the first row in your output is not an actual row of the data set. Please explain.

--
Paige Miller

ttqkroe · Posted 08-04-2022 02:14 PM

Thanks for the response. Yeah, maybe it was not clear. The logic is: if the 'start' in the second row is the same number as the 'end' in the first row, then it is not necessary to use two rows for this data. We can replace them with a single row that uses the 'start' from the first row and the 'end' from the second row. The same logic applies within each 'id'.

Thanks

Reeza · Posted 08-04-2022 02:12 PM

Please post data as text not images. Code is untested as I'm too lazy to type it out to test anything.

Use LAG to check for groupings. Then use PROC MEANS/SQL to collate the data. I'll use SQL as you may have character variables.

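data groups;
  set have;
  by ID;
  /* LAG must run on every row, so call it outside any IF block */
  prev_end = lag(end);
  if first.id then do;
    group = 0;
    call missing(prev_end);   /* do not compare against the previous ID */
  end;
  /* a START that differs from the previous END begins a new group */
  if start ne prev_end then group + 1;
run;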

proc sql;
  create table want as
  select id, group, min(start) as start, max(end) as end, a, b, c, d
  from groups
  group by id, group, a, b, c, d;
quit;
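Since the code above is untested, one way to check the grouping step is with a tiny made-up input. The values below are hypothetical and only the variable names come from the thread. The data deliberately has id 2 starting at the value where id 1 ended, which is the case the CALL MISSING at FIRST.ID is there to handle.

```sas
/* Hypothetical test data, sorted by ID and START.                  */
/* id 2 starts at 9, the END of the last id 1 row; that must not    */
/* be treated as a continuation of id 1.                            */
data have;
  input id start end a b c d;
  datalines;
1 1 5 10 20 30 40
1 5 9 10 20 30 40
2 9 12 11 21 31 41
2 14 16 11 21 31 41
;
run;
```

With this input the GROUPS step should assign group=1 to both id 1 rows, group=1 to the first id 2 row, and group=2 to the second id 2 row. Taking min(start) and max(end) by id and group would then give the ranges 1-9, 9-12 and 14-16.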
