Hi, everyone.
This is what my dataset looks like. And I am trying to compare the 'start' and 'end' to delete unnecessary rows.
If 'start' is unqiue compare to the previous 'end', then we will keep the row. If 'start' is the same to the previous 'end', then we need to delete that row and make the 'end' in first row to be the same as the second row.
The result should be like
Thanks!
If the data are already sorted by ID/START, then you could:
data want (drop=i nxt_:);
merge have
have (firstobs=2 keep=id start rename=(id=nxt_id start=nxt_start));
if end^=nxt_start or id^=nxt_id then do i=1 to coalesce(dif(_n_),_n_);
set have (drop=end);
if i=1 then output;
end;
run;
The MERGE statement read a number of observations until the current END doesn't match the next START.
The do loop rereads all those observations (except the variable END). It outputs only the first of them (to get the initial START as well as initial a b c d). But it keeps the END value from the last connected obs determined via the MERGE statement.
I don't understand the logic. It seems as if you want to either keep a row, or delete a row, but the first row in your output is not an actual row of the data set. Please explain.
Please post data as text not images. Code is untested as I'm too lazy to type it out to test anything.
Use LAG to check for groupings. Then use PROC MEANS/SQL to collate the data. I'll use SQL as you may have character variables.
data groups;
set have;
by ID;
prev_end= lag(end);
if first.id then do;
group=0;
call missing(prev_end);
end;
if start ne prev_end then group+1;
run;
proc sql;
create table want as
select id, group, min(start) as start, max(end) as end, a, b, c, d,
from groups
group by id, group, a, b, c, d
quit;
If the data are already sorted by ID/START, then you could:
data want (drop=i nxt_:);
merge have
have (firstobs=2 keep=id start rename=(id=nxt_id start=nxt_start));
if end^=nxt_start or id^=nxt_id then do i=1 to coalesce(dif(_n_),_n_);
set have (drop=end);
if i=1 then output;
end;
run;
The MERGE statement read a number of observations until the current END doesn't match the next START.
The do loop rereads all those observations (except the variable END). It outputs only the first of them (to get the initial START as well as initial a b c d). But it keeps the END value from the last connected obs determined via the MERGE statement.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.