BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ttqkroe
Calcite | Level 5

Hi, everyone.

 

ttqkroe_1-1659634707159.png

 

This is what my dataset looks like. And I am trying to compare the 'start' and 'end' to delete unnecessary rows.

If 'start' is unqiue compare to the previous 'end', then we will keep the row. If 'start' is the same to the previous 'end', then we need to delete that row and make the 'end' in first row to be the same as the second row.

The result should be like

 

ttqkroe_2-1659634716019.png

 

Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If the data are already sorted by ID/START, then you could:

 

data want (drop=i nxt_:);
  merge have
        have (firstobs=2 keep=id start rename=(id=nxt_id start=nxt_start));

  if end^=nxt_start or id^=nxt_id then do i=1 to coalesce(dif(_n_),_n_);
    set have (drop=end);
    if i=1 then output;
  end;
run;

The MERGE statement read a number of observations until the current END doesn't match the next START.

 

The do loop rereads all those observations (except the variable END).  It outputs only the first of them (to get the initial START as well as initial a b c d).  But it keeps the END value from the last connected obs determined via the MERGE statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I don't understand the logic. It seems as if you want to either keep a row, or delete a row, but the first row in your output is not an actual row of the data set. Please explain.

--
Paige Miller
ttqkroe
Calcite | Level 5
Thanks for the respond. Yeah, maybe it is not clear. So, the logic is if the 'start' in the second row is the same number as 'end' in the second row. Then we can say that there is not necessary to use two rows to write this dataset. we can replace it by using the 'start' in the first row and 'end' in the second row. And obey the same logic when they have same 'id'.

Thanks
Reeza
Super User

Please post data as text not images. Code is untested as I'm too lazy to type it out to test anything.

 

Use LAG to check for groupings. Then use PROC MEANS/SQL to collate the data. I'll use SQL as you may have character variables. 

 

data groups;
set have;
by ID;
prev_end= lag(end);
if first.id then do;
group=0;
call missing(prev_end);
end;
if start ne prev_end then group+1;
run;

proc sql;
create table want as 
select id, group, min(start) as start, max(end) as end, a, b, c, d,
from groups
group by id, group, a, b, c, d
quit;

 

 

mkeintz
PROC Star

If the data are already sorted by ID/START, then you could:

 

data want (drop=i nxt_:);
  merge have
        have (firstobs=2 keep=id start rename=(id=nxt_id start=nxt_start));

  if end^=nxt_start or id^=nxt_id then do i=1 to coalesce(dif(_n_),_n_);
    set have (drop=end);
    if i=1 then output;
  end;
run;

The MERGE statement read a number of observations until the current END doesn't match the next START.

 

The do loop rereads all those observations (except the variable END).  It outputs only the first of them (to get the initial START as well as initial a b c d).  But it keeps the END value from the last connected obs determined via the MERGE statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1776 views
  • 1 like
  • 4 in conversation