Solved: Re: How do I collapse rows/observations and incorporate a minimum and ...

Count · Posted 09-26-2018 02:23 AM

Hello,

I'm using Base SAS v9.3.

I have data set HAVE as follows:

data HAVE;
input Dept $ CM_start CM_end Year Count;
length dept $ 4;
datalines;
M82T 1 3 2005 789
M82T 4 5 2005 789
M82T 7 9 2005 789
M82T 10 11 2006 985
M82T 12 14 2006 985
UTYD 5 7 2008 980
UTYD 8 9 2008 980
UTYD 10 15 2011 784
UTYD 16 17 2011 784
UTYD 18 20 2011 784
SRT6 5 7 2008 452
SRT6 8 9 2008 452
SRT6 10 15 2008 452
SRT6 16 17 2008 452
SRT6 18 20 2008 452
;
run;

I want data set WANT that collapses the rows into one record for all rows that have the same DEPT, YEAR and COUNT.

BUT, I also want CM_START to represent the lowest CM value and CM_END to represent the highest from that DEPT/YEAR/COUNT group.

For example:

Dept	CM_start	CM_end	Year	Count
M82T	1	9	2005	789
M82T	10	14	2006	985
UTYD	5	9	2008	980
UTYD	10	20	2011	784
SRT6	5	20	2008	452

I feel like this is a multi-step process, that might incorporate by-group processing, with use of first and last but can't for the life of me figure it out.

Thanks for your time!

FredrikE · Posted 09-26-2018 04:07 AM

Does it need to be so complicated, why not just:?

proc sql;
 create table want as
  select dept, min(cm_start) as cm_start, max(cm_end) as cm_end, year, count
  from have
  group by dept, year, count
  ;
quit;

//Fredrik

View solution in original post

RW9 · Posted 09-26-2018 03:42 AM

Something like:

data have;
 length dept $ 4;
 input dept $ cm_start cm_end year count;
datalines;
M82T 1 3 2005 789
M82T 4 5 2005 789
M82T 7 9 2005 789
M82T 10 11 2006 985
M82T 12 14 2006 985
UTYD 5 7 2008 980
UTYD 8 9 2008 980
UTYD 10 15 2011 784
UTYD 16 17 2011 784
UTYD 18 20 2011 784
SRT6 5 7 2008 452
SRT6 8 9 2008 452
SRT6 10 15 2008 452
SRT6 16 17 2008 452
SRT6 18 20 2008 452
;
run;
data want;
 set have;
 by dept year count notsorted;
 retain f_start f_year f_count;
 if first.count then do;
   f_start=cm_start;
   f_year=year;
   f_count=count;
 end;
 if last.count then do;
   cm_start=f_start;
   year=f_year;
   count=f_count;
   output;
 end;
run;

FredrikE · Posted 09-26-2018 04:07 AM

Does it need to be so complicated, why not just:?

proc sql;
 create table want as
  select dept, min(cm_start) as cm_start, max(cm_end) as cm_end, year, count
  from have
  group by dept, year, count
  ;
quit;

//Fredrik

RW9 · Posted 09-26-2018 04:12 AM

It depends, if there is a lot of data sql is not the best procedure to process it - been shown in other posts. SQL isn't also known by everyone. Everyone however who uses SAS knows Base SAS datastep language, and to be perfectly honest, there is very little difference between the two codes there anyways, a by group, you use a min/max function, I use a retained variable, big whoop.

Count · Posted 09-26-2018 09:01 PM

Hi RW9, thank you for this. I'm not sure, why but when I applied this to my real data (sensitive so couldn't post real), it did not work. I will explore further and advise if I figure it out! It may be that there is some characteristic of the data that I did not capture appropriately with my small example.

andreas_lds · Posted 09-26-2018 04:16 AM

What would be the expected result of data like this

bob42 1 3 2009 123
bob42 5 8 2009 123
bob42 9 10 2009 123

or is it sure that there are no breaks between cm_end and the next cm_start?

Count · Posted 09-26-2018 09:05 PM

This is a valid point @andreas_lds and one I should have included in my post (apologies). The data provider assures me that there are no breaks and review of the data indicates this is true. Of course, that doesn't mean that an error could occur in pre-processing in the future. I believe (perhaps wrongly) that @FredrikE's code would account for such an instance and creating separate rows where a break occurs. This is sufficient for my purposes.

himalayan · Posted 09-26-2018 07:35 AM

Here's a proc summary solution

proc summary data=have nway;
   class Dept year count;
   output out=want(drop=_:) min(CM_start)=CM_start max(CM_end)=CM_end;
   run;quit;
proc print;run;

Count · Posted 09-26-2018 09:07 PM

Thanks @himalayan. This worked well, and I don't regularly use proc summary so it was nice to see an alternative method! It was however a bit slower to process than the SQL method, so have accepted that solution.

How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

Re: How do I collapse rows/observations and incorporate a minimum and maximum from different rows?

SAS Innovate 2026 Registration is Open

SAS Training: Just a Click Away