BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Count
Obsidian | Level 7

Hello, 

 

I'm using Base SAS v9.3.

 

I have data set HAVE as follows:

data HAVE;
input Dept $ CM_start CM_end Year Count;
length dept $ 4;
datalines;
M82T 1 3 2005 789
M82T 4 5 2005 789
M82T 7 9 2005 789
M82T 10 11 2006 985
M82T 12 14 2006 985
UTYD 5 7 2008 980
UTYD 8 9 2008 980
UTYD 10 15 2011 784
UTYD 16 17 2011 784
UTYD 18 20 2011 784
SRT6 5 7 2008 452
SRT6 8 9 2008 452
SRT6 10 15 2008 452
SRT6 16 17 2008 452
SRT6 18 20 2008 452
;
run;

I want data set WANT that collapses the rows into one record for all rows that have the same DEPT, YEAR and COUNT.

BUT, I also want CM_START to represent the lowest CM value and CM_END to represent the highest from that DEPT/YEAR/COUNT group. 

 

For example:

 

DeptCM_startCM_endYearCount
M82T192005789
M82T10142006985
UTYD592008980
UTYD10202011784
SRT65202008452

 

I feel like this is a multi-step process, that might incorporate by-group processing, with use of first and last but can't for the life of me figure it out.

 

Thanks for your time!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FredrikE
Rhodochrosite | Level 12

Does it need to be so complicated, why not just:?

 

proc sql;
 create table want as
  select dept, min(cm_start) as cm_start, max(cm_end) as cm_end, year, count
  from have
  group by dept, year, count
  ;
quit;

//Fredrik

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like:

data have;
length dept $ 4;
input dept $ cm_start cm_end year count;
datalines;
M82T 1 3 2005 789
M82T 4 5 2005 789
M82T 7 9 2005 789
M82T 10 11 2006 985
M82T 12 14 2006 985
UTYD 5 7 2008 980
UTYD 8 9 2008 980
UTYD 10 15 2011 784
UTYD 16 17 2011 784
UTYD 18 20 2011 784
SRT6 5 7 2008 452
SRT6 8 9 2008 452
SRT6 10 15 2008 452
SRT6 16 17 2008 452
SRT6 18 20 2008 452
;
run;
data want;
set have;
by dept year count notsorted;
retain f_start f_year f_count;
if first.count then do;
f_start=cm_start;
f_year=year;
f_count=count;
end;
if last.count then do;
cm_start=f_start;
year=f_year;
count=f_count;
output;
end;
run;
FredrikE
Rhodochrosite | Level 12

Does it need to be so complicated, why not just:?

 

proc sql;
 create table want as
  select dept, min(cm_start) as cm_start, max(cm_end) as cm_end, year, count
  from have
  group by dept, year, count
  ;
quit;

//Fredrik

RW9
Diamond | Level 26 RW9
Diamond | Level 26

It depends, if there is a lot of data sql is not the best procedure to process it - been shown in other posts.  SQL isn't also known by everyone.  Everyone however who uses SAS knows Base SAS datastep language, and to be perfectly honest, there is very little difference between the two codes there anyways, a by group, you use a min/max function, I use a retained variable, big whoop.

Count
Obsidian | Level 7
Hi RW9, thank you for this. I'm not sure, why but when I applied this to my real data (sensitive so couldn't post real), it did not work. I will explore further and advise if I figure it out! It may be that there is some characteristic of the data that I did not capture appropriately with my small example.
andreas_lds
Jade | Level 19

What would be the expected result of data like this

bob42 1 3 2009 123
bob42 5 8 2009 123
bob42 9 10 2009 123

or is it sure that there are no breaks between cm_end and the next cm_start?

 

 

Count
Obsidian | Level 7

This is a valid point @andreas_lds and one I should have included in my post (apologies). The data provider assures me that there are no breaks and review of the data indicates this is true. Of course, that doesn't mean that an error could occur in pre-processing in the future. I believe (perhaps wrongly) that @FredrikE's code would account for such an instance and creating separate rows where a break occurs. This is sufficient for my purposes. 

himalayan
Fluorite | Level 6

Here's a proc summary solution

 

proc summary data=have nway;
   class Dept year count;
   output out=want(drop=_:) min(CM_start)=CM_start max(CM_end)=CM_end;
   run;quit;
proc print;run;
Count
Obsidian | Level 7

Thanks @himalayan. This worked well, and I don't regularly use proc summary so it was nice to see an alternative method! It was however a bit slower to process than the SQL method, so have accepted that solution.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5220 views
  • 4 likes
  • 5 in conversation