Help using Base SAS procedures

Select max from multiple columns

Reply
Regular Contributor
Posts: 173

Select max from multiple columns

Hello Everyone,

I have a table that contains hundreds of columns with duplicate unique keys (customer ID). I would like to get rid of the duplicates by selecting the row that contains the max values in a number of columns. For each customer, I would like to keep the row that has the max value in date1, date2, date3, date4, date5,.... etc.

The tricky part is there are hundreds of dates I need to check. Is there an efficient way to do this?


Here is a sample record:
cust_id date1 date2 date3 date4 date5 date6 date7 date8 ... date300
11111 2/2 3/3 4/4 5/5 6/6 7/7 8/8 9/9 .... 12/12
11111 2/1 3/1 4/1 5/1 6/1 7/1 8/1 9/1 .... 12/01

I would like to keep the first row since it has the max value in every column.


Any input will be greatly appreciated !
Thank you  Smiley Happy
Super Contributor
Posts: 578

Re: Select max from multiple columns

If the fact that a row has a maximum value for one column implies that it has the maximum value for all columns..... then

data tmp;

set have;

maxval=max(of date1-date300);

run;

proc sort in=tmp out=tmp2;

by cust_id descending maxval;

run;

proc sort in=tmp2 out=want nodupkey;

by cust_id;

run;

Regular Contributor
Posts: 173

Re: Select max from multiple columns

Thank you DBailey.

Unfortunately that is not the case.  You brought up a good point.  Now I think about it, this is more like the real data:

cust_id date1 date2 date3 date4 date5 date6 date7 date8 ... date300

11111 2/2 3/3 4/1 5/5 6/6 7/1 8/8 9/9 .... 12/12

11111 2/1 3/1 4/9 5/1 6/1 7/9 8/1 9/1 .... 12/01

Note  4/9 and 7/9.  What i really need to do is to select the max value from each column.  So the ideal output would be:

11111 2/2 3/3 4/9 5/5 6/6 7/9 8/8 9/9 .... 12/12

Is there a way to do this?

Trusted Advisor
Posts: 1,610

Re: Select max from multiple columns

Proc Means or Proc Summary with a BY statement should get you the max value for each cust_id

Regular Contributor
Posts: 173

Re: Select max from multiple columns

Thank you PaigeMiller.

I need the output to be a dataset with all of the columns, not just some statistics. 
Can proc means or proc summary select the max value from each column and output the entire dataset in its original layout?

Trusted Advisor
Posts: 1,610

Re: Select max from multiple columns

No, you'd have to combine the output data set from PROC MEANS or PROC SUMMARY with the original data. For example:

data combined;

     merge originaldata procmeansoutput;

     by cust_id;

run;

Frequent Contributor
Frequent Contributor
Posts: 94

Re: Select max from multiple columns

See this example:

data t1;

input id d1 d2 d3 d4;

cards;

111 2 4 6 7

222 4 5 6 7

111 3 2 5 8

222 5 6 3 2

;

run;

proc sql;

      create table t2 as select

      id,

      max(d1) as d1,

      max(d2) as d2,

      max(d3) as d3,

      max(d4) as d4

      from t1

      group by id

      order by id

;

quit;

This will give the output:

111

3

4

6

8

222

5

6

6

7

It will involve writing all your variable names in the proc sql step. There are ways to get those in SAS. However, you can also do that in Excel or Word.

Frequent Contributor
Posts: 81

Re: Select max from multiple columns

Hi,

This logicworks for your problem, do that for other ids too:-

Here i have worked on 8 observations and you can change this number to 300 or whatever you require:-

=================================================================================================

data new;

informat cust_id 5. date1 mmddyy10. date2 mmddyy10. date3 mmddyy10. date4 mmddyy10.  date5 mmddyy10.  date6 mmddyy10.

date7 mmddyy10.  date8 mmddyy10.;

input cust_id date1 date2 date3 date4  date5  date6  date7  date8 ;

format date1-date8 DATE9.;

datalines;

11111 02/02/2012 03/03/2012 04/04/2012 5/5/2012 6/6/2011 7/7/2012 8/8/2010 9/9/2012

11111 02/01/2009 03/01/2000 04/01/2013 5/1/2012 6/1/2012 7/1/2013 8/1/2012 9/1/2010

;

run;

proc print ;

run;

%macro dset();

proc sql;

drop table final;

quit;

%do i=1 %to 8;

proc sql;

create table final&i. as select distinct cust_id, max(date&i.) as dt&i. format date9. from new group by cust_id;

quit;

proc sql;

select dt&i. into: date&i. from final&i.;

quit;

%end;

data final;

set final1-final%eval(&i.-1);

run;

%do i=1 %to 8;

data final;

set final;

cst_id=cust_id;

date&i.="&&date&i.";

run;

%end;

proc sort data=final nodupkey;

by cst_id;

run;

proc print data=final;

run;

%mend;

%dset;

===============================================================================================================

Respected Advisor
Posts: 3,124

Re: Select max from multiple columns

Here is another approach, using Array():

if your data is already sorted or clustered by cust_id,

data have;

input cust_id (date1 date2 date3 date4 date5 date6 date7 date8) (:mmddyy10.);

format date1-date8 mmddyy10.;

datalines;

11111 02/02/2012 03/03/2012 04/04/2012 5/5/2012 6/6/2011 7/7/2012 8/8/2010 9/9/2012

11111 02/01/2009 03/01/2000 04/01/2013 5/1/2012 6/1/2012 7/1/2013 8/1/2012 9/1/2010

;

data want;

do until (last.cust_id);

  set have;

  by cust_id;

  array h date1-date8;

array temp(8) _temporary_;

if first.cust_id then call missing(of temp(*));

  do i=1 to dim(h);

temp(i)=max(temp(i),h(i));

end;

end;

do i=1 to dim(h);

  h(i)=temp(i);

end;

drop i;

run;

Haikuo

Ask a Question
Discussion stats
  • 8 replies
  • 2187 views
  • 0 likes
  • 6 in conversation