Re: Select max from multiple columns

Fluorite | Level 6

Select max from multiple columns

Posted 02-11-2013 01:58 PM (33113 views)

Hello Everyone,

I have a table that contains hundreds of columns with duplicate unique keys (customer ID). I would like to get rid of the duplicates by selecting the row that contains the max values in a number of columns. For each customer, I would like to keep the row that has the max value in date1, date2, date3, date4, date5,.... etc.

The tricky part is there are hundreds of dates I need to check. Is there an efficient way to do this?

Here is a sample record:
cust_id date1 date2 date3 date4 date5 date6 date7 date8 ... date300
11111 2/2 3/3 4/4 5/5 6/6 7/7 8/8 9/9 .... 12/12
11111 2/1 3/1 4/1 5/1 6/1 7/1 8/1 9/1 .... 12/01

I would like to keep the first row since it has the max value in every column.

Any input will be greatly appreciated !
Thank you

8 REPLIES 8

Lapis Lazuli | Level 10

Re: Select max from multiple columns

Posted 02-11-2013 02:17 PM (32891 views) | In reply to KevinC_

If the fact that a row has a maximum value for one column implies that it has the maximum value for all columns..... then

data tmp;

set have;

maxval=max(of date1-date300);

run;

proc sort in=tmp out=tmp2;

by cust_id descending maxval;

run;

proc sort in=tmp2 out=want nodupkey;

by cust_id;

run;

Fluorite | Level 6

Re: Select max from multiple columns

Posted 02-11-2013 02:26 PM (32891 views) | In reply to DBailey

Thank you DBailey.

Unfortunately that is not the case. You brought up a good point. Now I think about it, this is more like the real data:

cust_id date1 date2 date3 date4 date5 date6 date7 date8 ... date300

11111 2/2 3/3 4/1 5/5 6/6 7/1 8/8 9/9 .... 12/12

11111 2/1 3/1 4/9 5/1 6/1 7/9 8/1 9/1 .... 12/01

Note 4/9 and 7/9. What i really need to do is to select the max value from each column. So the ideal output would be:

11111 2/2 3/3 4/9 5/5 6/6 7/9 8/8 9/9 .... 12/12

Is there a way to do this?

Diamond | Level 26

Re: Select max from multiple columns

Posted 02-11-2013 02:30 PM (32891 views) | In reply to KevinC_

Proc Means or Proc Summary with a BY statement should get you the max value for each cust_id

--
Paige Miller

Fluorite | Level 6

Re: Select max from multiple columns

Posted 02-11-2013 02:55 PM (32891 views) | In reply to PaigeMiller

Thank you PaigeMiller.

I need the output to be a dataset with all of the columns, not just some statistics.
Can proc means or proc summary select the max value from each column and output the entire dataset in its original layout?

Diamond | Level 26

Re: Select max from multiple columns

Posted 02-11-2013 02:59 PM (32891 views) | In reply to KevinC_

No, you'd have to combine the output data set from PROC MEANS or PROC SUMMARY with the original data. For example:

data combined;

merge originaldata procmeansoutput;

by cust_id;

run;

--
Paige Miller

Calcite | Level 5

Re: Select max from multiple columns

Posted 02-12-2013 09:33 AM (32891 views) | In reply to KevinC_

See this example:

data t1;

input id d1 d2 d3 d4;

cards;

111 2 4 6 7

222 4 5 6 7

111 3 2 5 8

222 5 6 3 2

;

run;

proc sql;

create table t2 as select

id,

max(d1) as d1,

max(d2) as d2,

max(d3) as d3,

max(d4) as d4

from t1

group by id

order by id

;

quit;

This will give the output:

111	3	4	6	8
222	5	6	6	7

It will involve writing all your variable names in the proc sql step. There are ways to get those in SAS. However, you can also do that in Excel or Word.

Obsidian | Level 7

Re: Select max from multiple columns

Posted 02-14-2013 12:03 AM (32891 views) | In reply to KevinC_

Hi,

This logicworks for your problem, do that for other ids too:-

Here i have worked on 8 observations and you can change this number to 300 or whatever you require:-

=================================================================================================

data new;

informat cust_id 5. date1 mmddyy10. date2 mmddyy10. date3 mmddyy10. date4 mmddyy10. date5 mmddyy10. date6 mmddyy10.

date7 mmddyy10. date8 mmddyy10.;

input cust_id date1 date2 date3 date4 date5 date6 date7 date8 ;

format date1-date8 DATE9.;

datalines;

11111 02/02/2012 03/03/2012 04/04/2012 5/5/2012 6/6/2011 7/7/2012 8/8/2010 9/9/2012

11111 02/01/2009 03/01/2000 04/01/2013 5/1/2012 6/1/2012 7/1/2013 8/1/2012 9/1/2010

;

run;

proc print ;

run;

%macro dset();

proc sql;

drop table final;

quit;

%do i=1 %to 8;

proc sql;

create table final&i. as select distinct cust_id, max(date&i.) as dt&i. format date9. from new group by cust_id;

quit;

proc sql;

select dt&i. into: date&i. from final&i.;

quit;

%end;

data final;

set final1-final%eval(&i.-1);

run;

%do i=1 %to 8;

data final;

set final;

cst_id=cust_id;

date&i.="&&date&i.";

run;

%end;

proc sort data=final nodupkey;

by cst_id;

run;

proc print data=final;

run;

%mend;

%dset;

===============================================================================================================

Onyx | Level 15

Re: Select max from multiple columns

Posted 02-14-2013 09:34 AM (32891 views) | In reply to KevinC_

Here is another approach, using Array():

if your data is already sorted or clustered by cust_id,

data have;

input cust_id (date1 date2 date3 date4 date5 date6 date7 date8) (:mmddyy10.);

format date1-date8 mmddyy10.;

datalines;

11111 02/02/2012 03/03/2012 04/04/2012 5/5/2012 6/6/2011 7/7/2012 8/8/2010 9/9/2012

11111 02/01/2009 03/01/2000 04/01/2013 5/1/2012 6/1/2012 7/1/2013 8/1/2012 9/1/2010

;

data want;

do until (last.cust_id);

set have;

by cust_id;

array h date1-date8;

array temp(8) _temporary_;

if first.cust_id then call missing(of temp(*));

do i=1 to dim(h);

temp(i)=max(temp(i),h(i));

end;

do i=1 to dim(h);

h(i)=temp(i);

end;

drop i;

run;

Haikuo

Discussion stats

8 replies
‎02-11-2013 01:58 PM
33114 views
0 likes
6 in conversation

Registration is open

SAS Training: Just a Click Away