Solved: Selecting distinct Combinations

ilikesas · Posted 12-20-2014 11:17 PM

Hi,

Suppose I have the following file:

Name	Date	someinfo
A	2010	a
A	2010	a
A	2009	b
A	2009	b

What I would like to get is:

Name	Date	someinfo
A	2010	a
A	2009	b

That is, I would like to get all the columns from my table, but with each name/date combination appearing only once.

Thank you

Patrick · Posted 12-20-2014 11:34 PM

Proc sql; select distinct ....

ilikesas · Posted 12-20-2014 11:49 PM

Hi Patrick and thanks for the reply.

I tried select distinct, but it only gave me the two selected columns. What I would like, if possible, is all the columns (as in the small example in my question); in other words, I want to delete all the duplicate name/date combinations.

I also tried to do

data board_summary6;

set board_summary6;

by comp_name date;

if first.comp_name and first.date;

run;

but got an error message ...

Thank you

ilikesas · Posted 12-20-2014 11:57 PM

I could actually select distinct on all the columns of interest, but since there are many of them I wondered whether it's possible to take a shortcut by selecting the entire row for each name/date combination...

Patrick · Posted 12-21-2014 12:01 AM

Hope that helps:

data have;
 infile datalines truncover;
 input name $ date someinfo $;
 datalines;
A 2010 a
A 2009 b
;
run;

/* Option 1: SELECT DISTINCT over every column */
proc sql;
 create table want1 as
 select distinct *
 from have;
quit;

/* Option 2: NODUPKEY with all variables as the sort key */
proc sort data=have out=want2 nodupkey;
 by _all_;
run;

/* Option 3: sort, then keep the first row of each BY group */
proc sort data=have out=inter3;
 by name date someinfo;
run;

data want3;
 set inter3;
 by name date someinfo;
 if first.someinfo;
run;

And a fourth option:

/* Option 4: load into a hash keyed on all variables, then write it out */
data _null_;
 if 0 then set have;
 dcl hash h1(dataset:'have');
 _rc=h1.defineKey(all:'y');
 _rc=h1.defineData(all:'y');
 _rc=h1.defineDone();
 _rc=h1.output(dataset:'Want4');
 stop;
run;

Ksharp · Posted 12-21-2014 12:58 AM

And the fifth option:

proc summary data=have nway;
 class _all_;
 output out=want(drop=_:);
run;

And the sixth option:

proc freq data=have noprint;
 table name*date*someinfo/list out=want1(drop=count percent) nofreq norow nocol nopercent nocum;
run;

Xia Keshan

ilikesas · Posted 12-21-2014 08:20 AM

Thanks a lot to all, now I have a lot to choose from!!!

fengyuwuzu · Posted 02-17-2016 09:48 AM

I think you need to sort with the by variables first.
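
A minimal sketch of that fix, assuming the data set and variable names from the earlier attempt (board_summary6, comp_name, date). The output names sorted and deduped are made up for illustration. The DATA step BY statement expects its input to be sorted by the BY variables, and first.date on its own is enough, because it is set whenever comp_name or date changes. The condition "first.comp_name and first.date" would keep only one row per comp_name.

```sas
/* Sort by the BY variables first; BY-group processing expects sorted input. */
proc sort data=board_summary6 out=sorted;   /* "sorted" is a hypothetical name */
 by comp_name date;
run;

data deduped;                               /* "deduped" is a hypothetical name */
 set sorted;
 by comp_name date;
 /* first.date is 1 on the first row of each comp_name/date combination */
 if first.date;
run;
```

Writing to new data sets rather than overwriting board_summary6 keeps the original around in case the result is not what you expected.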

Tom · Posted 12-21-2014 12:04 AM

Not sure what you mean.

Sounds like you want to keep only one record for each distinct by group, but that you have extra non-key variables.

You can just use PROC SORT with the NODUPKEY option.

proc sort data=have out=want NODUPKEY ;

by name date;

run;
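
To see what happens to the extra non-key variables, here is a small sketch with a made-up data set (have2, want_keys and dropped are hypothetical names) in which someinfo differs within one name/date pair. It adds the DUPOUT= option so the discarded rows can be inspected. With the default EQUALS behavior, the row that comes first in the input for each name/date pair should be the one that is kept.

```sas
/* Hypothetical sample: someinfo differs within the A/2010 pair */
data have2;
 infile datalines truncover;
 input name $ date someinfo $;
 datalines;
A 2010 a
A 2010 x
A 2009 b
;
run;

/* One row per name/date; rows removed by NODUPKEY go to the DUPOUT= data set */
proc sort data=have2 out=want_keys dupout=dropped nodupkey;
 by name date;
run;
```

If it matters which of the duplicate rows survives, it is safer to sort by an additional variable first and then use a BY-group DATA step, since NODUPKEY only looks at the key variables.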
