Hi,
suppose I have the following file:
Name | Date | someinfo |
---|---|---|
A | 2010 | a |
A | 2010 | a |
A | 2009 | b |
A | 2009 | b |
What I would like to get is:
Name | Date | someinfo |
---|---|---|
A | 2010 | a |
A | 2009 | b |
That is, I would like to get all the columns from my table but with the name/date combination being distinct
Thank you
Hope that helps:
data have;
infile datalines truncover;
input name $ date someinfo $;
datalines;
A 2010 a
A 2010 a
A 2009 b
A 2009 b
;
run;
proc sql;
create table want1 as
select distinct *
from have;
quit;
proc sort data=have out=want2 nodupkey;
by _all_;
run;
proc sort data=have out=inter3;
by name date someinfo;
run;
data want3;
set inter3;
by name date someinfo;
if first.someinfo;
run;
And a fourth option:
data _null_;
if 0 then set have;
dcl hash h1(dataset:'have');
_rc=h1.defineKey(all:'y');
_rc=h1.defineData(all:'y');
_rc=h1.defineDone();
_rc=h1.output(dataset:'Want4');
stop;
run;
Proc sql; select distinct ....
Hi Patrick and thanks for the reply.
I did the select distinct but it only gave me the 2 selected columns, but what I would like to get if possible is all the columns (like in the small example that I put in the question), in other words to delete all the name/date combination duplicates
I also tried to do
data board_summary6;
set board_summary6;
by comp_name date;
if first.comp_name and first.date;
run;
but got an error message ...
Thank you
I actually could distinct select all the columns of interest but the thing is that since there are many of them I thought if its possible to make a shortcut by selecting the entire row corresponding to the name/date combination...
Hope that helps:
data have;
infile datalines truncover;
input name $ date someinfo $;
datalines;
A 2010 a
A 2010 a
A 2009 b
A 2009 b
;
run;
proc sql;
create table want1 as
select distinct *
from have;
quit;
proc sort data=have out=want2 nodupkey;
by _all_;
run;
proc sort data=have out=inter3;
by name date someinfo;
run;
data want3;
set inter3;
by name date someinfo;
if first.someinfo;
run;
And a fourth option:
data _null_;
if 0 then set have;
dcl hash h1(dataset:'have');
_rc=h1.defineKey(all:'y');
_rc=h1.defineData(all:'y');
_rc=h1.defineDone();
_rc=h1.output(dataset:'Want4');
stop;
run;
And the fifth option:
proc summary data=have nway; class _all_; output out=want(drop=_:); run;
And the sixth optioin:
proc freq data=have noprint; table name*date*someinfo/list out=want1(drop=count percent) nofreq norow nocol nopercent nocum; run;
Xia Keshan
Thanks a lot to all, now I have a lot to choose from!!!
I think you need to sort with the by variables first.
Not sure what you mean.
Sounds like you want to keep only one record for each distinct by group, but that you have extra non-key variables.
You can just use PROC SORT with the NODUPKEY option.
proc sort data=have out=want NODUPKEY ;
by name date;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.