I am struggling to explain my problem. Hopefully my details below will be clear enough to understand.
I have a large dataset with many rows. I have redundancy in one of the columns and I want to reduce them. Below is an example of my dataset with redundant rows.
Name Number
John 55
John 55
John 55
Ken 21
Ken 21
Ken 21
Ken 21
Steve 46
Steve 46
I don't want to lose the Number for each name. I just want to reduce the repeat columns with the Number.
I want to filer the date so the redundant rows are removed and I end up with the following:
Name Number
John 55
Ken 21
Steve 46
Any help would be GREATLY appreciated.
You can use the SORT procedure or the SQL procedure to do this. Try the following:
proc sort data=have out=want nodupkey;
by Name;
run;
proc sql;
create table want as
select distinct Name, Number
from have
order by Name;
quit;
It worked!!! Thanks!!!
Mark is correct, you would want to use the Nodup here, as I re-read the the post. I originally assumed (looking at the example data set) that the Number was the same for each individual Name; however, it sounds like that may not be the case. Instead of eliminating duplicate Names with the Nodupkey, the Nodup will eliminate duplicate rows that have the same values in both the Name and Number field. The same with select distinct *, instead of select distinct Name in the SQL procedure. In case there are multiple Numbers for individual Names, otherwise my code will give you the first Number of each Name, and not all possible combinations of Names and Numbers. Hope that makes sense!
There are several ways to do this. I have typed up almost the same exact solution as dcruik, you wouldn't want to us nodupkey though. For this example you want nodup.
data have;
infile cards;
input Name$ Number;
cards;
John 55
John 55
John 55
Ken 21
Ken 21
Ken 21
Ken 21
Steve 46
Steve 46
;
run;
proc sql;
create table want1 as
select distinct *
from have;
proc sort data=have out=want2 nodup;by name number;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.