BookmarkSubscribeRSS Feed
megalops
Calcite | Level 5

I am struggling to explain my problem.  Hopefully my details below will be clear enough to understand.

I have a large dataset with many rows.  I have redundancy in one of the columns and I want to reduce them.  Below is an example of my dataset with redundant rows. 

Name          Number

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

I don't want to lose the Number for each name.  I just want to reduce the repeat columns with the Number. 

I want to filer the date so the redundant rows are removed and I end up with the following:

Name          Number

John               55

Ken                21

Steve             46

Any help would be GREATLY appreciated. 

4 REPLIES 4
dcruik
Lapis Lazuli | Level 10

You can use the SORT procedure or the SQL procedure to do this.  Try the following:

proc sort data=have out=want nodupkey;

by Name;

run;

proc sql;

create table want as

select distinct Name, Number

from have

order by Name;

quit;

megalops
Calcite | Level 5

It worked!!!  Thanks!!!

dcruik
Lapis Lazuli | Level 10

Mark is correct, you would want to use the Nodup here, as I re-read the the post.  I originally assumed (looking at the example data set) that the Number was the same for each individual Name; however, it sounds like that may not be the case.  Instead of eliminating duplicate Names with the Nodupkey, the Nodup will eliminate duplicate rows that have the same values in both the Name and Number field.  The same with select distinct *, instead of select distinct Name in the SQL procedure.  In case there are multiple Numbers for individual Names, otherwise my code will give you the first Number of each Name, and not all possible combinations of Names and Numbers.  Hope that makes sense!

Steelers_In_DC
Barite | Level 11

There are several ways to do this.  I have typed up almost the same exact solution as dcruik, you wouldn't want to us nodupkey though.  For this example you want nodup.

data have;

infile cards;

input Name$ Number;

cards;

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

;

run;

proc sql;

create table want1 as

select distinct *

from have;

proc sort data=have out=want2 nodup;by name number;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 718 views
  • 0 likes
  • 3 in conversation