BookmarkSubscribeRSS Feed
megalops
Calcite | Level 5

I am struggling to explain my problem.  Hopefully my details below will be clear enough to understand.

I have a large dataset with many rows.  I have redundancy in one of the columns and I want to reduce them.  Below is an example of my dataset with redundant rows. 

Name          Number

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

I don't want to lose the Number for each name.  I just want to reduce the repeat columns with the Number. 

I want to filer the date so the redundant rows are removed and I end up with the following:

Name          Number

John               55

Ken                21

Steve             46

Any help would be GREATLY appreciated. 

4 REPLIES 4
dcruik
Lapis Lazuli | Level 10

You can use the SORT procedure or the SQL procedure to do this.  Try the following:

proc sort data=have out=want nodupkey;

by Name;

run;

proc sql;

create table want as

select distinct Name, Number

from have

order by Name;

quit;

megalops
Calcite | Level 5

It worked!!!  Thanks!!!

dcruik
Lapis Lazuli | Level 10

Mark is correct, you would want to use the Nodup here, as I re-read the the post.  I originally assumed (looking at the example data set) that the Number was the same for each individual Name; however, it sounds like that may not be the case.  Instead of eliminating duplicate Names with the Nodupkey, the Nodup will eliminate duplicate rows that have the same values in both the Name and Number field.  The same with select distinct *, instead of select distinct Name in the SQL procedure.  In case there are multiple Numbers for individual Names, otherwise my code will give you the first Number of each Name, and not all possible combinations of Names and Numbers.  Hope that makes sense!

Steelers_In_DC
Barite | Level 11

There are several ways to do this.  I have typed up almost the same exact solution as dcruik, you wouldn't want to us nodupkey though.  For this example you want nodup.

data have;

infile cards;

input Name$ Number;

cards;

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

;

run;

proc sql;

create table want1 as

select distinct *

from have;

proc sort data=have out=want2 nodup;by name number;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 767 views
  • 0 likes
  • 3 in conversation