Help using Base SAS procedures

Filter data so only relevant rows remain

Reply
Occasional Contributor
Posts: 9

Filter data so only relevant rows remain

I am struggling to explain my problem.  Hopefully my details below will be clear enough to understand.

I have a large dataset with many rows.  I have redundancy in one of the columns and I want to reduce them.  Below is an example of my dataset with redundant rows. 

Name          Number

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

I don't want to lose the Number for each name.  I just want to reduce the repeat columns with the Number. 

I want to filer the date so the redundant rows are removed and I end up with the following:

Name          Number

John               55

Ken                21

Steve             46

Any help would be GREATLY appreciated. 

Frequent Contributor
Posts: 130

Re: Filter data so only relevant rows remain

You can use the SORT procedure or the SQL procedure to do this.  Try the following:

proc sort data=have out=want nodupkey;

by Name;

run;

proc sql;

create table want as

select distinct Name, Number

from have

order by Name;

quit;

Occasional Contributor
Posts: 9

Re: Filter data so only relevant rows remain

It worked!!!  Thanks!!!

Frequent Contributor
Posts: 130

Re: Filter data so only relevant rows remain

Mark is correct, you would want to use the Nodup here, as I re-read the the post.  I originally assumed (looking at the example data set) that the Number was the same for each individual Name; however, it sounds like that may not be the case.  Instead of eliminating duplicate Names with the Nodupkey, the Nodup will eliminate duplicate rows that have the same values in both the Name and Number field.  The same with select distinct *, instead of select distinct Name in the SQL procedure.  In case there are multiple Numbers for individual Names, otherwise my code will give you the first Number of each Name, and not all possible combinations of Names and Numbers.  Hope that makes sense!

Valued Guide
Posts: 860

Re: Filter data so only relevant rows remain

There are several ways to do this.  I have typed up almost the same exact solution as dcruik, you wouldn't want to us nodupkey though.  For this example you want nodup.

data have;

infile cards;

input Name$ Number;

cards;

John              55

John              55

John              55

Ken               21

Ken               21

Ken               21

Ken               21

Steve            46

Steve            46

;

run;

proc sql;

create table want1 as

select distinct *

from have;

proc sort data=have out=want2 nodup;by name number;

Ask a Question
Discussion stats
  • 4 replies
  • 291 views
  • 0 likes
  • 3 in conversation