Solved: Re: Filter Data and delete duplicates

jozuleta · Posted 10-07-2018 05:45 PM

Hi all,

I created a table and filtered out the rows that have blanks in the columns gvkey, ibtic, and isin. Now I want to delete all duplicate rows that have identical values in the gvkey column. In other words, I would like to keep only the rows with unique values of gvkey.

My code so far:

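PROC SQL;
 CREATE TABLE WORK.COMP_DataUS AS
 SELECT gvkey , ibtic , isin , sedol
 FROM COMP.SECURITY;
QUIT; /* PROC SQL ends with QUIT; no RUN statement is needed */
data WORK.COMP_DataUS;
 set COMP_DataUS;
 /* each variable needs its own missing() test */
 where not missing(gvkey) and not missing(ibtic) and not missing(isin);
run; /* Output: 24,936 rows */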

This is what my dataset looks like:

Basically, I would like to have a table without the blue columns (for the whole table, not just for these two examples).

Thanks in advance for the support.

Best regards

Jorge

singhsahab · Posted 10-08-2018 01:26 PM

I'm hoping this will work for you!

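/* nodupkey keeps the first row for each gvkey; nodup would only drop rows identical on every variable */
proc sort data=COMP_DataUS nodupkey out=want;
by gvkey;
where gvkey is not missing;
run;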
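
A note on the two PROC SORT de-duplication options, since they are easy to mix up: NODUPKEY keeps one row per BY value, while NODUPRECS (alias NODUP) only drops a row when it matches the previous row on every variable. A minimal sketch on made-up data (the data set names demo, one_per_key and no_identical_rows and all values are hypothetical, not taken from COMP.SECURITY):

```sas
/* Hypothetical toy data: gvkey 001 appears three times, two of those rows are fully identical */
data demo;
    input gvkey $ ibtic $ isin $;
    datalines;
001 AAA X1
001 AAA X1
001 BBB X2
002 CCC X3
;
run;

/* NODUPKEY: one row per gvkey (the first one encountered), so 2 rows remain */
proc sort data=demo nodupkey out=one_per_key;
    by gvkey;
run;

/* NODUPRECS: only the fully identical adjacent row is dropped, so 3 rows remain */
proc sort data=demo noduprecs out=no_identical_rows;
    by gvkey;
run;
```

If the aim is exactly one row per gvkey, NODUPKEY is the option that matches that description.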

PaigeMiller · Posted 10-07-2018 06:09 PM

UNTESTED CODE

Assumes the data set COMP_DATAUS is sorted by GVKEY

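/* Count how many rows each gvkey has; _a_ holds gvkey, COUNT and PERCENT */
proc freq data=comp_dataus;
    table gvkey/noprint out=_a_;
run;

/* Attach the count to every row, then drop every gvkey that occurs more than once */
data want;
    merge comp_dataus _a_;
    by gvkey;
    if count>1 then delete;
run;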
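
If the goal is to keep only the gvkey values that occur exactly once (which is what the PROC FREQ and MERGE steps above do), the same logic can probably be written with BY-group processing in a single DATA step. An untested sketch; the data set names sorted and want_singletons are placeholders:

```sas
/* BY-group processing requires the input to be sorted by gvkey */
proc sort data=comp_dataus out=sorted;
    by gvkey;
run;

data want_singletons;
    set sorted;
    by gvkey;
    /* A row that is both first and last in its BY group is the only row for that gvkey */
    if first.gvkey and last.gvkey;
run;
```

Changing the condition to `if first.gvkey;` would instead keep one row per gvkey, including the keys that are duplicated.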

--
Paige Miller

jozuleta · Posted 10-08-2018 07:11 PM

Another good way! Thanks, too!
