## Filter data

Solved
Occasional Contributor
Posts: 17

# Filter data

Hi, I am new to SAS and would like some help in filtering this set of data from bankscope which has a problem with duplications. There are instances where a certain bank in a certain year has 3 rows of financial data, hence there is a need to remove/delete duplicates according to the following priority of filtering - C1/C2, C*, U1/U2, U*. From table 1 to table 2.

Table 1

 Year ISIN Name Consolidation code 2000 Uk123 ABC C1 2000 Uk123 ABC C* 2000 Uk123 ABC U1 2001 Uk123 ABC C2 2001 Uk123 ABC U1 2001 UK345 DEF U1 2001 UK345 DEF C* 2002 UK345 DEF U*

Table 2

 Year ISIN Name Consolidation code 2000 Uk123 ABC C1 2001 Uk123 ABC C2 2001 UK345 DEF C*

Accepted Solutions
Solution
‎05-26-2015 08:32 AM
Super User
Posts: 10,766

## Re: Filter data

data have;

input

Year ISIN \$ Name \$ cc \$;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

else if cc='C*' then order=2;

else if cc in ('U1' 'U2') then order=3;

else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

All Replies
Super User
Posts: 5,876

## Re: Filter data

Create a look-up table which maps your codes to priority order.

Apply the order to your original data (format, join, etc)

Sort on year, bank and priority

Proc sort nodupkey, by year bank.

Data never sleeps
Occasional Contributor
Posts: 17

## Re: Filter data

Linus, thanks for your help. But I am really new at this, do you mind sharing with me how do you code them?

Contributor
Posts: 27

## Re: Filter data

data have;
input
Year ISIN \$ Name \$ cc \$;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
proc print;
run;

data order;
input cc \$ rank;
cards;
C1 1
C2 1
C* 2
U1 3
U2 3
U* 4
;
proc print;
run;

proc sort data=have;by cc;run;
proc sort data=order;by cc;run;

data have2;
merge have (in=a) order;
by cc;
if a;
run;
proc print;
run;

proc sort data=have2; by year isin name rank;run;
proc print data=have2;run;

proc sort data=have2 out=have3;by isin year rank;run;
proc print data=have3;
run;

data want;
set have3;
by isin year;
if first.isin or first.year;
run;
proc print;
run;

Solution
‎05-26-2015 08:32 AM
Super User
Posts: 10,766

## Re: Filter data

data have;

input

Year ISIN \$ Name \$ cc \$;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

else if cc='C*' then order=2;

else if cc in ('U1' 'U2') then order=3;

else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

Posts: 3,167

## Re: Filter data

The trick is to get your data properly sorted,

data have;

input

Year ISIN \$ Name \$ cc \$;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

PROC SQL;

CREATE TABLE HAVE2 AS

SELECT * FROM HAVE

ORDER BY YEAR, NAME, SUBSTR(CC,1,1), SUBSTR(CC,2,1) DESC

;

QUIT;

DATA WANT;

SET HAVE2;

BY YEAR NAME;

IF FIRST.NAME;

RUN;

🔒 This topic is solved and locked.