DATA Step, Macro, Functions and more

Filter data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

Filter data

Hi, I am new to SAS and would like some help in filtering this set of data from bankscope which has a problem with duplications. There are instances where a certain bank in a certain year has 3 rows of financial data, hence there is a need to remove/delete duplicates according to the following priority of filtering - C1/C2, C*, U1/U2, U*. From table 1 to table 2.

Table 1

YearISINNameConsolidation code
2000Uk123ABCC1
2000Uk123ABCC*
2000Uk123ABCU1
2001Uk123ABCC2
2001Uk123ABCU1
2001UK345DEFU1
2001UK345DEFC*
2002UK345DEFU*

Table 2

YearISINNameConsolidation code
2000Uk123ABCC1
2001Uk123ABCC2
2001UK345DEFC*

Accepted Solutions
Solution
‎05-26-2015 08:32 AM
Super User
Posts: 10,028

Re: Filter data

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

  else if cc='C*' then order=2;

   else if cc in ('U1' 'U2') then order=3;

    else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

View solution in original post


All Replies
Super User
Posts: 5,429

Re: Filter data

Create a look-up table which maps your codes to priority order.

Apply the order to your original data (format, join, etc)

Sort on year, bank and priority

Proc sort nodupkey, by year bank.

Data never sleeps
Occasional Contributor
Posts: 17

Re: Filter data

Linus, thanks for your help. But I am really new at this, do you mind sharing with me how do you code them?

Contributor
Posts: 27

Re: Filter data


data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
proc print;
run;


data order;
input cc $ rank;
cards;
C1 1
C2 1
C* 2
U1 3
U2 3
U* 4
;
proc print;
run;

proc sort data=have;by cc;run;
proc sort data=order;by cc;run;


data have2;
merge have (in=a) order;
by cc;
if a;
run;
proc print;
run;

proc sort data=have2; by year isin name rank;run;
proc print data=have2;run;

proc sort data=have2 out=have3;by isin year rank;run;
proc print data=have3;
run;

data want;
set have3;
by isin year;
if first.isin or first.year;
run;
proc print;
run;

Solution
‎05-26-2015 08:32 AM
Super User
Posts: 10,028

Re: Filter data

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

  else if cc='C*' then order=2;

   else if cc in ('U1' 'U2') then order=3;

    else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

Respected Advisor
Posts: 3,156

Re: Filter data

The trick is to get your data properly sorted,

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

PROC SQL;

CREATE TABLE HAVE2 AS

SELECT * FROM HAVE

ORDER BY YEAR, NAME, SUBSTR(CC,1,1), SUBSTR(CC,2,1) DESC

;

QUIT;

DATA WANT;

SET HAVE2;

BY YEAR NAME;

IF FIRST.NAME;

RUN;


🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 303 views
  • 6 likes
  • 5 in conversation