Hi, I am new to SAS and would like some help in filtering this set of data from bankscope which has a problem with duplications. There are instances where a certain bank in a certain year has 3 rows of financial data, hence there is a need to remove/delete duplicates according to the following priority of filtering - C1/C2, C*, U1/U2, U*. From table 1 to table 2.
Table 1
Year | ISIN | Name | Consolidation code |
2000 | Uk123 | ABC | C1 |
2000 | Uk123 | ABC | C* |
2000 | Uk123 | ABC | U1 |
2001 | Uk123 | ABC | C2 |
2001 | Uk123 | ABC | U1 |
2001 | UK345 | DEF | U1 |
2001 | UK345 | DEF | C* |
2002 | UK345 | DEF | U* |
Table 2
Year | ISIN | Name | Consolidation code |
2000 | Uk123 | ABC | C1 |
2001 | Uk123 | ABC | C2 |
2001 | UK345 | DEF | C* |
data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
run;
data temp;
set have;
if cc in ('C1' 'C2') then order=1;
else if cc='C*' then order=2;
else if cc in ('U1' 'U2') then order=3;
else if cc='U*' then order=4;
run;
proc sort data=temp;by name year order;run;
data want;
set temp;
by name year;
if first.year;
run;
Xia Keshan
Create a look-up table which maps your codes to priority order.
Apply the order to your original data (format, join, etc)
Sort on year, bank and priority
Proc sort nodupkey, by year bank.
Linus, thanks for your help. But I am really new at this, do you mind sharing with me how do you code them?
data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
proc print;
run;
data order;
input cc $ rank;
cards;
C1 1
C2 1
C* 2
U1 3
U2 3
U* 4
;
proc print;
run;
proc sort data=have;by cc;run;
proc sort data=order;by cc;run;
data have2;
merge have (in=a) order;
by cc;
if a;
run;
proc print;
run;
proc sort data=have2; by year isin name rank;run;
proc print data=have2;run;
proc sort data=have2 out=have3;by isin year rank;run;
proc print data=have3;
run;
data want;
set have3;
by isin year;
if first.isin or first.year;
run;
proc print;
run;
data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
run;
data temp;
set have;
if cc in ('C1' 'C2') then order=1;
else if cc='C*' then order=2;
else if cc in ('U1' 'U2') then order=3;
else if cc='U*' then order=4;
run;
proc sort data=temp;by name year order;run;
data want;
set temp;
by name year;
if first.year;
run;
Xia Keshan
The trick is to get your data properly sorted,
data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
run;
PROC SQL;
CREATE TABLE HAVE2 AS
SELECT * FROM HAVE
ORDER BY YEAR, NAME, SUBSTR(CC,1,1), SUBSTR(CC,2,1) DESC
;
QUIT;
DATA WANT;
SET HAVE2;
BY YEAR NAME;
IF FIRST.NAME;
RUN;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.