BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
danielhu
Calcite | Level 5

Hi, I am new to SAS and would like some help in filtering this set of data from bankscope which has a problem with duplications. There are instances where a certain bank in a certain year has 3 rows of financial data, hence there is a need to remove/delete duplicates according to the following priority of filtering - C1/C2, C*, U1/U2, U*. From table 1 to table 2.

Table 1

YearISINNameConsolidation code
2000Uk123ABCC1
2000Uk123ABCC*
2000Uk123ABCU1
2001Uk123ABCC2
2001Uk123ABCU1
2001UK345DEFU1
2001UK345DEFC*
2002UK345DEFU*

Table 2

YearISINNameConsolidation code
2000Uk123ABCC1
2001Uk123ABCC2
2001UK345DEFC*
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

  else if cc='C*' then order=2;

   else if cc in ('U1' 'U2') then order=3;

    else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

View solution in original post

5 REPLIES 5
LinusH
Tourmaline | Level 20

Create a look-up table which maps your codes to priority order.

Apply the order to your original data (format, join, etc)

Sort on year, bank and priority

Proc sort nodupkey, by year bank.

Data never sleeps
danielhu
Calcite | Level 5

Linus, thanks for your help. But I am really new at this, do you mind sharing with me how do you code them?

bharathtuppad
Obsidian | Level 7


data have;
input
Year ISIN $ Name $ cc $;
cards;
2000 Uk123 ABC C1
2000 Uk123 ABC C*
2000 Uk123 ABC U1
2001 Uk123 ABC C2
2001 Uk123 ABC U1
2001 UK345 DEF U1
2001 UK345 DEF C*
2002 UK345 DEF U*
;
proc print;
run;


data order;
input cc $ rank;
cards;
C1 1
C2 1
C* 2
U1 3
U2 3
U* 4
;
proc print;
run;

proc sort data=have;by cc;run;
proc sort data=order;by cc;run;


data have2;
merge have (in=a) order;
by cc;
if a;
run;
proc print;
run;

proc sort data=have2; by year isin name rank;run;
proc print data=have2;run;

proc sort data=have2 out=have3;by isin year rank;run;
proc print data=have3;
run;

data want;
set have3;
by isin year;
if first.isin or first.year;
run;
proc print;
run;

Ksharp
Super User

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

data temp;

set have;

if cc in ('C1' 'C2') then order=1;

  else if cc='C*' then order=2;

   else if cc in ('U1' 'U2') then order=3;

    else if cc='U*' then order=4;

run;

proc sort data=temp;by name year order;run;

data want;

set temp;

by name year;

if first.year;

run;

Xia Keshan

Haikuo
Onyx | Level 15

The trick is to get your data properly sorted,

data have;

input

Year ISIN $ Name $ cc $;

cards;

2000 Uk123 ABC C1

2000 Uk123 ABC C*

2000 Uk123 ABC U1

2001 Uk123 ABC C2

2001 Uk123 ABC U1

2001 UK345 DEF U1

2001 UK345 DEF C*

2002 UK345 DEF U*

;

run;

PROC SQL;

CREATE TABLE HAVE2 AS

SELECT * FROM HAVE

ORDER BY YEAR, NAME, SUBSTR(CC,1,1), SUBSTR(CC,2,1) DESC

;

QUIT;

DATA WANT;

SET HAVE2;

BY YEAR NAME;

IF FIRST.NAME;

RUN;


SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1832 views
  • 6 likes
  • 5 in conversation