BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mmh
Obsidian | Level 7 mmh
Obsidian | Level 7

Dear Altruists.
I am a new SAS user.
I have the following dataset:

GVKey Report_Date Quarter
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4

I want to remove the duplicates and keep only the last entry based on the variable "Quarter" for a given firm.
I want two resulting datasets. The first is a clean dataset as follows:

GVKey Report_Date Quarter
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4

The second dataset will contain only the duplicate values as follows:

GVKey Report_Date Quarter
1001 30-09-21 Q3
1001 05-10-21 Q3
1002 30-06-21 Q2

I thank you in advance for your kind support!

1 ACCEPTED SOLUTION

Accepted Solutions
AndreaVianello
Obsidian | Level 7
data have;
input GVKey $ RD $ Quarter$;
infile datalines dlm = ' ';
Report_date=mdy( substr(rd,4,2) , substr(rd,1,2) , substr(rd,7,2) );
format Report_date ddmmyy10.;
datalines;
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4
; run;

data out_last out_dup;
set have;
by GVKey Quarter;
if last.quarter=1 then output out_last;
else output out_dup;
run;

View solution in original post

3 REPLIES 3
AndreaVianello
Obsidian | Level 7
data have;
input GVKey $ RD $ Quarter$;
infile datalines dlm = ' ';
Report_date=mdy( substr(rd,4,2) , substr(rd,1,2) , substr(rd,7,2) );
format Report_date ddmmyy10.;
datalines;
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4
; run;

data out_last out_dup;
set have;
by GVKey Quarter;
if last.quarter=1 then output out_last;
else output out_dup;
run;
Kurt_Bremser
Super User

Since the FIRST. and LAST. automatic variables contain only boolean values (1=true, 0=false), you can use last.quarter on its own without comparing it with 1.

if last.quarter
then output out_last;
else output out_dup;

 

mmh
Obsidian | Level 7 mmh
Obsidian | Level 7
Thanks a lot Andrea!
It was really kind of you 🙂