BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mmh
Obsidian | Level 7 mmh
Obsidian | Level 7

Dear Altruists.
I am a new SAS user.
I have the following dataset:

GVKey Report_Date Quarter
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4

I want to remove the duplicates and keep only the last entry based on the variable "Quarter" for a given firm.
I want two resulting datasets. The first is a clean dataset as follows:

GVKey Report_Date Quarter
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4

The second dataset will contain only the duplicate values as follows:

GVKey Report_Date Quarter
1001 30-09-21 Q3
1001 05-10-21 Q3
1002 30-06-21 Q2

I thank you in advance for your kind support!

1 ACCEPTED SOLUTION

Accepted Solutions
AndreaVianello
Obsidian | Level 7
data have;
input GVKey $ RD $ Quarter$;
infile datalines dlm = ' ';
Report_date=mdy( substr(rd,4,2) , substr(rd,1,2) , substr(rd,7,2) );
format Report_date ddmmyy10.;
datalines;
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4
; run;

data out_last out_dup;
set have;
by GVKey Quarter;
if last.quarter=1 then output out_last;
else output out_dup;
run;

View solution in original post

3 REPLIES 3
AndreaVianello
Obsidian | Level 7
data have;
input GVKey $ RD $ Quarter$;
infile datalines dlm = ' ';
Report_date=mdy( substr(rd,4,2) , substr(rd,1,2) , substr(rd,7,2) );
format Report_date ddmmyy10.;
datalines;
1001 31-03-21 Q1
1001 30-06-21 Q2
1001 30-09-21 Q3
1001 05-10-21 Q3
1001 10-10-21 Q3
1001 31-12-21 Q4
1002 31-03-21 Q1
1002 30-06-21 Q2
1002 01-07-21 Q2
1002 30-09-21 Q3
1002 31-12-21 Q4
; run;

data out_last out_dup;
set have;
by GVKey Quarter;
if last.quarter=1 then output out_last;
else output out_dup;
run;
Kurt_Bremser
Super User

Since the FIRST. and LAST. automatic variables contain only boolean values (1=true, 0=false), you can use last.quarter on its own without comparing it with 1.

if last.quarter
then output out_last;
else output out_dup;

 

mmh
Obsidian | Level 7 mmh
Obsidian | Level 7
Thanks a lot Andrea!
It was really kind of you 🙂

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 827 views
  • 3 likes
  • 3 in conversation