BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
DiG
Fluorite | Level 6 DiG
Fluorite | Level 6

Hello,

I have difficulties in removing duplicates in my dataset. There are certain conditions that I don’t know how to code.

I have a table that lists companies in the form of company id, fiscal year end date (FYE), and date when the company reported some financial statements. Each company may report several statements on a given day for the same FYE, like in rows 2-3 or 5-7.

Also, some companies can have more than one FYE falling in the same calendar year like in rows 9-10, 11-12 and 14-15.

#

company ID

FYE

reporting date

amount

1

330A

  31DEC2005

  31MAR2006

456

2

330A

  31DEC2006

  31MAY2007

393

3

330A

  31DEC2006

  31MAY2007

695

4

350B

  31MAY2008

  10OCT2008

45

5

350B

  31MAY2009

  22JUL2009

86

6

350B

  31MAY2009

  22JUL2009

23334

7

350B

  31MAY2009

  22JUL2009

866

8

350B

  31MAY2010

  16OCT2011

299

9

840A

  20AUG2011

  31DEC2011

3245

10

840A

  31OCT2011

  31DEC2011

876

11

840A

20AUG2012

  31NOV2012

34

12

840A

  31OCT2012

  31NOV2012

87

13

840A

  20AUG2013

  30SEP2013

222

14

125N

  20JUL2009

  28NOV2009

64

15

125N

  20SEP2009

  18SEP2010

88777

16

125N

  20JUL2010

  18SEP2010

754

My goal is to ‘clean’ those that have several FYE in the given calendar year leaving only records with the latest FYE. Cases with multiple financial reports for the same FYE should stay as they are. This is what the table should look like after the cleaning:

#

company ID

FYE

reporting date

amount

1

330A

  31DEC2005

  31MAR2006

456

2

330A

  31DEC2006

  31MAY2007

393

3

330A

  31DEC2006

  31MAY2007

695

4

350B

  31MAY2008

  10OCT2008

45

5

350B

  31MAY2009

  22JUL2009

86

6

350B

  31MAY2009

  22JUL2009

23334

7

350B

  31MAY2009

  22JUL2009

866

8

350B

  31MAY2010

  16OCT2011

299

9

840A

  31OCT2011

  31DEC2011

876

10

840A

  31OCT2012

  31NOV2012

87

11

840A

  20AUG2013

  30SEP2013

222

12

125N

  20SEP2009

  18SEP2010

88777

13

125N

  20JUL2010

  18SEP2010

754

Can anybody help me please?

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

It could be as simple as the following:

proc sql; 

  create table want as

  select * from have

group by company, year(fye)

having fye=max(fye)

;

quit;

Not tested, as I have issues to input your data due to the format, but you 've got the idea.

Haikuo

View solution in original post

8 REPLIES 8
Haikuo
Onyx | Level 15

It could be as simple as the following:

proc sql; 

  create table want as

  select * from have

group by company, year(fye)

having fye=max(fye)

;

quit;

Not tested, as I have issues to input your data due to the format, but you 've got the idea.

Haikuo

esjackso
Quartz | Level 8

I had the data inputted already and code matches the output requested.

EJ

DiG
Fluorite | Level 6 DiG
Fluorite | Level 6

Yes, this is what I needed. Thank you.

Vince28_Statcan
Quartz | Level 8

The issue with the above is that year function only extracts the 4 digit year whilst fiscal years are not necessarily from 01/01-31/12. This could lead to, for example, having FYE 06/31/2012 and 12/31/2011 deemed as 2 distinct fiscal years whilst then 06/31/2011 would be an ignored data point for the "bad" 12/31/2011 FYE.

I don't quite have an easy solution on top of my head to further account for this though.

DiG
Fluorite | Level 6 DiG
Fluorite | Level 6

Looks like Hai.kuo's code works for me.

Vince28_Statcan
Quartz | Level 8


It depends on how the data is setup and any additionnal knowledge you have about it. In Canada, companies can each have their own distinct fiscal year period and depending on their size they are forced, by law, to report data either on a monthly or yearly basis (there might be additionnal periodicity cases, I'm not working in the business survey department). Point being, depending on what you ought to do with your timeseries considering the FYE as the latest reported FYE date of a given year may or may not achieve the desired analysis.

If my fiscal year is from sept01 to aug31 and that I report on a monthly or quarterly basis, the approach presented by Hai.Kuo would consider the FYE of my company to be on dec31 or so if it reports monthly or on november 30 for quarterly. Again, it is very possible that the way your data is setup, this solution is sufficient. I'm merely giving you a big warning not to draw conclusions too fast.

Vincent

Haikuo
Onyx | Level 15

Good point, . I believe that can be solved using customized date interval as long as all of the companies are using the same fical year. If not, then the code complexity will have to be bumped up to a higher level.

Say if Fical year starts on 6/1:

%let date='31may2013'd;

data _null_;

year=year(intnx('year.6',&date,0,'begin'));

put year=;

run;

Haikuo

DiG
Fluorite | Level 6 DiG
Fluorite | Level 6

Thank you Hai.kuo! It worked!! Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1003 views
  • 7 likes
  • 4 in conversation