SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Eliminating some observations

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

Eliminating some observations

Hello SAS Support Communities,

I'm working with restatement observations.  Some observations cover 1 period (for example, January 1, 2014 through December 31, 2014).  Others cover multiple periods (for example, January 1, 2014 through June 30, 2015).  The issue is some companies have fiscal year-ends (i.e., year-ends other than December 31).  
 
Here's what I did.  I created fiscal year observations for each year.  In my second example above, a 2014 observation would be created and a 2015 observation would be created.  
 
Currently, here's an example of the problem in my code.  If you look at the attached SAS file, see observations 42 and 43 (both with restatement_key 33333.  The restatement period is from 10/1/2009 through 3/31/2010.  This company's fiscal year-end is June 30th (see the datadate column).  As a result, the restatement period is within the same fiscal year (7/1/2009 through 6/30/2010).  Therefore, I should ONLY have 1 observation for this restatement_key (not two).  How do I go about doing that.  As you can see, my issue can only occur if the same restatement_key is located in more than 1 observation. 
 
I'm using SAS 9.3.
  
God bless, best regards, and thanks a lot for your help,
Jadallah
Attachment

Accepted Solutions
Solution
‎05-31-2017 10:40 AM
PROC Star
Posts: 1,094

Re: Eliminating some observations

I think the first issue is because I started out thinking that the FY boundary was start of FY, then I realized it was end of FY. Try changing the line

 

TestYear = year(Restated_Period_Begin) - 1; /* Get the year before the beginning of the restated period */

 

to

 

TestYear = year(Restated_Period_Begin); /* Get the year before the beginning of the restated period */

 

and see if that fixes 33333, without breaking anything else.

 

Regarding DelayDays, I didn't do anything about it. I have absolutely no idea what it's supposed to be...I suggest you give that a try yourself.

 

Tom

View solution in original post


All Replies
Trusted Advisor
Posts: 1,616

Re: Eliminating some observations

Without having to actually download your data and open it in SAS, let me state the general steps you want to use.

 

  1. Sort by restatement_key and datadate
  2. In a new data step, include (SET) the now sorted dataset with the same BY statement, and then the line IF LAST.DATADATE;

 

Since you didn't really specify what would happen if the end of the restatement period is the same day as the end of the fiscal year, the program might fail in that case ... but since we don't know what you'd do in that situation, we can't tell you what the code is that would do what you want in that situation.

Contributor
Posts: 38

Re: Eliminating some observations

Thanks for the information.  Unfortunately, it did not work.  If the end of the restatement period is the same day as the end of the fiscal year, that counts as being in the same fiscal year.  For example, look at observations 69 and 70 (restatement_key 33949).  The restatement period is from 10/1/2009 to 9/30/2010.  The fiscal year-end is 9/30/2010.  Therefore, this should only be one observation (not two).  The restatement period falls in between the two datadates (9/30/2009 and 9/30/2010).  That's why I'm guessing last.datadate did not work.  The number of observations stayed in the same.  What do you think I should do?

Trusted Advisor
Posts: 1,616

Re: Eliminating some observations

As far as I can tell, the method will work, and your statement that it didn't work doesn't convince me. This method will produce one observation per restatement_key. So you need to show us your code.

Contributor
Posts: 38

Re: Eliminating some observations

libname dissert1 "c:/dissert1";

run;

data dissert1.sas1;
set dissert1.sas;
rstate_b = restated_period_begin;
rstate_e = restated_period_ended;
RstateDays = Datdif (rstate_b, rstate_e, 'act/act');
y1 = year(restated_period_begin);
y2 = year(restated_period_ended);
run;

proc sort data=dissert1.sas1 out=dissert1.sas2;
by cik_code;
run;

/*create observation for each year restated [include begin year and end year] */
data dissert1.sas3;
set dissert1.sas2;
by cik_code;
if cik_code then do;
do fyear=y1 to y2;
output;
end;
end;
run;

data dissert1.sas4;
set dissert1.sas3;
drop y1;
drop y2;
run;

data dissert1.sas5;
set dissert1.sas4;
drop Other_Significant_Issues;
drop Errors___Accounting_and_Clerical;
drop Financial_Fraud__Irregularities_;
drop Accounting_Rule__GAAP_FASB__Appl;
drop Date_of_8_K_Item_4_02;
run;

data dissert1.sas6;
set dissert1.sas5;
SIC = SUBSTRN (SIC_Code, 1,2);
Run;

proc sort data=dissert1.sas6 out=dissert1.sas7 noduplicate; by cik_code restatement_key;
run;

/* Sort imported restatement data with disclosure date and disclosure type.
*/

proc sort data=dissert1.new out=dissert1.new1 noduplicate; by cik_code restatement_key;
run;

data dissert1.new2;
merge dissert1.sas7 dissert1.new1;
by cik_code restatement_key;
run;

data dissert1.new3;
set dissert1.new2;
if Big_R = 0;
run;

/* Sort imported restatement data with disclosure date and disclosure type.
*/

proc sort data=dissert1.opinions out=dissert1.opinions1 noduplicate; by cik_code fyear;
run;

proc sort data=dissert1.new3 out=dissert1.new4 noduplicate; by cik_code fyear;
run;

Data dissert1.new5;
Merge dissert1.new4 (in = inone) dissert1.opinions1 (in = intwo);
By cik_code fyear;
if inone and intwo;
run;

/* For now, do analysis ignoring whether auditor changed or not. Ask AA about audit opinions on Monday.
*/

/* Bring in Compustat data so I can do fyr.
*/

data dissert1.compustat2017a;
set dissert1.compustat2017;
if indfmt = 'INDL';
if curcd = 'USD';
run;

proc sort data=dissert1.compustat2017a out=dissert1.compustat2017b noduplicate; by cik fyear;
run;

/* Rename cik_code to cik to be able to merge Compustat with Audit Analytics.
*/

data dissert1.new4a;
set dissert1.new4;
cik = cik_code*1;
run;

data dissert1.new4b;
set dissert1.new4a (rename=(cik=temp));
cik=put(temp,z10.);
drop temp;
run;

proc sort data=dissert1.new4b out=dissert1.new4c noduplicate; by cik fyear;
run;

Data dissert1.new4d;
Merge dissert1.new4c (in = inone) dissert1.compustat2017b (in = intwo);
By cik fyear;
if inone and intwo;
run;

/* Calculate number of days between fiscal year-end and restatement disclosure date.
*/

data dissert1.new4e;
set dissert1.new4d;
DelayDays = Datdif (datadate, disclosure_date, 'act/act');
run;

data dissert1.new4f;
set dissert1.new4e;
if delaydays lt 0 then delete;
run;

/* Delete prior to 2009 and after 2015 to get repeat restaters.
*/

data dissert1.new4g;
set dissert1.new4f;
if fyear lt 2009 then delete;
if fyear gt 2015 then delete;
run;

proc sort data=dissert1.new4g out=dissert1.new4h noduplicate; by restatement_key cik fyear;
run;

data dissert1.new4i;
set dissert1.new4h;
keep cik restatement_key restated_period_begin restated_period_ended rstate_b rstate_e RstateDays fyear
Disclosure Disclosure_Date datadate fyr DelayDays;
run;

proc sort data=dissert1.new4i out=dissert1.new4j noduplicate; by restatement_key datadate;
run;

data dissert1.new4k;
set dissert1.new4j;
by restatement_key datadate;
if last.datadate;
run;

PROC Star
Posts: 1,094

Re: Eliminating some observations

I love problems like this!

 

Give this a spin; I'm a little vague on a couple of your requirements, but I think it's pretty close. And shouldn't take many changes to get it exactly right.

 

Note that the first two steps convert the data you provided to an assumed starting point. The third step removes records with a FY end of Feb 29...I have no idea what to do with them!

 

Tom

 

/* Set the data back to the presumed starting condition */

proc sort data=begin out=have;

by Restatement_Key DATADATE;

run;

data have;

set have(rename=(DATADATE=FYDate)); /* Rename DATADATE to a date that represents end of fiscal year */

by Restatement_Key;

if first.Restatement_Key; /* Only keep one observation */

run;

data have;

set have;

if month(FYDate) = 2 & day(FYDate) = 29 then delete; /* No clue what to do about these */

run;

/* Now process the data to get the correct results */

data want;

set have;

FYMonth = month(FYDate); /* Get the month and day of fiscal year end */

FYDay = day(FYDate);

TestYear = year(Restated_Period_Ended); /* Get the year of the end of the restated period */

TestDate = mdy(FYMonth, FYDay, TestYear); /* Use this year to get a fiscal year end date with year of restated period end */

if Restated_Period_Ended > TestDate then EndYear = TestYear + 1; /* This period ends in the year after the restated period end */

else EndYear = TestYear; /* This period ends in the year of the restated period end */

TestYear = year(Restated_Period_Begin) - 1; /* Get the year before the beginning of the restated period */

TestDate = mdy(FYMonth, FYDay, TestYear); /* Use this year to get a fiscal year start date with year of restated period start */

if Restated_Period_Begin > TestDate then StartYear = TestYear + 1; /* This period ends in the year after the restated period start */

else StartYear = TestYear; /* This period ends in the year of the restated period start */

do FiscalYear = StartYear to EndYear;

DATADATE = mdy(FYMonth, FYDay, FiscalYear);

format FYDate DATADATE date9.;;

output;

end;

run;

Contributor
Posts: 38

Re: Eliminating some observations

[ Edited ]

Thanks so much Tom!  We're a lot closer than we were before.

 

2 things:

1. Look at attached observations 50 and 51 (for example) based on your updated code.  Restatement_key 33333 covers the 10/1/2009 through 3/31/2010 period.  The fiscal year-end is 6/30.  Therefore, there should only be 1 observation (not 2).

2. Look at attached observations 52 and 53 (for example) based on your updated code.  Restatement_key 33414 covers the 7/1/2009 through 3/31/2010 period.  The fiscal year-end is 9/30.  Therefore, it is correct that there are two observations (one for 7/1/2009 through 9/30/2009 and one for 10/1/2009 through 3/31/2010).  However, the DelayDays should not be equal for the two observations.  For the first observation, the DelayDays should be 370 (9/30/2009 through 10/5/2010 - the disclosure date).  This one is correct.  

For the second observation, the DelayDays should be 5 (9/30/2010 through 10/5/2010 - the disclosure date).  In other words, the datadate needs to be updated for the second observation.

 

What do you advise that I do?  

 

God bless, best regards, and thanks so much,

Jadallah

Attachment
Solution
‎05-31-2017 10:40 AM
PROC Star
Posts: 1,094

Re: Eliminating some observations

I think the first issue is because I started out thinking that the FY boundary was start of FY, then I realized it was end of FY. Try changing the line

 

TestYear = year(Restated_Period_Begin) - 1; /* Get the year before the beginning of the restated period */

 

to

 

TestYear = year(Restated_Period_Begin); /* Get the year before the beginning of the restated period */

 

and see if that fixes 33333, without breaking anything else.

 

Regarding DelayDays, I didn't do anything about it. I have absolutely no idea what it's supposed to be...I suggest you give that a try yourself.

 

Tom

Contributor
Posts: 38

Re: Eliminating some observations

Thanks again Tom!  We are really close.  I will look at it closer in the morning and send an update.

Contributor
Posts: 38

Re: Eliminating some observations

I am good to go for now on this issue.  Thanks so much Tom!  I may post something later about a different issue.  You are awesome Tom!  God bless you and your family!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 422 views
  • 0 likes
  • 3 in conversation