Merging Question

Reply
Contributor
Posts: 65

Merging Question

Hello SAS Support Communities,

Here's what I'm looking for.  I currently have two datasets (compustat1 and restatements1).  I want to merge them and get compustat2.  There are three variables to create in compustat 2 (AO, FO, and AD).  
 
Let me start with the compustat1 dataset.  The datadate is the end of the fiscal year (for example, 03/31/2000).  Let's look at restatement_key 11335 in the restatements1 dataset.  That restatement started on 01/01/2000 and ended on 06/30/2004.  Therefore, there was a restatement during the following years:
03/31/2000
03/31/2001
03/31/2002
03/31/2003
03/31/2004
03/31/2005
 
Therefore, the AO variable should be 1 for all of those years.
 
The FO variable should only be 1 in the first year of a restatement (i.e., 03/31/2000 for restatement_key 11335).
 
The AD variable should only be 1 in the year that the restatement was disclosed (disclosure_date) (i.e., 03/31/2005 for restatement_key 11335).
 
Sometimes, there is more than 1 restatement_key for the same period.  Look at restatement_key 12789 in the restatements1 dataset.  That restatement covers 02/01/2002 through 06/30/2003.  Therefore, the AO variable should be 1 in the following years:
03/31/2002
03/31/2003
03/31/2004
 
They already are 1 from restatement_key 11335, but you get the point.
 
FO should be 1 for the year 03/31/2002 for restatement_key 12789.  If only restatement_key 11335 was there, 03/31/2002 would be 0.  However, since 03/31/2002 is the first year of a new restatement, FO should be 1.
 
AD should be 1 for the year 03/31/2005 for restatement_key 12789.  That is the same as restatement_key 11335 since they were both disclosed in the same period, but you get the point.
 
Please let me know how I can merge the two datasets by CIK_Code (company identifer) to get these three new variables to be added to the compustat1 dataset (as shown in compustat2).
 
God bless, best regards, and thank you so much,
Jadallah
Attachment
Attachment
Attachment
Super Contributor
Super Contributor
Posts: 266

Re: Merging Question

I'm not going to open zip files from an unknown source in this day and age and I doubt anyone else will either. Post some data in a data step and you will likely get a better response.

 

 

Contributor
Posts: 65

Re: Merging Question

Hello HB,

Thanks for the response.  The problem is that SAS Support Communities does not let me post a SAS dataset unless it is in a zip file.  If it helps, I have attached what I am looking for in Excel (combining the Compustat1 and Restatements1 tabs to get the Compustat2 tab).

 

God bless you and your family, best regards, and take care,

Jadallah

PROC Star
Posts: 2,319

Re: Merging Question

Don't post all your data. No one has the time to peruse full data sets anyway. Give us a 20-row example of what you want to achieve. 

Contributor
Posts: 65

Re: Merging Question

Hi Chris,

Thank you for the message.  I only posted an example.  One dataset has two observations in it and the other two each have nine observations.

 

God bless, best regards, and take care,

Jadallah 

Respected Advisor
Posts: 4,697

Re: Merging Question

[ Edited ]

@jjadall1

Also that I can understand where the guys are coming from, I feel they are a bit harsh on you  (@ChrisNZ@HB).

 

The best way to post data is to provide a SAS data step which creates sample data. There is a SAS macro available which converts a SAS table to a SAS data step creating the data.

https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/

 

I've downloaded and unzipped your data and then used this macro via below code.

libname sample 'C:\temp\sampledata';

%data2datastep(compustat1, sample, ,C:\temp\sampledata\compustat1.sas);
%data2datastep(compustat2, sample, ,C:\temp\sampledata\compustat2.sas);
%data2datastep(restatements1, sample, ,C:\temp\sampledata\restatements1.sas);

Here the result creating the data which now can be used by everyone in this forum.

data WORK.COMPUSTAT1;
  infile datalines dsd truncover;
  input CIK_Code:32. fyear:32. fyr:32. datadate:32.;
  label CIK_Code="CIK_Code" fyear="fyear" fyr="fyr" datadate="datadate";
datalines4;
1042173,1998,3,19980331
1042173,1999,3,19990331
1042173,2000,3,20000331
1042173,2001,3,20010331
1042173,2002,3,20020331
1042173,2003,3,20030331
1042173,2004,3,20040331
1042173,2005,3,20050331
1042173,2006,3,20060331
;;;;


data WORK.COMPUSTAT2;
  infile datalines dsd truncover;
  input CIK_Code:32. fyear:32. fyr:32. datadate:32. AO:32. FO:32. AD:32.;
  label CIK_Code="CIK_Code" fyear="fyear" fyr="fyr" datadate="datadate" AO="AO" FO="FO" AD="AD";
datalines4;
1042173,1998,3,19980331,0,0,0
1042173,1999,3,19990331,0,0,0
1042173,2000,3,20000331,1,1,0
1042173,2001,3,20010331,1,0,0
1042173,2002,3,20020331,1,1,0
1042173,2003,3,20030331,1,0,0
1042173,2004,3,20040331,1,0,0
1042173,2005,3,20050331,1,0,1
1042173,2006,3,20060331,0,0,0
;;;;

data WORK.RESTATEMENTS1;
  infile datalines dsd truncover;
  input CIK_Code:32. Restatement_Key:32. Restated_Period_Begin:DATE9. Restated_Period_Ended:DATE9. Disclosure_Date:DATE9.;
  format Restated_Period_Begin DATE9. Restated_Period_Ended DATE9. Disclosure_Date DATE9.;
  label CIK_Code="CIK_Code" Restatement_Key="Restatement_Key" Restated_Period_Begin="Restated_Period_Begin" Restated_Period_Ended="Restated_Period_Ended" Disclosure_Date="Disclosure_Date";
datalines4;
1042173,11335,01JAN2000,30JUN2004,19OCT2004
1042173,12789,01FEB2002,30JUN2003,01MAY2004
;;;;

 

 And on a side note:

Browsing through your previous posts it looks to me you often only post the problem without demonstrating your own effort. Please try to first solve the problem on your own and post questions only when you get stuck and need direction (eventually also posting some of your not yet working code). If you don't do this then you risk that over time more and more people will start to ignore your questions as this forum is not about doing your work for you (unless you pay for it) but to help and support you in doing your own work. 

Contributor
Posts: 65

Re: Merging Question

Thank you for the help Patrick.  That's a good point regarding effort.  The following code works for my variables (AO, FO, and AD) IF the FYR and FYEAR variables from compustat1 can be joined in with restatements1:

data Etr.restatement1;
set Etr.restatement;
rpb = restated_period_begin;
rpe = restated_period_ended;
RstateDays = (Datdif (rpb, rpe, 'act/act'))+1;
if month(restated_period_begin) le FYR then y1 = year(restated_period_begin);
else y1 = year(restated_period_begin) + 1;
if month(restated_period_ended) le FYR then y2 = year(restated_period_ended);
else y2 = year(restated_period_ended) + 1;
if month(disclosure_date) le FYR then y3 = year(disclosure_date);
else y3 = year(disclosure_date) + 1;
run;

data Etr.restatement1a;
set Etr.restatement1;
If RstateDays LT 350 then delete;
run;

data Etr.restatement2;
set Etr.restatement1a;
drop rpb;
drop rpe;
run;

proc sort data=Etr.restatement2 out=Etr.restatement2a;
by cik_code;
run;

/*create observation for each year restated [include begin year and end year] */
data Etr.restatement3;
set Etr.restatement2a;
by cik_code;
if cik_code then do;
do fyear=y1 to y3;
output;
end;
end;
run;

proc sort data= Etr.restatement3 out= Etr.restatement4 noduplicate;
by restatement_key cik fyear;
run;

data Etr.restatement5;
set Etr.restatement4;
if y1 = fyear then FO=1;
else FO=0;
run;

data Etr.restatement6;
set Etr.restatement5;
if y3 = fyear then AD=1;
else AD=0;
run;

data Etr.restatement7;
set Etr.restatement6;
if y1 le fyear le y2 then AO=1;
else AO=0;
run;

 

The problem is that a simple merge by cik_code does not work.

 

God bless, best regards, and thanks a lot,

Jadallah

Contributor
Posts: 65

Re: Merging Question

Hello Patrick,

I made significant progress with the following code:

data etr1.restatements2;

set etr1.restatements1;

y1 = year(restated_period_begin);

y1a = year(restated_period_begin)+1;

y2 = year(restated_period_ended);

y2a = year(restated_period_ended)+1;

y3 = year(disclosure_date);

y3a = year(disclosure_date)+1;

m1 = month(restated_period_begin);

m2 = month(restated_period_ended);

m3 = month(disclosure_date);

run;

 

 proc sort data=etr1.restatements2 out=etr1.restatements2a;

 by cik_code;

 run;

 

 /*create observation for each year restated [include begin year and end year] */

 data etr1.restatements3;

 set etr1.restatements2a;

 by cik_code;

 if cik_code then do;

 do fyear=y1 to y3a;

 output;

 end;

 end;

 run;

 

 proc sort data=etr1.restatements3 out=etr1.restatements4 noduplicate;

 by cik_code fyear;

 run;

 

 proc sort data=etr1.compustat1 out=etr1.compustat1a noduplicate;

 by cik_code fyear;

 run;

 

data etr1.combined;

merge etr1.compustat1a etr1.restatements4;

by cik_code fyear;

run;

 

data etr1.combined1;

set etr1.combined;

if m1 le fyr then fyear1 = y1;

if m1 gt fyr then fyear1 = y1a;

if m2 le fyr then fyear2 = y2;

if m2 gt fyr then fyear2 = y2a;

if m3 le fyr then fyear3 = y3;

if m3 gt fyr then fyear3 = y3a;

run;

 

 proc sort data=etr1.combined1 out=etr1.combined2 noduplicate;

 by restatement_key;

 run;

 

data etr1.combined3;

set etr1.combined2;

if fyear = fyear1 then FO = 1;

else FO = 0;

run;

 

 data etr1.combined4;

 set etr1.combined3;

 if fyear1 le fyear le fyear2 then AO=1;

 else AO=0;

 run;

 

data etr1.combined5;

set etr1.combined4;

if fyear = fyear3 then AD = 1;

else AD = 0;

run;

 

 proc sort data=etr1.combined5 out=etr1.combined6 noduplicate;

 by cik_code fyear;

 run;

 

Now all I need is the following:

There should only be 1 observation per cik_code fyear combination.  Some cik_code fyear combinations are included in multiple observations.  They need to be condensed as follows:

AO, AD, and FO need to keep maximum values.  For example, look at observations 11 and 12.  These should be condensed into 1 observation with FO = 0, AO = 1, and AD = 1.  Does that make sense?  It may be an array code.

 

God bless you and your family, best regards, and thank you for your help,

Jadallah

Contributor
Posts: 65

Re: Merging Question

Hello Kim,

If I'm not mistaken, my code is now working.  Here's a continuation of the code I sent earlier.  Please run the code from earlier and then this code below.  Please let me know if you see any problems.  By running it, I got the results I was looking for.
 
proc sort data=etr1.combined6 out=etr1.combined7 noduplicate;
 by cik_code fyear descending FO;
 run;
 
/*To calculate totals of AO, FO, and AD */
 
proc means data= etr1.combined7 noprint;
var AO FO AD ;
by cik_code fyear ;
output out = etr1.sum_of_variables
sum(AO)= AO_sum
sum(FO) = FO_sum
sum(AD) = AD_sum ;
run;
 
 
*merge;
 
proc sort data = etr1.combined7;
by cik_code fyear ;
run;
 
 
proc sort data = etr1.sum_of_variables;
by cik_code fyear ;
run;
 
data etr1.combined8;
merge etr1.combined7 etr1.sum_of_variables (drop = _TYPE_ _FREQ_);
by cik_code fyear ;
run;
 
data etr1.combined9;
set etr1.combined8;
keep cik_code fyear fyr datadate AO_sum FO_sum AD_sum;
run;
 
 proc sort data=etr1.combined9 out=etr1.combined10 noduplicate;
 by cik_code fyear;
 run;
 
data etr1.combined11;
set etr1.combined10;
if AO_sum = 0 then AO = 0;
if AO_sum ge 1 then AO = 1;
if FO_sum = 0 then FO = 0;
if FO_sum ge 1 then FO = 1;
if AD_sum = 0 then AD = 0;
if AD_sum ge 1 then AD = 1;
run;
 
data etr1.combined12;
set etr1.combined11;
drop AO_sum;
drop FO_sum;
drop AD_sum;
run;
 
 
God bless, best regards, and thank you,
Jadallah
Ask a Question
Discussion stats
  • 8 replies
  • 164 views
  • 0 likes
  • 4 in conversation