BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
klndsy
Calcite | Level 5

Hello! I have a coding question for SAS 9.4.

 

I am trying adjust my quarterly ROA variable for each firm by industry means for each quarter. Thus, my end goal is to have IAROA= ROA - Industry Average for each quarter for each firm.

 

For a visual, I have a primary dataset that looks similar to this :

Firm   Year.Quarter  Industry    ROA

1        2003.1            01             0.0125

1        2003.2            01             0.0359

1        2003.3            01             0.0451

1        2003.4            01             0.0215

2        2003.1            02             0.125

2        2003.2            02             0.135

2        2003.3            02             0.097

2        2003.4            02             0.102

3        2003.1            01             0.029

3        2003.2            01             0.024

3        2003.3            01             0.035

3        2003.4            01             0.028         

 

... and so forth through 2014 and 75 industry types.

 

Then, I have found the industry ROA means for each quarter (2003-2014). using PROC MEANS.  Specifically I use

 

proc means data=data.master mean noprint;
output out=IROA mean=;
by industry;
class yearquarter;
var roa ;

 

So, for example, I have:

Industry  year.quarter       Industry ROA (IROA)

01           2003.1                 .0171    

01           2003.2                 .0172  

01           2003.3                 .0173    

01           2003.4                 .0174    

01           2004.1                   

01           2004.2                  

01           2004.3                     

01           2004.4                 

02           2003.1                 .111          

02           2003.2                 .112   

02           2003.3                 .113    

02           2003.4                 .114    

02           2004.1                   

02           2004.2                  

02           2004.3                     

02           2004.4  

          

... and so forth over all industries through 2014.

 

My end goal is to have:

Firm   Year.Quarter  Industry    ROA          IROA        Industry Adjusted ROA

1        2003.1            01             0.0125      .0171         -.0046     (= .0125-.0171)

1        2003.2            01             0.0359      .0172         .0187      (= .0359-.0172) 

1        2003.3            01             0.0451      .0173         .0278

1        2003.4            01             0.0215      .0174         .0041

2        2003.1            02             0.125        .111           .014

2        2003.2            02             0.135        .112           .023

2        2003.3            02             0.097        .113           -.016

2        2003.4            02             0.102        .114           -.012

3        2003.1            01             0.029        .0171         .0119

3        2003.2            01             0.024        .0172         .0068

3        2003.3            01             0.035        .0173         .0177

3        2003.4            01             0.028        .0174         .0106

 

As you can see, all firms will have the IROA value depending on what industry they belong to and the year.quarter of the observation.  For example, Firms 1 and 3 belong to the same industry, so IROA takes on the same value for the same quarters.

 

Since I have thousand of firms, 75ish industries, and my data spans 11 years, it is not feasible for me to code this manually.

 

 

 

A) Is there a macro that works well?

or

B) I can find the quarterly industry means, but I have not been able to merge it back into the primary dataset so every firm with the matching industry and quarter display the correct value for the industry roa variable. Only the first firm receives the industry variable value an the rest of the firms have "."

 

Please let me know if there is any pertinent information missing that will allow you to help me with this code piece! 

I thank you in advance!!!

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @klndsy,

 

I think, your code should work in principle (it could be simplified here and there, though) if the data are what they should be. The output dataset in your attached screenshot, however, does not look like the result of a merge by SIC2 YEAR, but rather like the result of a merge by SIC2 YQ (see obs. no. 1553 in particular!). One possible explanation is that some variable values shown in the screenshot are in fact formatted values. If, for example, variable YEAR contains (varying!) SAS date values (like those in YQ) which are only formatted with format YEAR., a merge by SIC2 YEAR could indeed yield results very similar to those shown in the screenshot.

 

Comparing YQ with QUARTER we see sort of an inconsistency which could impact your quarterly results as well: Variable YQ contains SAS date values, but sometimes they are varying within one quarter: YQ=15764='28FEB2003'd in obs. no. 1553 and YQ=15795='31MAR2003'd in obs. no. 1554 belong to the same quarter, but only a variable with a constant value (such as '2003.1') throughout a quarter would be suitable for merging by quarter. Again, it is not clear if variable QUARTER contains values such as 2003.1 (numeric or character?) or if it contains SAS date values, formatted with format YYQP..

 

I think, once this issue is clarified, you are very close to getting the intended results.

 

Just to suggest an alternative approach: You could let PROC STDIZE compute both the IROA values and the differences ROA - IROA. Please find below some sample code. However, you are so close to the finish line with your more elementary approach that you may want to continue on this route.

/* Create test data */

data have;
input firm yearquarter industry $ ROA;
cards;
1 2003.1 01 0.0125
1 2003.2 01 0.0359
1 2003.3 01 0.0451
1 2003.4 01 0.0215
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
3 2003.1 01 0.029
3 2003.2 01 0.024
3 2003.3 01 0.035
3 2003.4 01 0.028 
;

/* Sort input data and perform computations */

proc sort data=have;
by industry yearquarter;
run;

proc stdize data=have method=mean oprefix sprefix=IndAdj
            out=have_adj outstat=stats(where=(_type_='LOCATION') rename=(ROA=IROA));
var ROA;
by industry yearquarter;
run;

/* Merge firm-quarter level with industry-quarter level data and restore original sort order */

data want;
merge have_adj
      stats(drop=_type_);
by industry yearquarter;
label IndAdjROA='Industry Adjusted ROA';
run;

proc sort data=want;
by firm yearquarter;
run;

I'll be back tomorrow (CET).

View solution in original post

10 REPLIES 10
Reeza
Super User

Post your code. You should be able to merge the results back in and then do your substraction in a data step.

 

I would recommend perhaps a SQL join and then you could even do the subtraction in one step. If you want, you can even calculate them all in one step. 

klndsy
Calcite | Level 5

Hi Reeza; thank you for your reply.

 

So far, I have been trying this with annual ROA (ideally I would like quarterly), so my code reflects that. I figured once I get it figured out with annual, then I can apply it to my quarterly data.  Additionally, sic2 is the industry variable.

 

I have done:

 

proc sort data=compustatrevised3; by sic2 year; run;
proc means data=compustatrevised3 mean noprint;
output out=IROA mean=;
by sic2;
class year;
var roa roa2;
run;

 

data IROA2;
set IROA (drop=_freq_);
if _type_=0 then delete;
rename roa=indroa roa2=indroa2;
run;

 

proc sort data=compustatrevised3; by sic2 year; 
proc sort data=IROA2; by sic2 year; run;


data trialmerge;
merge compustatrevised3 IROA2;
by sic2 year;
run;

 

 

I have tried multiple variations of the merge command, but this is the one that gets me closest.  However, it only fills the industry roa mean variable (indroa) for the first few firms of each year.  Even then, I can't exactly figure out the problem because most of the time it will only fill the first firm, but sometimes it will fill 2, 3 or 4 firms with the industry roa value!  I attached a screenshot (word doc) of my merged output problem.

 

 

Reeza
Super User

Is your primary data sample reflective of your actual data format/structure?

klndsy
Calcite | Level 5

Yes, but it is currently sorted differently and includes a lot more variables.

 

I have tens of thousands of firms spanning many different industries over a period of 2003-2014 with quarterly data. Each firm has a unique ROA each quarter (with some firms missing observations based on data availability). 

 

My actual data set is sorted by industry(sic2) then year.quarter.

 

For a visual, my actual dataset looks similarly sorted to this (plus  many other variables) :

Firm   Year.Quarter  Industry    ROA

1        2003.1            01             0.0125

3        2003.1            01             0.029

1        2003.2            01             0.0359

3        2003.2            01             0.024

1        2003.3            01             0.0451

3        2003.3            01             0.035

1        2003.4            01             0.0215

3        2003.4            01             0.028

 

2        2003.1            02             0.125

2        2003.2            02             0.135

2        2003.3            02             0.097

2        2003.4            02             0.102

 

FreelanceReinh
Jade | Level 19

Hi @klndsy,

 

I think, your code should work in principle (it could be simplified here and there, though) if the data are what they should be. The output dataset in your attached screenshot, however, does not look like the result of a merge by SIC2 YEAR, but rather like the result of a merge by SIC2 YQ (see obs. no. 1553 in particular!). One possible explanation is that some variable values shown in the screenshot are in fact formatted values. If, for example, variable YEAR contains (varying!) SAS date values (like those in YQ) which are only formatted with format YEAR., a merge by SIC2 YEAR could indeed yield results very similar to those shown in the screenshot.

 

Comparing YQ with QUARTER we see sort of an inconsistency which could impact your quarterly results as well: Variable YQ contains SAS date values, but sometimes they are varying within one quarter: YQ=15764='28FEB2003'd in obs. no. 1553 and YQ=15795='31MAR2003'd in obs. no. 1554 belong to the same quarter, but only a variable with a constant value (such as '2003.1') throughout a quarter would be suitable for merging by quarter. Again, it is not clear if variable QUARTER contains values such as 2003.1 (numeric or character?) or if it contains SAS date values, formatted with format YYQP..

 

I think, once this issue is clarified, you are very close to getting the intended results.

 

Just to suggest an alternative approach: You could let PROC STDIZE compute both the IROA values and the differences ROA - IROA. Please find below some sample code. However, you are so close to the finish line with your more elementary approach that you may want to continue on this route.

/* Create test data */

data have;
input firm yearquarter industry $ ROA;
cards;
1 2003.1 01 0.0125
1 2003.2 01 0.0359
1 2003.3 01 0.0451
1 2003.4 01 0.0215
2 2003.1 02 0.125
2 2003.2 02 0.135
2 2003.3 02 0.097
2 2003.4 02 0.102
3 2003.1 01 0.029
3 2003.2 01 0.024
3 2003.3 01 0.035
3 2003.4 01 0.028 
;

/* Sort input data and perform computations */

proc sort data=have;
by industry yearquarter;
run;

proc stdize data=have method=mean oprefix sprefix=IndAdj
            out=have_adj outstat=stats(where=(_type_='LOCATION') rename=(ROA=IROA));
var ROA;
by industry yearquarter;
run;

/* Merge firm-quarter level with industry-quarter level data and restore original sort order */

data want;
merge have_adj
      stats(drop=_type_);
by industry yearquarter;
label IndAdjROA='Industry Adjusted ROA';
run;

proc sort data=want;
by firm yearquarter;
run;

I'll be back tomorrow (CET).

klndsy
Calcite | Level 5

Hi FreelandReinhard! Thank you for your time!  Yes, my variables Year and YQ are both formatted variables from datadate which has the format 01012003. I thought I took care of this using put and input, but clearly I did not!  I have exhausted everything I can think of with this issue. Do you have any suggestions???

AskoLötjönen
Quartz | Level 8

You can do all calculations with one Proc SQL: Industry average ROA are calculated in subquery:

 

proc sql;

create table want as

select P.Firm,

P.YearQuarter,

P.Industry,

P.ROA,

I.IROA,

(P.roa-I.IROA) as IAROA

from primary P

left join (select Sub.Industry, Sub.YearQuarter, mean(Sub.ROA) as IROA from Primary Sub group by Sub.Industry, Sub.YearQuarter) I

on P.Industry = I.Industry

and P.YearQuarter = I.YearQuarter

order p.Firm, P.YearQuarter;

quit;

FreelanceReinh
Jade | Level 19

@AskoLötjönen: Sure, this would be the most straightforward approach. It just seemed that the OP is more familiar with the data step and PROC MEANS.

FreelanceReinh
Jade | Level 19

I think, the easiest and safest way to prepare the aggregations and merges by year and by quarter is to create variables which contain year and quarter (incl. year), respectively.

 

Example:

/* Sample data */
data old;
date='24FEB2016'd; /* numeric variable with values like 20508, possibly derived */
run;               /* from a raw date value like rawdate='24022016' by using    */
                   /* the INPUT function: date=input(rawdate, ddmmyy8.)         */

/* To be created */
data new;
set old;
year=year(date); /* numeric variable with values like 2016 */
quarter=put(date, yyqp.); /* character variable with values like '2016.1' */
run;

 

You say that YQ is formatted. My impression was rather that YQ is unformatted (screenshot shows SAS date values), but QUARTER could be formatted.

klndsy
Calcite | Level 5

Y'all were so wonderful and helpful. It is all working beautifully now. Thank you very much!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 6717 views
  • 0 likes
  • 4 in conversation