SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

calculate a new variable using variables * cases

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

calculate a new variable using variables * cases

I have a data set looking like this:

yearshropt
19922.46
19922.3.
1992.86
1993.6.5
1993..
19932.30
19932.60
19933.58.9

I want to output a new table looking like this:

yearnew variable
1992
1993

where the new variable is the average of shr * opt in all the years.

e.g.    the new variable for 1992 equals (2.4 * 6 + 2.3 * . + . * 86)/3

   and the new variable for 1993 equals ( . * 6.5 + . * . + 2.3 * 0 + 2.6 * 0 + 3.5 * 8.9)/5

I want to output the new variable for each year.

Any idea how to do this?

Thanks.


Accepted Solutions
Solution
‎08-12-2011 04:02 PM
Super Contributor
Posts: 578

Re: calculate a new variable using variables * cases

By the way...including null values like you're doing effectively sets the value for that row to zero and includes it.  That's why the normal average function excludes null values as they aren't technically the same as zero.

proc sql;

select

     gvkey,

      year,

      sum(shr*opt)/count(*) as NewVar

from work.detail_data

group by gvkey, year;

quit;

View solution in original post


All Replies
PROC Star
Posts: 7,362

calculate a new variable using variables * cases

There are definitely fancier ways of doing this but, hopefully, this will be more useful in showing you how such results are calculated.

data have;

  input year          shr          opt;

  cards;

1992          2.4          6

1992          2.3          .

1992          .          86

1993          .          6.5

1993          .          .

1993          2.3          0

1993          2.6          0

1993          3.5          8.9

;

run;

data temp (keep=year data);

  set have;

  if shr > 0 then do;

    data=shr;

    output;

  end;

  if opt > 0 then do;

    data=opt;

    output;

  end;

run;

proc means data=temp nway mean;

  var data;

  class year;

  output out=want (drop=_Smiley Happy

    mean=average n=count;

run;

Super Contributor
Posts: 578

calculate a new variable using variables * cases

I believe this code is pretty straightforward:

data work.detail_data;

input

      Year

      shr

      opt;

datalines;

1992 2.4 6

1992 2.3 .

1992 . 86

1993 . 6.5

1993 2.3 0

1993 2.6 0

1993 3.5 8.9

;

run;

proc sql;

select

      year,

      avg(shr*opt) as NewVar

from work.detail_data

group by year;

quit;

It produces this output:

YearNewVar
199214.4
199310.38333
PROC Star
Posts: 7,362

calculate a new variable using variables * cases

I obviously didn't notice the asterisks in the definition.  However, you should be able to modify that yourself and now have two methods and can choose depending upon your preferences.

But, now that I know you want to multiply the two values, what effect to you want missing values to have?  What if both values are missing?  I.e., should missing  and/or zero values be included, should they cancel out the results for the entire equation, and/or should they be used or dismissed in the numerator? .. in the denominator?

Super Contributor
Posts: 578

calculate a new variable using variables * cases

I think you're right.  The standard average function doesn't calculate the same way the OP's formula does.  To accomplish that, I think he would have to change my formula from avg(shr*opt) to sum(shr*opt)/count(*).

Contributor
Posts: 45

calculate a new variable using variables * cases

In the example, ( . * 6.5 + . * . + 2.3 * 0 + 2.6 * 0 + 3.5 * 8.9)/5 would equal to (. + . + 0 + 0 + 3.5 * 8.9)/5 = 3.5 * 8.9 / 5 .

Does this make sense? Basically, if either of the two existing variables is missing, this makes the product of that line a missing value. But I think this wouldn't affect the entire equation. In the case of zeores, the product would be zero. But since I don't know for sure if the equation for each year contains how many missing values/zeroes/"good" values, I have to count all of them.

Super Contributor
Posts: 578

calculate a new variable using variables * cases

I think the normal SAS average function excludes null values from the denominator as well.  So, your calculation would end up being divided by 3 instead of 5.  If that's what you want, then you could use it.  If not, then you'll have to calculate it as sum(shr*opt)/count(*) which would equate to 3.5*8.9.5

Contributor
Posts: 45

calculate a new variable using variables * cases

Thanks. Including null values is what I want. I ran your code, and it gave me very good results.

Now the thing is, a few different companies have these year-data. It;s like this:

gvkeyyearshropt
10519922.46
10519922.3.
1051992.86
1051993..
10519932.30
10519933.58.9
2581992.3.5
25819923.50
258199368.9
2581994..
258199402.3

If only group by year, the sas would return me average values for all the company in each year. What if I want average value of each company in each year?

In the case above, the output would be like:

gvkeyyearnewvar
1051992
1051993
2581992
2581993
2581994

Any ideas? Thank you.

BTW, sometimes sql does lots of things with relatively simple codes! A good idea to use sql in this case!

Solution
‎08-12-2011 04:02 PM
Super Contributor
Posts: 578

Re: calculate a new variable using variables * cases

By the way...including null values like you're doing effectively sets the value for that row to zero and includes it.  That's why the normal average function excludes null values as they aren't technically the same as zero.

proc sql;

select

     gvkey,

      year,

      sum(shr*opt)/count(*) as NewVar

from work.detail_data

group by gvkey, year;

quit;

Contributor
Posts: 45

calculate a new variable using variables * cases

I thought it was something like this! I tried using & but it gave me error.

Thank you. It did give me what I want. Do you have any idea how to output the result in a new table?

Is it something like

proc sql;

create table work.newtable as select * from ........?

Thanks.

PROC Star
Posts: 7,362

Re: calculate a new variable using variables * cases

JolSAS,

Yes, sometimes SQL can do things that are difficult in a datastep.  Of course, sometimes one can do things in a datastep that are difficult if not impossible to do with SQL.  However, my bigger concern if I were you, is what are the two values you are multiplying.  With your chosen computational technique you may be throwing some babies out with the bath water.  Similarly, one of the values may be meant as a weight.  The forum doesn't know what the data represent.

PROC Star
Posts: 7,362

Re: calculate a new variable using variables * cases

Yes, you only need to insert the line

create table work.newtable as

before your select statement

If you don't want the output shown on your screen, change the proc sql; line to

proc sql noprint;

Contributor
Posts: 45

calculate a new variable using variables * cases

The shr is shares of options owned and the opt is value realized from option exercising.

Yes, shr is a weight and I'm trying to compute the weighted average over the years. The reason that one company, in one year, could have multiple lines, is because one company, in one year, have multiple executives. One executive could have no share of options, while another could have some. Therefore, there are numbers and missing values. As a matter of fact, I cannot be 100% sure on this either. I think the best I can do is to try both sum/count and the average function, and see which produces the better result. In fact, I just asked my professor for his opinion.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 764 views
  • 0 likes
  • 3 in conversation