BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JOLSAS
Quartz | Level 8

I have a data set looking like this:

yearshropt
19922.46
19922.3.
1992.86
1993.6.5
1993..
19932.30
19932.60
19933.58.9

I want to output a new table looking like this:

yearnew variable
1992
1993

where the new variable is the average of shr * opt in all the years.

e.g.    the new variable for 1992 equals (2.4 * 6 + 2.3 * . + . * 86)/3

   and the new variable for 1993 equals ( . * 6.5 + . * . + 2.3 * 0 + 2.6 * 0 + 3.5 * 8.9)/5

I want to output the new variable for each year.

Any idea how to do this?

Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
DBailey
Lapis Lazuli | Level 10

By the way...including null values like you're doing effectively sets the value for that row to zero and includes it.  That's why the normal average function excludes null values as they aren't technically the same as zero.

proc sql;

select

     gvkey,

      year,

      sum(shr*opt)/count(*) as NewVar

from work.detail_data

group by gvkey, year;

quit;

View solution in original post

12 REPLIES 12
art297
Opal | Level 21

There are definitely fancier ways of doing this but, hopefully, this will be more useful in showing you how such results are calculated.

data have;

  input year          shr          opt;

  cards;

1992          2.4          6

1992          2.3          .

1992          .          86

1993          .          6.5

1993          .          .

1993          2.3          0

1993          2.6          0

1993          3.5          8.9

;

run;

data temp (keep=year data);

  set have;

  if shr > 0 then do;

    data=shr;

    output;

  end;

  if opt > 0 then do;

    data=opt;

    output;

  end;

run;

proc means data=temp nway mean;

  var data;

  class year;

  output out=want (drop=_:)

    mean=average n=count;

run;

DBailey
Lapis Lazuli | Level 10

I believe this code is pretty straightforward:

data work.detail_data;

input

      Year

      shr

      opt;

datalines;

1992 2.4 6

1992 2.3 .

1992 . 86

1993 . 6.5

1993 2.3 0

1993 2.6 0

1993 3.5 8.9

;

run;

proc sql;

select

      year,

      avg(shr*opt) as NewVar

from work.detail_data

group by year;

quit;

It produces this output:

YearNewVar
199214.4
199310.38333
art297
Opal | Level 21

I obviously didn't notice the asterisks in the definition.  However, you should be able to modify that yourself and now have two methods and can choose depending upon your preferences.

But, now that I know you want to multiply the two values, what effect to you want missing values to have?  What if both values are missing?  I.e., should missing  and/or zero values be included, should they cancel out the results for the entire equation, and/or should they be used or dismissed in the numerator? .. in the denominator?

DBailey
Lapis Lazuli | Level 10

I think you're right.  The standard average function doesn't calculate the same way the OP's formula does.  To accomplish that, I think he would have to change my formula from avg(shr*opt) to sum(shr*opt)/count(*).

JOLSAS
Quartz | Level 8

In the example, ( . * 6.5 + . * . + 2.3 * 0 + 2.6 * 0 + 3.5 * 8.9)/5 would equal to (. + . + 0 + 0 + 3.5 * 8.9)/5 = 3.5 * 8.9 / 5 .

Does this make sense? Basically, if either of the two existing variables is missing, this makes the product of that line a missing value. But I think this wouldn't affect the entire equation. In the case of zeores, the product would be zero. But since I don't know for sure if the equation for each year contains how many missing values/zeroes/"good" values, I have to count all of them.

DBailey
Lapis Lazuli | Level 10

I think the normal SAS average function excludes null values from the denominator as well.  So, your calculation would end up being divided by 3 instead of 5.  If that's what you want, then you could use it.  If not, then you'll have to calculate it as sum(shr*opt)/count(*) which would equate to 3.5*8.9.5

JOLSAS
Quartz | Level 8

Thanks. Including null values is what I want. I ran your code, and it gave me very good results.

Now the thing is, a few different companies have these year-data. It;s like this:

gvkeyyearshropt
10519922.46
10519922.3.
1051992.86
1051993..
10519932.30
10519933.58.9
2581992.3.5
25819923.50
258199368.9
2581994..
258199402.3

If only group by year, the sas would return me average values for all the company in each year. What if I want average value of each company in each year?

In the case above, the output would be like:

gvkeyyearnewvar
1051992
1051993
2581992
2581993
2581994

Any ideas? Thank you.

BTW, sometimes sql does lots of things with relatively simple codes! A good idea to use sql in this case!

DBailey
Lapis Lazuli | Level 10

By the way...including null values like you're doing effectively sets the value for that row to zero and includes it.  That's why the normal average function excludes null values as they aren't technically the same as zero.

proc sql;

select

     gvkey,

      year,

      sum(shr*opt)/count(*) as NewVar

from work.detail_data

group by gvkey, year;

quit;

JOLSAS
Quartz | Level 8

I thought it was something like this! I tried using & but it gave me error.

Thank you. It did give me what I want. Do you have any idea how to output the result in a new table?

Is it something like

proc sql;

create table work.newtable as select * from ........?

Thanks.

art297
Opal | Level 21

JolSAS,

Yes, sometimes SQL can do things that are difficult in a datastep.  Of course, sometimes one can do things in a datastep that are difficult if not impossible to do with SQL.  However, my bigger concern if I were you, is what are the two values you are multiplying.  With your chosen computational technique you may be throwing some babies out with the bath water.  Similarly, one of the values may be meant as a weight.  The forum doesn't know what the data represent.

art297
Opal | Level 21

Yes, you only need to insert the line

create table work.newtable as

before your select statement

If you don't want the output shown on your screen, change the proc sql; line to

proc sql noprint;

JOLSAS
Quartz | Level 8

The shr is shares of options owned and the opt is value realized from option exercising.

Yes, shr is a weight and I'm trying to compute the weighted average over the years. The reason that one company, in one year, could have multiple lines, is because one company, in one year, have multiple executives. One executive could have no share of options, while another could have some. Therefore, there are numbers and missing values. As a matter of fact, I cannot be 100% sure on this either. I think the best I can do is to try both sum/count and the average function, and see which produces the better result. In fact, I just asked my professor for his opinion.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1804 views
  • 0 likes
  • 3 in conversation