Solved
Contributor
Posts: 24

# Summary statistics with missing and multiple values

[ Edited ]

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

Accepted Solutions
Solution
‎04-06-2018 03:12 PM
Super User
Posts: 23,773

## Re: Complicated Question

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

All Replies
Solution
‎04-06-2018 03:12 PM
Super User
Posts: 23,773

## Re: Complicated Question

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

Super User
Posts: 23,773

Contributor
Posts: 24

## Re: Complicated Question

Hi Reeza, Thank you for your kind help! It works very well!!

I still have two further questions.

1. What does type mean in my output?

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

The following below is my code.

proc means data = have noprint;
by ID Year;
var value;
output out = want mean= /autoname;
run;

Super User
Posts: 23,773

## Re: Complicated Question

@dapenDaniel wrote:

Hi Reeza, Thank you for your kind help! It works very well!!

I still have two further questions.

1. What does type mean in my output?

If you have BY variables SAS can calculate totals at different levels, ie for full data, for each unique ID, for each unique year and then for each unique year*ID. 0 will be the full data set and then 1 will be one of the variables (ID) and 2 will be the statistics by Year, 3 will be the cross types (year*ID). You can use the NWAY option on the PROC MEANS statement to keep only what you want.

@dapenDaniel wrote:

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

Missing values are not included in the mean calcuation. N is the statistics that reflects the count of non-missing NMISS is the number of missing and _FREQ_ which is automatically in the data set is the total number of observations (N+NMISS).

Contributor
Posts: 24

## Re: Complicated Question

Very clear and detailed. Thank you for your help!!

☑ This topic is solved.