turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Summary statistics with missing and multiple value...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-04-2018 06:34 PM - last edited on 04-04-2018 06:41 PM by Reeza

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

Accepted Solutions

Solution

04-06-2018
03:12 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dapenDaniel

04-04-2018 06:38 PM

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

All Replies

Solution

04-06-2018
03:12 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dapenDaniel

04-04-2018 06:38 PM

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

Is there any expert that can help me with the code? Thanks in advance!

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dapenDaniel

04-04-2018 06:41 PM

My examples of how to find an average and save to a data set:

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas

Documentation and examples here:

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

04-04-2018 08:37 PM

Hi Reeza, Thank you for your kind help! It works very well!!

I still have two further questions.

1. What does type mean in my output?

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

The following below is my code.

proc means data = have noprint;

by ID Year;

var value;

output out = want mean= /autoname;

run;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to dapenDaniel

04-05-2018 10:52 AM

@dapenDaniel wrote:

Hi Reeza, Thank you for your kind help! It works very well!!

I still have two further questions.

1. What does type mean in my output?

If you have BY variables SAS can calculate totals at different levels, ie for full data, for each unique ID, for each unique year and then for each unique year*ID. 0 will be the full data set and then 1 will be one of the variables (ID) and 2 will be the statistics by Year, 3 will be the cross types (year*ID). You can use the NWAY option on the PROC MEANS statement to keep only what you want.

@dapenDaniel wrote:

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

Missing values are not included in the mean calcuation. N is the statistics that reflects the count of non-missing NMISS is the number of missing and _FREQ_ which is automatically in the data set is the total number of observations (N+NMISS).

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

04-06-2018 03:11 PM

Very clear and detailed. Thank you for your help!!