Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).
Is there any expert that can help me with the code? Thanks in advance!
Goals:
1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.
2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.
3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).
In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.
Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.
@dapenDaniel wrote:
Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).
Is there any expert that can help me with the code? Thanks in advance!
Goals:
1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.
2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.
3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).
In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.
Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.
@dapenDaniel wrote:
Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).
Is there any expert that can help me with the code? Thanks in advance!
Goals:
1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.
2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values.
3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17).
My examples of how to find an average and save to a data set:
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
Documentation and examples here:
Hi Reeza, Thank you for your kind help! It works very well!!
I still have two further questions.
1. What does type mean in my output?
2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?
The following below is my code.
proc means data = have noprint;
by ID Year;
var value;
output out = want mean= /autoname;
run;
@dapenDaniel wrote:
Hi Reeza, Thank you for your kind help! It works very well!!
I still have two further questions.
1. What does type mean in my output?
If you have BY variables SAS can calculate totals at different levels, ie for full data, for each unique ID, for each unique year and then for each unique year*ID. 0 will be the full data set and then 1 will be one of the variables (ID) and 2 will be the statistics by Year, 3 will be the cross types (year*ID). You can use the NWAY option on the PROC MEANS statement to keep only what you want.
@dapenDaniel wrote:
2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?
Missing values are not included in the mean calcuation. N is the statistics that reflects the count of non-missing NMISS is the number of missing and _FREQ_ which is automatically in the data set is the total number of observations (N+NMISS).
Very clear and detailed. Thank you for your help!!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.