BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

 

Is there any expert that can help me with the code? Thanks in advance!

 

 

 

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values. 

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17). 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

 

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

 


@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

 

Is there any expert that can help me with the code? Thanks in advance!

 

 

 

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values. 

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17). 

 

 


 

View solution in original post

5 REPLIES 5
Reeza
Super User

In SAS if you take the average of a single value, its the same as the value. If a value is missing then it excludes it. And if there are multiples they're included.

 

Point being, I think you can just take a straight average by ID and year is all you need. Give that a try and let us know if it works.

 


@dapenDaniel wrote:

Sorry I do not know what code I need to use. The following below are my goals. I attached the original data (Column A,B, and C) and the expected data (Column G, H and I).

 

Is there any expert that can help me with the code? Thanks in advance!

 

 

 

Goals:

1. If a specific ID has only one value in a certain year, Keep this value. For example, there is only one value for ID=975 in 1995.

2. If a specific ID has two or more values in a certain year, new value should be the average of these values. For example, there are two values for ID=975 in 1997. The new value for 1997 will be the average of two old values. 

3. If one of the value is missing, Delete it. For example, C18 is missing. So new value for ID=51F in 1996 should be the average of the previous three values (C15, C16, and C17). 

 

 


 

dapenDaniel
Obsidian | Level 7

Hi Reeza, Thank you for your kind help! It works very well!!

 

I still have two further questions.

 

1. What does type mean in my output?

 

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

 

 

The following below is my code.

 

proc means data = have noprint;
by ID Year;
var value;
output out = want mean= /autoname;
run;

 

 

Reeza
Super User

@dapenDaniel wrote:

Hi Reeza, Thank you for your kind help! It works very well!!

 

I still have two further questions.

 

1. What does type mean in my output?

 

 

 


If you have BY variables SAS can calculate totals at different levels, ie for full data, for each unique ID, for each unique year and then for each unique year*ID. 0 will be the full data set and then 1 will be one of the variables (ID) and 2 will be the statistics by Year, 3 will be the cross types (year*ID). You can use the NWAY option on the PROC MEANS statement to keep only what you want. 

 


@dapenDaniel wrote:

 

 

2. Take ID=51F as an example. There are 4 observations for 1996 in original data. The frequency is 4 but the mean value is for the previous 3. When SAS calculate the frequency, missing value is included. When SAS calculate the mean, missing value is not included. Is that right?

 

 


Missing values are not included in the mean calcuation. N is the statistics that reflects the count of non-missing NMISS is the number of missing and _FREQ_ which is automatically in the data set is the total number of observations (N+NMISS).

dapenDaniel
Obsidian | Level 7

Very clear and detailed. Thank you for your help!!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1598 views
  • 0 likes
  • 2 in conversation