Here is what I have to do:
Using the “schools1” data, create a new dataset called "schools2" which includes
average and median sat scores for sat1, sat2, sat3, and overall satavg with school as the
level of analysis. If you do this correctly, there should be 30 observations and 11
variables (including 8 different SAT variables) in your resulting dataset (use PROC
MEANS to do this).
So I think I basically have to get rid of the student variable and have just one row per school, but I'm really not sure how to do that using PROC MEANS. I've been trying for a couple of hours but don't really know where to start.
I added some of the code to make the data and a picture of the schools1 dataset.
Here are a couple of hints, though the solution to this should typically be about 5 lines of code, counting a Run; statement.
If you don't mention a variable in most procedures, it does not affect what you are doing at all. It is simply ignored.
If you want to get analysis grouped BY some variable(s) use a BY statement.
If you have access to the SAS online documentation for Proc Means you should find examples.
Since this is pretty obviously homework, I don't like to give answers. If you show what you have tried, I will make comments.
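To illustrate the two hints about unmentioned variables and the BY statement without touching the homework data, here is a minimal sketch using sashelp.class, a sample data set that ships with SAS. The output data set name class_sorted is just an illustrative choice.

```sas
/* BY-group processing needs the input sorted by the BY variable. */
/* class_sorted is a hypothetical name chosen for this sketch.    */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

/* Only height and weight are analyzed; age is not on the VAR     */
/* statement, so it is ignored. One set of results is printed     */
/* for each value of sex.                                          */
proc means data=class_sorted mean median;
  by sex;
  var height weight;
run;
```

This only prints the statistics. Getting them into a data set is a separate step.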
And here is something else related to your calculation of the SATAVG variable.
satavg = (sat1+sat2+sat3)/3;
This will return a missing value for any student who does not have all three SAT score values, because the + operator returns a missing value when one or more of its operands is missing. But you can still calculate a mean of the available SAT scores:
satavg = mean(sat1,sat2,sat3);
Since SAS is intended for statistical analysis there are a number of similar functions to manipulate a group of variables such as Max, Min (guess what they do), N, std, and Sum. These functions will return the appropriate result even if one or more values is missing.
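A small sketch of the difference, using two made-up students (the scores and the data set name demo are invented for illustration, not taken from schools1):

```sas
data demo;
  input sat1 sat2 sat3;
  sat_plus = (sat1 + sat2 + sat3) / 3;  /* missing if any score is missing */
  sat_mean = mean(sat1, sat2, sat3);    /* mean of the non-missing scores  */
  n_scores = n(sat1, sat2, sat3);       /* number of non-missing scores    */
  datalines;
500 600 700
500 . 700
;
run;

proc print data=demo;
run;
```

For the second student, sat_plus is missing while sat_mean is 600 and n_scores is 2. For the first student both calculations give 600.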
Was there a requirement to use some sort of macro? Generally, a question of this type indicates a skill level that should probably not be using macro coding yet. That is not an insult: the purpose of macro code in SAS is to generate code, and if you do not know what code you need to generate, macros just make things more difficult.
Thanks for the reply. From the picture you can see I had already sorted the data by school, but I'm not really sure how to get each school's data into one row.
I didn't consider that my satavg could have blank outputs, but it looks like none of the SAT scores were missing, so it was fine.
One of the questions did ask for a macro, but it seemed unnecessary unless I just did it wrong. This was the question:
Create a SAS macro called "describe" to calculate average sat3, draw a graph of the
distribution of the variable sat1, and draw a graph that shows the relationship between
the variables sat1 and sat2 - for the “schools” data.
As @ballardw said, you need to look for examples using PROC MEANS.
There is absolutely no reason for a macro here.
Data schools2;
  set schools1;
  satavg1 = mean(sat1);
  satavg2 = mean(sat2);
  satavg3 = mean(sat3);
  satmed1 = median(sat1);
  satmed2 = median(sat2);
  satmed3 = median(sat3);
  satmed = median(sat1,sat2,sat3);
run;
PROC SORT data=schools2;
  BY school;
run;
PROC MEANS data=schools2;
  BY school;
  VAR school -- satavg;
  OUTPUT out=schools2;
run;
This is what I ended up doing. There probably was an easier way, but this seems to work.
The way I read your requirement in the attachment, I didn't see a requirement for a median per student, but perhaps you have more information than was posted.
Plus, if you look, you did not actually calculate much with the Median(sat1) or Mean(sat1) statements, because the mean or median of a single value is the value itself. So you don't need or want the Satavg1 or Satmed1 and similar variables at all.
You are close, but the output data set doesn't include the medians. With PROC MEANS you can choose the statistics written to the output data set by adding them to the OUTPUT statement, for example Mean= Median=, which requests those statistics for each variable on the VAR statement. You can also request specific statistics for only some variables, but that requires you to provide output variable names. SAS makes the naming easy with the /AUTONAME option, which appends the statistic as a suffix to the variable name.
I would suggest placing just SAT1, SAT2, SAT3 and SATAVG on the VAR statement, using the schools1 data set.
The instructor did give you a hint as to how many variables you should have.
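Putting those suggestions together, one possible sketch looks like the following. It assumes schools1 already contains school, sat1, sat2, sat3 and satavg. The intermediate name schools1_sorted and the NOPRINT option are choices made for this sketch, not part of the assignment.

```sas
/* Sort a copy so the BY statement in PROC MEANS works.        */
/* schools1_sorted is a hypothetical name used for this sketch. */
proc sort data=schools1 out=schools1_sorted;
  by school;
run;

/* One output row per school. AUTONAME names the statistics     */
/* sat1_Mean, sat1_Median, and so on. NOPRINT suppresses the    */
/* printed report so only the data set is produced.             */
proc means data=schools1_sorted noprint;
  by school;
  var sat1 sat2 sat3 satavg;
  output out=schools2 mean= median= / autoname;
run;
```

The output data set holds school, the automatic variables _TYPE_ and _FREQ_, and the 8 SAT statistics, which adds up to the 11 variables the instructor mentioned.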
