I have a large dataset with test scores of individual students in different schools (SCHOOL), with each student labeled GIRL = 0 or 1. I need to calculate the mean difference between boys and girls for each school using Hedges' g (a fancier version of Cohen's d), as well as the standard error and the degrees of freedom of the pooled variance.
The calculations for those aren't necessarily the problem. But I cannot get SAS to calculate them for each of the schools.
PROC SORT DATA=exam;
  BY school GIRL;
RUN;

PROC MEANS DATA=exam MEAN STD N;
  ODS OUTPUT SUMMARY=SUM;
  BY school GIRL;
  VAR IQV;
RUN;
I then transposed it such that each row represents 1 school with separate columns for boys and girls. The following formulas should be correct but it does not work:
/* COHEN HEDGES */
PROC IML;
USE newdataset;
READ ALL VAR {Ngirl IQgirl Stdgirl Nboy IQboy Stdboy};
CLOSE newdataset;
N1=Ngirl; N2=Nboy;
MU1=IQgirl; MU2=IQboy;
S1=Stdgirl; S2=Stdboy;
/*COHEN D*/
SIG_NUM=(N1-1)*(S1)*2+(N2-1)*(S2)*2;
DF=N1+N2-2;
SIG=SQRT(SIG_NUM/DF);
D=(MU1-MU2)/SIG;
VARD = (N1+N2)/(N1*N2)+ (D)**2/(2*(N1+N2));
/*HEDGES G*/
J=1-4/(4*DF-1);
G=J*D;
PRINT D[L='Cohen'] VARD[L='VAR D'] G[L='Hedges'];
RUN;

How can I get this last bit to work, and how do I save it into its own dataset? Or should I take a very different approach entirely? (I am very new to SAS, so I don't know.) I use the online SAS Studio.
> But I cannot get SAS to calculate them for each of the schools.
> The following formulas should be correct but it does not work:
Saying something doesn't work isn't very informative.
What does happen? Are you getting errors? If so, provide details. Are the results wrong? If so, provide details.
What debugging have you done?
There's little we can do to help you without sample data, and more explanation. Please provide sample data as working SAS data step code (examples and instructions). Do not provide data in other forms. People generally ignore the request to provide data in this specific form; do NOT ignore the request to provide the data in this specific form.
One thing is you really should share how you transposed the data. There are just enough options that we can't be sure what you actually did. Since the output data set from your Proc Means contains the variables School, Girl, Iqv_mean, Iqv_StdDev and Iqv_N, how you get to Ngirl, Nboy, IQgirl, IQboy, Stdgirl and Stdboy may have some impact on the process.
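For reference, here is a minimal sketch of one way the one-row-per-school layout could be built. It is not necessarily what the original poster did. It assumes the ODS output data set SUM holds school, GIRL, IQV_Mean, IQV_StdDev and IQV_N as described above, and it reuses the target names from the PROC IML step.

```sas
/* Sketch only: one row per school with separate girl/boy columns.
   Assumes SUM is sorted by school (it is, coming from BY school GIRL). */
data newdataset;
   merge sum(where=(GIRL=1)
             rename=(IQV_N=Ngirl IQV_Mean=IQgirl IQV_StdDev=Stdgirl))
         sum(where=(GIRL=0)
             rename=(IQV_N=Nboy IQV_Mean=IQboy IQV_StdDev=Stdboy));
   by school;
   drop GIRL;   /* no longer meaningful after the merge */
run;
```

A school with no boys or no girls still gets a row, with missing values in the columns for the absent group.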
Did you verify before going down this road that ALL schools had at least 2 boys and at least 2 girls? If one of the genders is absent from a school, then you have an issue with the calculations because of missing values. (N1*N2 is going to be missing if one of the N is missing, and any division involving it will result in missing.)
If you have only one boy or one girl, the StdDev will be missing for that school/gender combination (Proc Means divides by N-1 by default), and that may cause issues.
If you have exactly one boy and one girl per school you are going to get a 0 for degrees of freedom which will cause a divide by 0 exception.
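The checks above could be run with something like the following sketch. The data set name check and the flag too_small are made up for illustration; newdataset and the N variables are the names from the question.

```sas
/* Sketch only: flag schools where the pooled SD cannot be computed. */
data check;
   set newdataset;
   /* 1 if either group is absent or has fewer than 2 students */
   too_small = (missing(Ngirl) or missing(Nboy) or Ngirl < 2 or Nboy < 2);
run;

/* List the problem schools, if any */
proc print data=check;
   where too_small = 1;
run;
```

If this prints nothing, small or empty groups are not the cause of the problem.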
"Doesn't work" is awfully vague.
Are there errors in the log? Post the code and log in a code box opened with the "</>" icon to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.
Why not post it in the IML forum?
https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/bd-p/sas_iml
And calling @Rick_SAS
> how do I save it into its own dataset
You can use the CREATE and APPEND statements in PROC IML to write data to a SAS data set. See "Writing data from a matrix to a SAS data set" on The DO Loop blog.
If you specify what is wrong with the code (and also supply example data), we can assist on your other issues.
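In the meantime, here is a hedged sketch of how the pieces might fit together, assuming newdataset has one row per school plus a school variable. One likely problem, although no log was posted, is that in IML `*` is matrix multiplication and `**` is matrix power. With one element per school, the elementwise operators `#` and `##` are probably what was intended, and the standard deviations need to be squared rather than doubled. Also note that the usual small-sample correction is 1 - 3/(4*df - 1), whereas the posted code has 4 in the numerator, so check that against your reference. The output data set name hedges and the variable SE_D are made up for illustration.

```sas
proc iml;
use newdataset;
read all var {school Ngirl IQgirl Stdgirl Nboy IQboy Stdboy};
close newdataset;

N1 = Ngirl;   N2 = Nboy;
MU1 = IQgirl; MU2 = IQboy;
S1 = Stdgirl; S2 = Stdboy;

/* Cohen's d: # and ## work element by element, one value per school */
DF   = N1 + N2 - 2;
SIG  = sqrt( ((N1-1)#S1##2 + (N2-1)#S2##2) / DF );   /* pooled SD */
D    = (MU1 - MU2) / SIG;
VARD = (N1+N2)/(N1#N2) + D##2/(2*(N1+N2));
SE_D = sqrt(VARD);

/* Hedges' g: small-sample correction (assumed to be the 3/(4*df-1) form) */
J = 1 - 3/(4*DF - 1);
G = J # D;

/* Write one row per school to a data set */
create hedges var {"school" "DF" "D" "VARD" "SE_D" "G"};
append;
close hedges;
quit;
```

After this runs, PROC PRINT DATA=hedges should show one row per school. Schools with a missing N or StdDev will have missing results.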