Excuse my ignorance, I'm new to SAS. Does anyone have a suggestion on how to analyze
the following? The data consist of two data sets, one being a control and the other
representing implementation. A program was implemented to potentially increase attendance
in local schools, so each data set is a list of schools and the attendance rate for each school. I took a
categorical approach to the data and used PROC FREQ with the WEIGHT
statement to generate a chi-square test. I have a feeling my approach is incorrect.
In a pedantic scholarly mode, I would say that the analysis plan, meaning the types of tests to be conducted with the data, should have been decided upon before the data were collected.
The main thing to consider is: what are you looking to compare? If it is a rate, then a t-test comparing the mean rate between groups is a likely candidate.
Chi-square would tell you whether the distribution of responses was similar between groups. That works much better with categories than with attendance rates, which will almost certainly differ for every school.
I would recommend structuring your data into a single data set with a variable to indicate the source, control or test, and starting with tests of normality to see whether a t-test is appropriate or another approach is needed, which is likely if the number of schools in the sample is small.
Basic way to combine the data:
data combined;
   /* IN= creates a temporary flag that is 1 when the row comes from ControlData */
   set
      ControlData (in=incontrol)
      TestData
   ;
   if incontrol then source='Control';
   else source='Test';
run;
The Source variable could then be used as a grouping or class variable in many procedures.
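As a rough sketch of the normality check and group comparison suggested above, something like the following could be a starting point. It assumes the combined data set has a numeric variable called rate holding each school's attendance rate; that name is hypothetical, so substitute whatever your data actually uses.

```sas
/* Assumption: combined has a numeric variable named rate (hypothetical name)
   holding the attendance rate for each school. */

/* Normality tests and histograms of the rate within each group */
proc univariate data=combined normal;
   class source;
   var rate;
   histogram rate / normal;
run;

/* Two-sample t-test comparing the mean rate between Control and Test */
proc ttest data=combined;
   class source;
   var rate;
run;

/* Rank-based (Wilcoxon) alternative if normality looks doubtful */
proc npar1way data=combined wilcoxon;
   class source;
   var rate;
run;
```

PROC TTEST reports both the pooled and Satterthwaite results along with a test for equality of variances, so you can see whether the equal-variance assumption is reasonable before choosing which line to read.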
I hope that the weight variable is the basic number of enrolled children in the school.
If the schools represent different populations, such as elementary, middle / junior high, and high school, it might be helpful to include that as a category, as you may see different results between the grades.
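One possible way to bring school type into the comparison is a two-factor model. This is only a sketch: it assumes a character variable named level (hypothetical, e.g. Elementary / Middle / High) and a numeric variable named rate (also hypothetical) exist in the combined data set, and it swaps in PROC GLM for the simple t-test.

```sas
/* Assumptions: level (school type) and rate (attendance rate) are
   hypothetical variable names; replace with the names in your data. */
proc glm data=combined;
   class source level;
   /* Main effects plus the interaction, to see whether the program
      effect differs by school type */
   model rate = source level source*level;
   /* Adjusted group means, their difference, and confidence limits */
   lsmeans source / pdiff cl;
run;
quit;
```

If the interaction term is clearly not needed, dropping it leaves a simpler model that compares Control and Test while adjusting for school type. With a small number of schools per cell this kind of model may not be practical.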
Thank you ballardw for your response. The analysis plan was decided prior to collection but I am an intern and am detached from that process. I found myself questioning the proposed approach and will just leave it at that. Thank you again for your help!
It is not uncommon that something in the results from the original plan raises questions. Sometimes they are additional interesting results, sometimes they are flaws in the data collection. So don't be afraid to experiment, but start with the plan.
Other ways to slice the data for comparisons could be by some category of school size (total enrollment), by urban/suburban/rural location, by a poverty index if you can get a good one (School Free and Reduced Lunch participation rates may be available), or by ethnic make-up.
Of course slicing the data more ways simultaneously requires more sample so may not be practical.
Thank you again ballardw!