☑ This topic is solved.
Posted 08-18-2024 06:12 PM
(1094 views)
/*
I have a dataset that contains summarized data on a group of people who received
healthcare services. The data is summarized by several characteristics of the people,
such as race, gender, marital status, etc. In my example below, I have created a
fake dataset of the summary race data.
Race is a character variable with different values of race (race A, race B, etc.).
Population_total is the number of individuals (n) in the particular race category.
Service_min is the number of healthcare service minutes summed across all individuals in the race category.
Min_per_pop is the average number of service minutes per individual in the race category:
min_per_pop = service_min / population_total
What is the best way to determine whether there is a significant difference in service
minutes across the categories of race, using this summary data?
*/
data summarydata;
length race $6 population_total 8 service_min 8 min_per_pop 8;
infile datalines dsd dlm='|';
input race $ population_total service_min min_per_pop;
datalines;
race_A|42188|94961594|2250.9148
race_B|13820|32049662|2319.0783
race_C|7062|9109865|1289.9837
race_D|350|516013|1474.3229
;
run;
/* I have tried a one-way ANOVA using both proc glm and proc anova per the following code,
but they do not return any p-value or significance test results. The F value and p-value are blank.
*/
proc glm data=summarydata;
class race;
model min_per_pop = race;
run;
quit;
proc anova data=summarydata;
class race;
model min_per_pop = race;
run;
quit;
/* I have also tried proc logistic, using the counts instead, but it creates this error:
ERROR: No valid observations due either to missing values in the response,
explanatory, frequency, or weight variable, or to nonpositive frequency or
weight values.
*/
proc logistic data=summarydata;
class race;
model service_min/population_total =race;
run;
/* What am I doing wrong? Or what is a better way to test for significant differences?
I also know there is a macro %SUM_GLM that can be used for a one-way ANOVA on summary
data, but it requires the standard deviation, which I do not have. I only have the 3 numeric measures above.
*/
1 ACCEPTED SOLUTION
If you don't have a standard deviation of the minutes (or the raw data), then you cannot perform a statistical test.
--
Paige Miller
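To spell out the reasoning, here is the standard one-way ANOVA F statistic written for this data. The symbols are introduced here for illustration and do not appear in the original posts: n_i is population_total for race i, \bar{y}_i is min_per_pop, \bar{y} is the overall minutes per person, s_i is the standard deviation of minutes within race i, k = 4 is the number of race categories, and N is the sum of the n_i.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}, \qquad
MS_{\text{between}} = \frac{1}{k-1}\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2, \qquad
MS_{\text{within}} = \frac{1}{N-k}\sum_{i=1}^{k} (n_i - 1)\,s_i^2
```

The summary data supplies n_i and \bar{y}_i, so the numerator can be computed, but every s_i is missing, so the denominator cannot.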
thank you!
You need the MEANS statement of PROC GLM to do the ANOVA h-test, and the LSMEANS statement to "Test significant difference in mean across values of categorical varia".
I also noticed that there is only one obs per race in your dataset,
so you need to include the 'Population_total' variable in PROC GLM via a FREQ statement.
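proc glm data=have;
class race;
model min_per_pop = race;
/* Levene's test for equal variances plus Tukey comparisons of the race means */
means race / hovtest=levene(type=abs) tukey;
/* Count each summary row as Population_total identical observations */
freq Population_total;
run;
quit;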
And with the LSMEANS statement added:
data have;
length race $6 population_total 8 service_min 8 min_per_pop 8;
infile datalines dsd dlm='|';
input race $ population_total service_min min_per_pop;
datalines;
race_A|42188|94961594|2250.9148
race_B|13820|32049662|2319.0783
race_C|7062|9109865|1289.9837
race_D|350|516013|1474.3229
;
run;
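proc glm data=have;
class race;
model min_per_pop = race;
means race / hovtest=levene(type=abs) tukey;
/* Tukey-adjusted pairwise comparisons of the least-squares means */
lsmeans race / adjust=tukey;
/* Count each summary row as Population_total identical observations */
freq Population_total;
run;
quit;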
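A note on what the summary rows can support, offered as a minimal sketch rather than a definitive analysis: with the FREQ statement, each row is counted as Population_total observations that all share the same min_per_pop, so no person-to-person variation within a race enters the model. The between-group sum of squares can still be computed directly from the group sizes and means, as below. The table name between_ss and the columns grand_mean, ss_between, and df_between are illustrative names, not part of the original posts.

```sas
/* Between-group sum of squares from the summary rows only.             */
/* between_ss, grand_mean, ss_between, df_between are hypothetical names */
proc sql;
create table between_ss as
select sum(population_total * (min_per_pop - grand_mean)**2) as ss_between,
       count(*) - 1 as df_between
from (select population_total, min_per_pop,
             /* overall minutes per person, remerged onto every row */
             sum(service_min) / sum(population_total) as grand_mean
      from have);
quit;

proc print data=between_ss;
run;
```

The within-group sum of squares, which the F test divides by, would additionally need each group's standard deviation or the raw data.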