annisann
Calcite | Level 5
/* 
I have a dataset that contains summarized data on a group of people who received 
healthcare services. The data is summarized by several characteristics of the people,
such as race, gender, marital status, etc. In my example below, I have created a
fake dataset of the summary race data.
 
Race is a character variable with different values of race (race A, race B, etc.).
Population_total is the number of individuals (n) in the particular race category.
Service_min is the number of healthcare service minutes summed across all individuals in the race category.
Min_per_pop is the average number of service minutes per individual in the race category:
min_per_pop = service_min / population_total
 
What is the best way to determine whether there is a significant difference in service
minutes across the categories of race, using this summary data? 
*/
data summarydata;
length race $6 population_total 8 service_min 8 min_per_pop 8;
infile datalines dsd dlm='|' ;
input race $ population_total service_min min_per_pop;
datalines;
race_A|42188|94961594|2250.9148
race_B|13820|32049662|2319.0783
race_C|7062|9109865|1289.9837
race_D|350|516013|1474.3229
;
run;
/* I have tried a one-way ANOVA using both proc glm and proc anova per the following code, 
but they do not return any p-value or significance test results. The F value and p-value are blank.
*/
proc glm data=summarydata;
class race;
model min_per_pop = race;
run;
quit;

proc anova data=summarydata;
class race;
model min_per_pop = race;
run;
quit;
/* I have also tried proc logistic, using the counts instead, but it creates this error:
ERROR: No valid observations due either to missing values in the response,
       explanatory, frequency, or weight variable, or to nonpositive frequency or
       weight values.
*/
proc logistic data=summarydata;
class race;
model service_min/population_total =race;
run;
/* What am I doing wrong? Or what is a better way to test for significant differences?
 
I also know there is a macro %SUM_GLM that can be used for a one-way ANOVA on summary
data, but it requires the standard deviation, which I do not have. I only have the 3 numeric measures above.
*/
1 ACCEPTED SOLUTION
PaigeMiller
Diamond | Level 26

If you don't have a standard deviation of the minutes (or the raw data), then you cannot perform a statistical test.

--
Paige Miller
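
For context on why the standard deviation matters: the one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The between-group part can be computed from the group sizes and means alone, but the within-group part needs each group's variance. That is also why PROC GLM and PROC ANOVA print a blank F value on the four-row dataset: with four observations and four race levels there are zero error degrees of freedom. The sketch below shows the calculation that becomes possible once per-group standard deviations are available. The `sd` column and its values are hypothetical placeholders (they are not in the original data), as are the names `summary_sd`, `grp_mean` and `anova_from_summary`; the resulting F and p-value mean nothing until real standard deviations are substituted.

```sas
/* Summary data from the question plus a HYPOTHETICAL sd column.        */
/* The sd values are invented for illustration only; replace them with  */
/* the real per-group standard deviations before interpreting anything. */
data summary_sd;
   infile datalines dlm='|';
   input race :$6. n service_min sd;
   grp_mean = service_min / n;          /* same as min_per_pop */
   datalines;
race_A|42188|94961594|3000
race_B|13820|32049662|3000
race_C|7062|9109865|3000
race_D|350|516013|3000
;
run;

data anova_from_summary;
   /* Pass 1: totals needed for the grand mean */
   do until (eof1);
      set summary_sd end=eof1;
      total_n   + n;
      total_min + service_min;
      k         + 1;
   end;
   grand_mean = total_min / total_n;

   /* Pass 2: sums of squares */
   do until (eof2);
      set summary_sd end=eof2;
      ss_between + n * (grp_mean - grand_mean)**2;  /* needs only n and mean */
      ss_within  + (n - 1) * sd**2;                 /* cannot be formed without sd */
   end;

   df_between = k - 1;
   df_within  = total_n - k;
   f_value    = (ss_between / df_between) / (ss_within / df_within);
   p_value    = 1 - probf(f_value, df_between, df_within);

   keep ss_between ss_within df_between df_within f_value p_value;
   output;
   stop;
run;

proc print data=anova_from_summary noobs;
run;
```

Without the `sd` column, `ss_within` cannot be computed, which is the point of the accepted solution.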


3 REPLIES
Ksharp
Super User

You need the MEANS statement of PROC GLM for the ANOVA homogeneity test (HOVTEST=) and the LSMEANS statement to "Test significant difference in mean across values of categorical varia".

I also noticed that there is only one observation per race in your dataset,
so you need to include the Population_total variable in PROC GLM via the FREQ statement.


proc glm data=have ;
class race;
model min_per_pop = race;
means race / hovtest=levene(type=abs) tukey;
freq Population_total ;
quit;

And with the LSMEANS statement added:

data have;
length race $6 population_total 8 service_min 8 min_per_pop 8;
infile datalines dsd dlm='|' ;
input race $ population_total service_min min_per_pop;
datalines;
race_A|42188|94961594|2250.9148
race_B|13820|32049662|2319.0783
race_C|7062|9109865|1289.9837
race_D|350|516013|1474.3229
;
run;
proc glm data=have ;
class race;
model min_per_pop  = race;
means race / hovtest=levene(type=abs) tukey;
lsmeans race/adjust=tukey;
freq Population_total ;
quit;
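
A caveat on the FREQ approach, offered as a check rather than a definitive verdict: FREQ treats each row as if it appeared Population_total times, so every individual in a race is assigned exactly that race's mean. The within-group variation is then zero by construction, and the resulting F test and Tukey comparisons reflect that artifact rather than real person-to-person variability. The sketch below makes the implied data explicit by expanding `have` into one row per person. The dataset name `expanded` and the loop counter `i` are illustrative names, not from the original posts.

```sas
/* Expand the summary rows into the person-level data that FREQ implies: */
/* each person receives the group mean, since no other value is known.   */
data expanded;
   set have;
   do i = 1 to population_total;
      output;
   end;
   drop i;
run;

/* The standard deviation within every race is 0, so the error sum of    */
/* squares in the ANOVA on these data is 0 as well.                      */
proc means data=expanded n mean std;
   class race;
   var min_per_pop;
run;
```

If the standard deviations reported here are all zero, the significance tests from the FREQ-weighted PROC GLM run say nothing about differences between races beyond the raw means themselves.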

 
