Hello,
I am a new SAS user and am trying to complete a school project. I have spent many hours trying but it's time to reach out for help.
I would hugely appreciate some help with the code to enter unbalanced data using INPUT and DATALINES. I also need help with the code to produce the output for a two-by-three factorial ANOVA using PROC GLM, to assess differences in means between subgroups, including the interaction effect.
I have FOR EXAMPLE
2 genders (male, female) and 3 groups (socioeconomic status: low, medium, high)
It is a small data set - males have 10 observations across the three groups
Females have a total of 14 observations across the three groups.
The reason it requires unbalanced data entry:
For the males, groups 2 and 3 each have one fewer observation than group 1 (so 4, 3 and 3).
For the females, group 2 has one fewer observation than groups 1 and 3 (so 5, 4 and 5).
Below I have attached some of my many failed attempts so you can all see that I have at least tried.
Thanking you so much in advance for any help.
DATA Chocolate; *Tells SAS that we would like to call the new data set chocolate and save it in the working directory;
INPUT Flavour $ Sweetness @@; *Tells SAS that we plan to input data with the a flavour and sweetness variable;
DATALINES;
O 8 O 7 O 5 O 6 O 6 O 4 O 7 O 7 O 6 O 6
M 8 M 7 M 3 M 8 M 6 M 4 M 6 M 5 M 5 M 6 M 7 M 3 M 7 M 3
; *actually ives SAS the raw data to use;
PROC PRINT DATA = Chocolate ;
RUN; *to view the data set that was created;

***OR*** SO FAR THE BEST CHOICE ;
LIBNAME Q2 'C:\Users\Dell\Documents\ASS270003';
DATA Q2.Chocolate;
INPUT Flavour $ SweetnessLevel1-SweetnessLevel3 @@;
DATALINES ;
Orange 8 7 5
Orange 6 6 4
Orange 7 7 6
Orange 6 . .
Mint 8 7 3
Mint 8 6 4
Mint 6 5 5
Mint 6 7 3
Mint 7 . 3
;
PROC PRINT data = Q2.Chocolate;
RUN;

***OR...***;
LIBNAME Q2 'C:\Users\Dell\Documents\ASS270003';
DATA Q2.Chocolate2;
INPUT O $ M $ Response @;
DATALINES ;
1 8 2 7 O 5 O 6 O 6 O 4 O 7 O 7 O 6 O 6
M 8 M 7 M 3 M 8 M 6 M 4 M 6 M 5 M 5 M 6 M 7 M 3 M 7 M 3
;

LIBNAME Q2 'C:\Users\Dell\Documents\ASS270003';
DATA Q2.Chocolate3;
Do Flavour = 1 to 2;
do SweetLevel = 1 to 3;
Input SweetRating @@;
Output;
End;
end;
datalines;
8 7 5 6 6 4 7 7 6 6 8 7 3 8 6 4 6 5 5 6 7 3 7 3
;
PROC PRINT data = Q2.Chocolate3;
RUN;

LIBNAME Q2 'C:\Users\Dell\Documents\ASS270003';
DATA Q2.Chocolate4;
INPUT Flavour $ SweetLevel SweetRate @@;
CARDS;
O 1 8 O 1 6 O 1 7 O 1 6
O 2 7 O 2 6 O 2 7
O 3 5 O 3 4 O 3 6
M 1 8 M 1 8 M 1 6 M 1 6 M 1 7
M 2 7 M 2 6 M 2 5 M 2 7
M 3 3 M 3 4 M 3 5 M 3 3 M 3 3
;
PROC FORMAT; *to improve viewability the variable flavour was formatted to have proper names;
Value $Flavour 'O' = 'Orange' 'M' = 'Mint' ;
PROC PRINT DATA = Q2.Chocolate4;
Format Flavour $Flavour.;
run;
PROC GLM DATA= Q2.Chocolate4;
Class Flavour SweetLevel
Model SweetRate = Flavour|SweetLevel / SS3 SOLUTION;
MEANS Flavour|SweetLevel / BON;
LSMEANS Flavour*SweetLevel / SLICE=Flavour SLICE=SweetLevel ADJUST=BON;;
ESTIMATE 'mu' intercept 1;
run;
You talk about males and females, and also socioeconomic status, but I don't see male/female in your code, nor do I see socioeconomic status in your code. So I'm lost ... your problem description and your code do not overlap, one does not enlighten the other.
Yes I thought they would be easier for an example.
It's an assignment so I didn't want it linked to google too obviously 😉
The code is there so anyone willing to help is able to see I've made the effort to try.
So, no gender, no socioeconomic status. What is the objective of the code in the example?
@LearningHard wrote:
Please read the post properly before trying to belittle me.
Thank you for your time.
Impressive. You want help and refuse to provide data and description matching each other.
It is a school assignment - I can't just post their questions online, that would be a breach of the school efficacy I agreed to and could result in dismissal of my enrollment.
The example is perfectly applicable to the problem I am trying to solve. Within my initial post I said very clearly that the research I used was an EXAMPLE.
I have a two (groups) by three (sub groups) factorial ANOVA with interaction to be tested. The groups are unbalanced. I need help with the code to enter the variables via INPUT and DATALINES, and then to run PROC GLM as the equivalent of a between-groups ANOVA.
This should include testing of assumptions, which for Levene's I don't know how to do due to the unbalanced groups issue.
What I find 'impressive' is that everyone is willing to pick at me without attempting to help.
If you are indeed good enough at SAS to help, then you should not require completely non-functioning code to answer 'how to input unbalanced data with 2x3 groups' and 'how to produce GLM code including homogeneity testing for unbalanced groups'.
As I also stated already - I simply put the code there to demonstrate I had tried and not just slacked off before asking for help. Shame on you all, really, really upsetting experience for a struggling Masters student to have to tolerate this. If you don't have something nice to say don't comment at all.
Hi @LearningHard,
I can appreciate that you don't want to share the exact wording of the problem. Without specifics, and especially given the nuances of statistical methods and how to apply them to data for a given problem, it's difficult for community members to provide code solutions.
Maybe it's best to try to learn from/adapt an example from the SAS documentation. See this example about unbalanced data (2x2) for ANOVA. Might not apply directly, but the documentation is a good start. Also, there are many conference papers that might contain some guidance. Search on support.sas.com and refine your search to the Papers and Proceedings:
Hi Chris,
Thanks so much for the constructive response - believe it or not that is the exact code the teacher used as an example in class. The problem is I couldn't get my head around using it for 2x3.
So in this example there is A1 and A2 plus B1 and B2, where I would need A1,A2 and A3 etc.
For my situation what I also found confusing with that formula was that the output looks like the cells for B1 and A1 are somehow related, but for my sample I have two independent groups with the three sub categories.
I have (for now) imported the data using infile to the csv file (not what the teacher wants) to at least try the GLM, and that seems to be working, but with all my make-shift data entry techniques GLM will not work. So it seems it's also rather particular about how you enter the data.
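For what it's worth, here is a minimal sketch of the long layout (one observation per factor-level combination) that PROC GLM expects, using the cell counts from the first post (4/3/3 and 5/4/5). The data set name, variable names and values are placeholders borrowed from the Chocolate4 attempt, not the real assignment data. Unbalanced cells need no special handling at entry: each cell simply contributes a different number of observations.

```sas
/* Sketch only: names and values are illustrative placeholders. */
data work.chocolate;
  /* Each triple is: factor A (2 levels), factor B (3 levels), response. */
  /* The trailing @@ lets several observations share one data line.      */
  input Flavour $ SweetLevel SweetRate @@;
  datalines;
O 1 8 O 1 6 O 1 7 O 1 6
O 2 7 O 2 6 O 2 7
O 3 5 O 3 4 O 3 6
M 1 8 M 1 8 M 1 6 M 1 6 M 1 7
M 2 7 M 2 6 M 2 5 M 2 7
M 3 3 M 3 4 M 3 5 M 3 3 M 3 3
;
run;

proc glm data=work.chocolate;
  /* Both factors must be declared as classification variables. */
  class Flavour SweetLevel;
  /* Main effects plus interaction; SS3 requests Type III sums of squares. */
  model SweetRate = Flavour SweetLevel Flavour*SweetLevel / ss3;
  /* Cell means with Bonferroni-adjusted pairwise comparisons. */
  lsmeans Flavour*SweetLevel / pdiff adjust=bon;
run;
quit;
```

Type III sums of squares are a common choice when cell sizes differ, but it is worth checking which type your course expects.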
You wouldn't happen to have an idea of how we're supposed to test for homogeneity of variance, by any chance? ANOVA won't report Levene's test for unbalanced data 😞 I've spent the whole day on Google, but sadly most of the reports available online are far beyond my capabilities.
I can't help much with the stat questions. For the input using DATALINES, you could try the trick we tell folks to use for sharing data on the community. Once you have your data set in SAS (as you created via CSV), use these steps to create a DATA step program that includes code and DATALINES that anyone can run.
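If the linked steps are not to hand, a hand-rolled alternative (not the method from those steps) is to let a DATA _NULL_ step write the program to the log. This is a rough sketch: work.imported stands in for the data set read from the CSV, and the variable names and values are assumptions for illustration only.

```sas
/* Stand-in for the data set imported from CSV (placeholder values). */
data work.imported;
  input Flavour $ SweetLevel SweetRate @@;
  datalines;
O 1 8 O 2 7 M 3 3
;
run;

/* Write a runnable DATA step, including DATALINES, to the SAS log. */
data _null_;
  set work.imported end=last;
  /* Header lines are written once, before the first observation. */
  if _n_ = 1 then
    put 'data work.chocolate;' /
        '  input Flavour $ SweetLevel SweetRate;' /
        '  datalines;';
  /* One data line per observation, values separated by blanks. */
  put Flavour SweetLevel SweetRate;
  /* Close the DATALINES block after the last observation. */
  if last then put ';' / 'run;';
run;
```

The generated text can be copied from the log into the editor. Adding a FILE statement to the DATA _NULL_ step would send it to a text file instead.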
One more resource for statistical tests. If you know the stat or test you want, reference this SAS note (frequently asked-for statistics) to see which SAS methods provide it. From that, I see Levene's is covered in this note. But I am not qualified to advise on whether this can work with your data or is appropriate for your problem.
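On the Levene's point: in PROC GLM the HOVTEST= option of the MEANS statement is only honoured for a one-way model, which may be why nothing is reported for a two-factor model. One workaround sometimes used is to treat the six cells as the levels of a single factor and run a one-way model on that. The sketch below assumes that approach is acceptable for the assignment; the data set, variable names and values are placeholders.

```sas
/* Sketch only: placeholder data in long format, 2 x 3 unbalanced cells. */
data work.chocolate_cells;
  length Cell $ 8;
  input Flavour $ SweetLevel SweetRate @@;
  /* Combine the two factors into one six-level cell identifier, e.g. O_1. */
  Cell = catx('_', Flavour, SweetLevel);
  datalines;
O 1 8 O 1 6 O 1 7 O 1 6
O 2 7 O 2 6 O 2 7
O 3 5 O 3 4 O 3 6
M 1 8 M 1 8 M 1 6 M 1 6 M 1 7
M 2 7 M 2 6 M 2 5 M 2 7
M 3 3 M 3 4 M 3 5 M 3 3 M 3 3
;
run;

proc glm data=work.chocolate_cells;
  class Cell;
  /* One-way model on the combined cell factor. */
  model SweetRate = Cell;
  /* Levene's test on absolute deviations from the cell means. */
  means Cell / hovtest=levene(type=abs);
run;
quit;
```

With only three to five observations per cell the test will have little power, so whether this is appropriate is a question for the course materials or the instructor.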