01-24-2013 05:01 AM
I am totally new to statistics and I am currently playing with SAS and I am a bit overwhelmed by some of the theory.
I have been presented with a data set containing the following columns concerning investments for institutions:
Type of Institution | Region | [etc...] | Funding Year 1 | Funding Year 2 | Funding Year 3
Each row is an anonymous record specifying the type of institution and the funding for each year.
I have been asked to determine the size of each institution according to its amount of funding. For example
My first tests used ANOVA and t-tests to check whether funding differs in each individual year based on factors such as region, institution type and SIZE.
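For anyone following along, here is a minimal Python sketch of what a two-group comparison computes: Welch's two-sample t statistic with the Welch-Satterthwaite degrees of freedom (the unequal-variance version, comparable to the "Satterthwaite" line that PROC TTEST reports). The function name `welch_t` and the example lists are hypothetical, e.g. funding per student in one year for two regions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical funding-per-student values for two regions in the same year.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The unequal-variance form is used here because funding in different groups rarely has the same spread; the pooled t-test is the other common choice.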
So when testing the factor size, I created a calculated column to hold the size value (A, B or C) based solely on the funding for the year I was testing.
I then noticed that the same institution could fall into different sizes in different years, because its funding could vary between years. That is not a problem for this first test: since I am testing each year individually, what counts is the size in that given year.
Now comes the tricky part, where I desperately need help. I need to investigate how funding has changed across years (Y1-Y2, Y1-Y3, Y2-Y3) and how this change differs across the same factors as before (e.g. region, type and size).
Very important and mandatory rule: I can't use two-way ANOVA! This exercise is not based on two or more factors, but on two or more groups.
To get the difference between years, I created a new calculated column for each pair of years I want to test, holding the value of year A minus year B. So far so good: this will be my dependent variable for each test.
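A minimal Python sketch of those difference columns, assuming the convention "later year minus earlier year" (the same direction as the Y3-Y1 column in the follow-up post). The function name and the dict-of-years layout are hypothetical; `None` stands in for a missing SAS value and, like plain arithmetic on a missing value, makes the difference missing too.

```python
from itertools import combinations


def year_differences(funding):
    """Later-year minus earlier-year difference for every pair of years.

    `funding` maps a year label ("Y1", "Y2", ...) to an amount, or to None
    when the value is missing. Labels are sorted as strings, which orders
    Y1..Y9 correctly. Any difference involving None is itself None.
    """
    diffs = {}
    for earlier, later in combinations(sorted(funding), 2):
        a, b = funding[earlier], funding[later]
        diffs[(later, earlier)] = None if a is None or b is None else b - a
    return diffs


# Hypothetical institution with Y3 missing.
diffs = year_differences({"Y1": 100.0, "Y2": 150.0, "Y3": None})
```

Each key such as `("Y3", "Y1")` corresponds to one difference column, i.e. one dependent variable.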
This all works fine except for one bit: the SIZE. Because the size can differ between years, I am not sure what the best way is to come up with a single size for each comparison. At first I thought, "let me sum the funding of both years and divide by two". But when it comes to the Y1-Y3 pairing, shouldn't I compute (Y1 + Y2 + Y3) / 3 instead?
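To see how much the two averaging options differ, here is a small Python sketch using the Type B institution's total funding from the table in the follow-up post. Whichever average is chosen, it would then be run through the same size thresholds as a single year's funding. The helper name `mean_funding` is hypothetical, and it assumes no year is missing.

```python
from statistics import mean

# Total funding of the Type B institution from the follow-up post's table.
funding = {"Y1": 2_092_497, "Y2": 2_864_282, "Y3": 2_286_248}


def mean_funding(funding, years):
    """Average funding over the listed years (assumes none are missing)."""
    return mean(funding[y] for y in years)


endpoints_only = mean_funding(funding, ["Y1", "Y3"])     # (Y1 + Y3) / 2
whole_span = mean_funding(funding, ["Y1", "Y2", "Y3"])   # (Y1 + Y2 + Y3) / 3
print(endpoints_only, whole_span)
```

The endpoints-only average ignores Y2 entirely, while the whole-span average uses every year the Y1-Y3 change covers; which one is appropriate is a judgement call for the exercise, not something the arithmetic settles.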
Another tricky bit is that for some years the funding value is missing, which means any calculation involving that value also results in a missing value. I am not sure whether this is the right way to go.
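In SAS, an expression such as `(y1 + y2) / 2` returns missing if any operand is missing, whereas the `MEAN()` function averages only the non-missing arguments. The Python sketch below mimics those two behaviours with `None` as the missing value; both function names are hypothetical. Note that this only helps for averages: a year-to-year difference still needs both years to be present.

```python
from statistics import mean


def strict_mean(values):
    """Like (a + b) / 2 in a DATA step: missing if any input is missing."""
    if any(v is None for v in values):
        return None
    return mean(values)


def available_mean(values):
    """Like the SAS MEAN() function: average of the non-missing values only."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


print(strict_mean([100, None, 300]), available_mean([100, None, 300]))
```

The trade-off is that `available_mean` silently bases some institutions' size on fewer years than others.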
To sum up, I am a total noob in stats and I really need some help.
Please please advise.
01-24-2013 06:00 AM
Thanks for that. I can upload the dataset as well if you wish. But here is an example of the observations in a neat table:
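Type of Institution | Region | Funding Y1 | Students Y1 | Funding Y2 | Students Y2 | Funding Y3 | Students Y3 | Funding per Student Y1 | Funding per Student Y2 | Funding per Student Y3 | Size Y1 | Size Y2 | Size Y3 | Funding per Student (Y3-Y1)
Type B | North America | $2,092,497 | 11,884 | $2,864,282 | 13,438 | $2,286,248 | 14,095 | $176.08 | $213.15 | $203.21 | 2 | 2 | 2 | $27.14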
For all of my previous tests, I used the funding per student for each year as the dependent variable and the size for that given year as the CLASS variable (the factor).
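With size as the only CLASS variable, the analysis is a one-way ANOVA. As a rough illustration of the overall F test such a model reports, here is a Python sketch that computes the F statistic by hand. The function name and the grouped values are hypothetical (e.g. funding per student keyed by size class).

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic; `groups` maps a class level to its values."""
    all_values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(all_values)
    group_means = {level: mean(vals) for level, vals in groups.items()}
    # Between-group sum of squares: group size times squared offset of group mean.
    ss_between = sum(
        len(vals) * (group_means[level] - grand_mean) ** 2
        for level, vals in groups.items()
    )
    # Within-group sum of squares: spread of each value around its group mean.
    ss_within = sum(
        (v - group_means[level]) ** 2 for level, vals in groups.items() for v in vals
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical values grouped by size class 1, 2 and 3.
f, df1, df2 = one_way_anova_f({1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]})
```

The same function works unchanged if the response is a year-to-year difference instead of a single year's value; only the grouping (which size) has to be decided.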
Just to recap, the size of the college is defined by the range of investment in that given year (e.g. investment equal to or over 3,000,000 means size 1).
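The rule can be written as a small Python function. Only the 3,000,000 cut-off for size 1 comes from the post; the 1,000,000 cut-off between sizes 2 and 3 is a hypothetical placeholder, as is the function name. Applying it year by year shows how one institution can move between sizes.

```python
def classify_size(funding, large=3_000_000, medium=1_000_000):
    """Size class from one year's total funding: 1 = largest, 3 = smallest.

    `large` is the threshold stated in the post; `medium` is a
    HYPOTHETICAL placeholder. Missing funding (None) gives a missing size.
    """
    if funding is None:
        return None
    if funding >= large:
        return 1
    if funding >= medium:
        return 2
    return 3


# Hypothetical institution whose funding crosses the 3,000,000 line in Y2.
sizes = [classify_size(f) for f in (2_900_000, 3_100_000, 2_950_000)]
```

Here `sizes` comes out as `[2, 1, 2]`: the same institution is size 1 only in the middle year.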
However, for this next test I had to create a new variable with the difference between Year 3 and Year 1, which is the last column of the table. I also have to analyse this new column by size, and the question is which year's size to use.
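One way to judge how much the choice matters is to cross-tabulate Size Y1 against Size Y3 (in SAS, a two-way PROC FREQ table would do this). Only the off-diagonal cells, institutions whose size changed, are affected by which year's size is used. A minimal Python sketch; the function name, dict keys and rows are hypothetical, and rows with a missing size are skipped.

```python
from collections import Counter


def size_transitions(rows, first="Size Y1", last="Size Y3"):
    """Count (size in first year, size in last year) pairs, skipping missing sizes."""
    return Counter(
        (row[first], row[last])
        for row in rows
        if row[first] is not None and row[last] is not None
    )


rows = [
    {"Size Y1": 2, "Size Y3": 2},
    {"Size Y1": 1, "Size Y3": 2},
    {"Size Y1": None, "Size Y3": 3},
]
table = size_transitions(rows)
changed = sum(n for (a, b), n in table.items() if a != b)
```

If `changed` is small relative to the total, the results should be fairly insensitive to the choice; if it is large, it may be worth reporting the analysis under more than one size definition.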
Thanks for trying to help me with this one.
All the best,