Fluorite | Level 6

Statistical Technique for Data Comparison

Hello everyone,

I hope you are well!

The board of my business entity plans to compare two methods of own-source revenue collection: the traditional method and the new "Revenue Collection Information System", which has been in place for three years now.

The objective is to analyze the contribution and effectiveness of the new "Revenue Collection Information System" by comparing the revenues collected during the three years since its implementation with those collected during the three years before it.

The sources of Revenues are:

1: Receipt on sales (xxx units of currency)
2: Levies (xxxxxx units of currency)
3: Rent (xx units of currency)
4: Charges and fines (xxx units of currency)
5: Registration fees (xxx units of currency)
6: License issuance fee (xxxx units of currency)

etc.

Which statistical technique is best to compare the two ways/systems of revenue collection?

NB: The revenue sources might not be the same in every year, i.e. one year might have more or fewer revenue sources than another.

1 ACCEPTED SOLUTION
SAS Super FREQ

Re: Statistical Technique for Data Comparison

If you want to look it up or read about it, search for "two-sample two-sided t test." The two samples are "Before" vs. "After." You want a two-sided test because you are interested in whether the means are different, not whether one specific mean is larger than the other (which would be a "one-sided" analysis).

5 REPLIES
Diamond | Level 26

Re: Statistical Technique for Data Comparison

To do any statistical testing, you need an estimate of error, often derived from replication of the data collection. In this case the only replication is the three different years before the change, and the three different years after the change.

But these are not true replicates, as you state the revenue sources can change from year to year. In addition, there are many macro-economic factors that may have changed. Only you can decide how serious these issues are. It is possible that these are serious issues which will cause great variability in your data, and swamp any change in signal that you are trying to find.

Nevertheless, I suppose you could average the three years' totals before the change, compare this to the average of the three years' totals after the change, and do a t-test to compare the averages, or a non-parametric test to answer the same question. Again, I'm very skeptical that your data meet the condition of being true replicates, and you should not go ahead with this test without thinking hard about that. Do not go ahead and say this was recommended by some guy PaigeMiller in the SAS communities, because honestly I am not recommending it; if you choose to go ahead, the responsibility for the validity of the test is yours, not mine.
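For readers who want to see what the non-parametric route might look like, here is a minimal sketch using the Wilcoxon rank-sum test in PROC NPAR1WAY. It is an illustration only, not an endorsement of the approach: the values are the illustrative rent figures from the PROC GLM example in this thread, used as a stand-in for real yearly totals, and the 2019 cut-over year is likewise taken from that example.

```sas
/* Stand-in data: illustrative rent figures from the PROC GLM example   */
/* in this thread, NOT real yearly totals. Replace with your own data.  */
data Have;
   length Group $11;
   input Year Rent;
   /* Assumption: the new system started in 2019 */
   if Year < 2019 then Group = "Traditional";
   else                Group = "New Method";
   datalines;
2016  1.13
2017  1.22
2018  1.30
2019  1.35
2020  1.35
2021  1.45
;

/* Wilcoxon rank-sum test comparing the two periods.                    */
/* EXACT requests the exact p-value, which matters with so few values.  */
proc npar1way data=Have wilcoxon;
   class Group;
   var Rent;
   exact wilcoxon;
run;
```

With only three values per group there are just 20 ways to split the six observations, so the two-sided exact p-value can never fall below 2/20 = 0.10. This is one concrete way in which three years per period limits what any test can show.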

--
Paige Miller
Fluorite | Level 6

Re: Statistical Technique for Data Comparison

Thank you PaigeMiller, you've given me a great start.

You just recommended a t-test. Which kind of t-test would be appropriate?

Many thanks
Diamond | Level 26

Re: Statistical Technique for Data Comparison

I think there is only one t-test. @Rick_SAS has given you code.

--
Paige Miller
SAS Super FREQ

Re: Statistical Technique for Data Comparison

From your description, this sounds like an analysis of variance (ANOVA) or a two-sample t-test, in which you ask whether the mean quantities (sales, levies, rent, ...) are significantly different between the "Before" and "After" periods. Here is an example:

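```
data Have;
   length Group $11;   /* set the length explicitly so no label is truncated */
   input Year Rent;
   /* 2016-2018: traditional method; 2019-2021: new system */
   if Year < 2019 then Group = "Traditional";
   else                Group = "New Method";
   label Rent = "Rent (Millions)";
   datalines;
2016  1.13
2017  1.22
2018  1.30
2019  1.35
2020  1.35
2021  1.45
;

/* One-way ANOVA; with two groups the F test is equivalent */
/* to the pooled two-sample t test                         */
proc glm data=Have;
   class Group;
   model Rent = Group;
run;
quit;   /* PROC GLM is interactive, so end it explicitly */
```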
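The two-sample t-test mentioned above can also be run directly. The following is a minimal sketch using PROC TTEST on the same illustrative rent figures; the data step is repeated only so that the block runs on its own, and the 2019 cut-over is the one assumed in the example above.

```sas
/* Same illustrative rent figures as in the PROC GLM example */
data Have;
   length Group $11;
   input Year Rent;
   if Year < 2019 then Group = "Traditional";
   else                Group = "New Method";
   label Rent = "Rent (Millions)";
   datalines;
2016  1.13
2017  1.22
2018  1.30
2019  1.35
2020  1.35
2021  1.45
;

/* Two-sample, two-sided t test of mean Rent by Group.   */
/* SIDES=2 is the default; it is spelled out for clarity. */
proc ttest data=Have sides=2;
   class Group;
   var Rent;
run;
```

PROC TTEST reports both the pooled (equal-variance) and the Satterthwaite (unequal-variance) versions of the test. The pooled result should agree with the F test from PROC GLM, since for two groups F is the square of the pooled t statistic. To analyze another revenue source, the VAR variable would be swapped for that source's column.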
