Hello our esteemed advisors,
I want to compare two data sets that have the same variables but different IDs. I want a test of significance for whether the variables have equal variances in both data sets. The tests I would like to use include the t-test or the Mann-Whitney test. I have both continuous and categorical variables.
I have tried PROC COMPARE, but since the IDs are different, the procedure doesn't seem to work.
PROC COMPARE BASE=data1 COMPARE=data2 ALLSTATS MAXPRINT = (3,6);
id IDnum;
VAR x y z;
RUN;
I will be glad to get some advice.
First, don't code all in upper case, and use a code window (it's the {i} icon above the post area).
Second, post test data in the form of a datastep so that we can see what you are working with:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Third, if the data has no columns which match the other table, what is the logic to match them?
Fourth, describe the problem accurately. Your statement that you want to use a t-test or Mann-Whitney test on both continuous and categorical variables makes no sense in terms of PROC COMPARE. PROC COMPARE merely compares two datasets; t-tests and the like are statistical tests on the data, which is something totally different.
The data sets are exactly the same; I just split the main data into two sets, one for model development and the second for model validation.
I want to check whether the distribution of the variables differs after splitting the data. The original data had 4800 observations; after splitting, data1 has 3200 observations and data2 has 1600.
For example, checking whether the means of a variable like body weight are the same in the two data sets.
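data new;
  /* sex is a character variable, so it needs a $ on the INPUT statement. */
  /* No INFILE statement is needed because the data follow DATALINES.     */
  input ID sex $ age weight height;
  datalines;
1 male 36 78 167
2 female 20 67 156
3 female 36 79 169
14 male 36 78 167
;
run;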
The data is in that format.
Thanks
Rename the ID variables so that both data sets have the same ID variable name.
PROC COMPARE BASE=data1(rename=(data1_id=IDnum)) COMPARE=data2(rename=(data2_id=IDnum)) ALLSTATS MAXPRINT = (3,6);
id IDnum;
VAR x y z;
RUN;
The variable names already match, and the data sets have exactly the same variables.
I have one concern: I want to compare the variables not in terms of data structure but in terms of descriptive statistics, e.g. is the mean of weight in data1 equal to the mean of weight in data2?
Within a single data set I can use PROC TTEST to get the results, but in this case I want to compare the two data sets.
I will be glad to be advised if there is any procedure available.
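One possible approach, given as a minimal sketch rather than a definitive answer: stack the two data sets into one, add a variable that records which set each row came from, and use that variable as the CLASS variable in the usual two-sample procedures. The data set names data1 and data2 and the variables sex, age, weight and height follow the thread; the data set name both and the variable name source are hypothetical names introduced only for this illustration.

```sas
/* Stack the two data sets and tag each row with its origin.        */
/* "both" and "source" are hypothetical names used for this sketch. */
data both;
  length source $5;
  set data1(in=in1) data2;
  if in1 then source = 'data1';
  else source = 'data2';
run;

/* Continuous variables: two-sample t test. The output also         */
/* includes the folded F test for equality of variances.            */
proc ttest data=both;
  class source;
  var age weight height;
run;

/* Nonparametric alternative: Wilcoxon rank-sum (Mann-Whitney) test */
proc npar1way data=both wilcoxon;
  class source;
  var age weight height;
run;

/* Categorical variables: chi-square test of association            */
proc freq data=both;
  tables source*sex / chisq;
run;
```

Because the comparison is driven by the source variable, the ID values play no role here, so it does not matter that the IDs differ between the two sets.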
Hi @MUKASADAVID,
Recently I came across an article which might be applicable to what you're planning to do: https://blogs.worldbank.org/impactevaluations/should-we-require-balance-t-tests-baseline-observables.... The author discusses arguments for and against such tests and suggests an omnibus test of joint orthogonality as opposed to univariate comparisons. So, this might come down to PROC LOGISTIC or PROC PROBIT rather than (multiple runs of) PROC TTEST -- if you're still convinced that you need a significance test.
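As a rough sketch of what such an omnibus test could look like with PROC LOGISTIC: model membership in one of the two sets as a function of all the variables at once, and read the joint test from the "Testing Global Null Hypothesis: BETA=0" table in the output. The names both and in_data1 are hypothetical and introduced only for this example; the variable list is taken from the sample data posted earlier in the thread.

```sas
/* Stack the two sets and create a 0/1 membership indicator.         */
/* "both" and "in_data1" are hypothetical names used for this sketch. */
data both;
  set data1(in=in1) data2;
  in_data1 = in1;   /* 1 = row came from data1, 0 = row came from data2 */
run;

/* Regress membership on all variables jointly. The global null      */
/* hypothesis tests (likelihood ratio, score, Wald) assess whether    */
/* the variables together predict which set a row belongs to.         */
proc logistic data=both;
  class sex / param=ref;
  model in_data1(event='1') = sex age weight height;
run;
```

A non-significant global test would be consistent with the two sets being balanced on these variables, although it does not prove that they are.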