COMPARING SIMILAR VARIABLES IN TWO DIFFERENT DATA SETS WITH DIFFERENT ...

Posted 11-12-2018 03:44 AM

Hello our esteemed advisors,

I want to compare two data sets that have the same variables but different IDs. I want a significance test of whether the variables have equal variances in both data sets. The tests I want to use include the t-test or the Mann-Whitney test. I have both continuous and categorical variables.

I have tried PROC COMPARE, but since the IDs are different, the procedure doesn't seem to work.

PROC COMPARE BASE=data1 COMPARE=data2 ALLSTATS MAXPRINT = (3,6);

id IDnum;

VAR x y z;

RUN;

I will be glad to get some advice.

5 REPLIES

First, don't code all in upper case, and use a code window; it's the {i} above the post area.

Second, post test data in the form of a datastep so that we can see what you are working with:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Third, if the data has no columns which match the other table, what is the logic to match them?

Fourth, describe the problem accurately. This sentence, for instance: "The tests I want to use include the t-test or the Mann-Whitney test. I have both continuous and categorical variables." makes no sense in terms of PROC COMPARE. PROC COMPARE merely compares two datasets; t-tests and the like are statistical models on the data, something totally different.

The data sets are exactly the same; I just split the main data into two sets, one for model development and the second for model validation.

I want to check whether the distribution of the variables differs after splitting the data. The original data had 4800 observations; after splitting, data1 has 3200 observations and data2 has 1600.

For example, checking whether the mean of a variable like body weight is the same in the two data sets.

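data new;

/* Inline data follow DATALINES, so no INFILE statement is needed. */

input ID sex $ age weight height; /* $ reads sex as a character variable */

datalines;

1 male 36 78 167

2 female 20 67 156

3 female 36 79 169

14 male 36 78 167

;

run;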

The data is in that format.

Thanks

Rename the ID variables so that both data sets use the same ID variable name.

PROC COMPARE BASE=data1(rename=(data1_id=IDnum)) COMPARE=data2(rename=(data2_id=IDnum)) ALLSTATS MAXPRINT = (3,6);

id IDnum;

VAR x y z;

RUN;

The variables are already similar and the data sets have exactly the same variables.

I have one concern: I want to compare the variables not in terms of data structure but in terms of descriptive statistics, e.g. is the mean of weight in data1 equal to the mean of weight in data2?

Within a single data set I can use PROC TTEST to get the results, but in this case I want to compare the two data sets.

I will be glad to be advised if there is any procedure available.
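One common way to handle this is to stack the two data sets into one, add a variable recording which split each row came from, and use that variable as the CLASS variable. The sketch below assumes data1 and data2 already exist in the WORK library with the variables sex, age, weight and height mentioned in this thread; the names both and source are made up for illustration.

```sas
/* Stack the two splits and record which one each row came from.
   ASSUMPTION: data1 and data2 exist and share the same variables. */
data both;
   length source $ 5;
   set data1(in=in1) data2;
   if in1 then source = 'data1';
   else source = 'data2';
run;

/* Continuous variables: two-sample t test for each variable in VAR.
   The output also includes a folded F test for equality of variances. */
proc ttest data=both;
   class source;
   var age weight height;
run;

/* Rank-based alternative: Wilcoxon rank-sum (Mann-Whitney) test. */
proc npar1way data=both wilcoxon;
   class source;
   var age weight height;
run;

/* Categorical variables: chi-square test on the split-by-category table. */
proc freq data=both;
   tables source*sex / chisq;
run;
```

Because every variable gets its own test, some p-values may fall below 0.05 by chance alone when many variables are checked, so the results are best read with that in mind.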

Hi @MUKASADAVID,

Recently I came across an article which might be applicable to what you're planning to do: https://blogs.worldbank.org/impactevaluations/should-we-require-balance-t-tests-baseline-observables.... The author discusses arguments for and against such tests and suggests an omnibus test of joint orthogonality as opposed to univariate comparisons. So, this might come down to PROC LOGISTIC or PROC PROBIT rather than (multiple runs of) PROC TTEST -- if you're still convinced that you need a significance test.
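A rough sketch of what such an omnibus test could look like with PROC LOGISTIC: model membership in one split as a function of all the covariates at once and look at the global test. It assumes data1 and data2 exist with the variables sex, age, weight and height from this thread; the names stacked and in_data1 are hypothetical, and this is only one possible reading of the article's suggestion.

```sas
/* Stack the splits and flag membership in data1.
   ASSUMPTION: data1 and data2 exist and share the same variables. */
data stacked;
   set data1(in=in1) data2;
   in_data1 = in1;   /* 1 = row came from data1, 0 = row came from data2 */
run;

/* Regress the split indicator on all covariates together. */
proc logistic data=stacked;
   class sex / param=ref;
   model in_data1(event='1') = sex age weight height;
run;
```

The "Testing Global Null Hypothesis: BETA=0" table in the output reports likelihood ratio, score and Wald tests of whether the covariates jointly predict which split a row belongs to. A large p-value there is consistent with a balanced split.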