Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- SAS Procedures
- /
- How can I compare variables from different tables?

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 05-27-2019 09:15 AM
(1214 views)

Hi,

I would like to compare the sociodemographic characteristics (mean age, bmi, education distribution, etc.) between individuals included and excluded (due to aberrant/missing data) from my analysis.

I have a table "included" with around 43k individuals and a table "excluded" with around 5k individuals.

How should I proceed?

Thank you in advance,

Natasha Figueiredo

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

PROC COMPARE is not what is needed here. You aren't comparing line by line records, you're comparing statistical distributions and such. Usually the standard is to do t-tests and chi-square tests depending on the variable types.

It's usually easier if you include them in the same data set.

I would stack the data sets as follows:

```
data combined;
set included (in=A)
excluded (in=B);
length status $12.;
status='Included';
if B then status='Excluded';
run;
*tests if age is the same across the two groups;
proc ttest data=combined;
class included;
var age;
run;
```

Hope that helps get you started.

@natasha_nf wrote:

Hi,

I would like to compare the sociodemographic characteristics (mean age, bmi, education distribution, etc.) between individuals included and excluded (due to aberrant/missing data) from my analysis.

I have a table "included" with around 43k individuals and a table "excluded" with around 5k individuals.

How should I proceed?

Thank you in advance,

Natasha Figueiredo

7 REPLIES 7

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@natasha_nf see if this information assist you in how to prepare data for usage with proc compare.

https://www.sascrunch.com/proc-compare.html

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

PROC COMPARE is not what is needed here. You aren't comparing line by line records, you're comparing statistical distributions and such. Usually the standard is to do t-tests and chi-square tests depending on the variable types.

It's usually easier if you include them in the same data set.

I would stack the data sets as follows:

```
data combined;
set included (in=A)
excluded (in=B);
length status $12.;
status='Included';
if B then status='Excluded';
run;
*tests if age is the same across the two groups;
proc ttest data=combined;
class included;
var age;
run;
```

Hope that helps get you started.

@natasha_nf wrote:

Hi,

I would like to compare the sociodemographic characteristics (mean age, bmi, education distribution, etc.) between individuals included and excluded (due to aberrant/missing data) from my analysis.

I have a table "included" with around 43k individuals and a table "excluded" with around 5k individuals.

How should I proceed?

Thank you in advance,

Natasha Figueiredo

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Reeza!

Actually I've done something slightly different. I merged the original (with all included and excluded individuals) and the "excluded" tables and created a binary "included" variable.

data incexc;

merge original (in=a) excluded (in=b);

if a and not b then included='1';

else included='0';

run;

Then I performed all the t and x2 tests I needed.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

That's the exact same thing, you just used a different word that included/excluded.

I preferred the words since that goes as labels for output 😉

I preferred the words since that goes as labels for output 😉

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@natasha_nf wrote:

Thank you Reeza!

Actually I've done something slightly different. I merged the original (with all included and excluded individuals) and the "excluded" tables and created a binary "included" variable.

data incexc;

merge original (in=a) excluded (in=b);

if a and not b then included='1';

else included='0';

run;

Then I performed all the t and x2 tests I needed.

How could that MERGE work? You did not include a BY variable. So if ORIGINAL has 100 observations and EXCLUDED as 50 observations then the first 50 observations will be FROM both input datasets (A and B) and the last 50 will only be from the ORIGINAL dataset. Is your ORIGINAL dataset really sorted so that all of the EXCLUDED observations are first? Are they in the same order as the observations in EXCLUDED? If not the the values of the common variables will be overwritten with the values from EXCLUDED dataset.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi Tom,

I must admit I don't really see what you mean, I'm sorry.

Anyway, I included the by ID variable and indeed, it changed the order the dataset is sorted. However, I did not affect my tests.

Thank you

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@natasha_nf wrote:

Hi Tom,

I must admit I don't really see what you mean, I'm sorry.

Anyway, I included the by ID variable and indeed, it changed the order the dataset is sorted. However, I did not affect my tests.

Thank you

To see the potential havoc of merging without a BY statement run this example and look at the the differences in the TEST1 and TEST2 datasets.

```
data original excluded ;
set sashelp.class ;
output original;
if sex='M' then output excluded;
run;
data test1;
merge original(in=in1) excluded(in=in2);
excluded=in2;
run;
data test2;
merge original(in=in1) excluded(in=in2);
by name;
excluded=in2;
run;
```

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.