- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SAS experts:
I hope you can help me with this.
I have the following situation:
Table_1
Subjects | Var_1 | Var_2 | … | Var_n |
S_1 | ||||
S_2 | ||||
… | ||||
S_n |
Table_2
Subjects | Var_1 | Var_2 | … | Var_n |
S_1 | ||||
S_2 | ||||
… | ||||
S_n |
The Subjects (S_1... S_n) are the same in both tables, and Var_1, Var_2,… Var_n are the same in both tables (for example they can the same questions that Subjects S_1, S_2… S_n respond the same way or differently (YES or NO) within different reporting time points (Table_1 = time point 1, Tabel_2 = time point 2 etc.)).
Is there any efficient way to compare these two tables (time points) and to find the variables (questions in the example) with mismatch in subjects responses?
I know how to do this when there are limited number of Subjects and Variables (using proc sql) but my way becomes very inefficient if I have, let’s say 150 subjects and over 200 Variable…
Thank you very much for your help.
Batta
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use PROC COMPARE to do this:
proc compare base=table_1 compare=table_2;
id subjects;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use PROC COMPARE to do this:
proc compare base=table_1 compare=table_2;
id subjects;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!!!
I even didn't know about this procedure.
Very useful!
Kind regards!
Batta
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is an example that uses Proc compare with a couple of very small data sets that you should be able to run and see the results. The two data steps create data sets to use, one with some differences and then proc compare with options that I believe fit your wants.
data work.class1; set sashelp.class; run; /* make a data set with a few differences */ data work.class2; set sashelp.class; if name='Alice' then age=15; if name='James' then height= 60; if name='Louise' then sex=.; run; proc compare base=work.class1 compare=work.class2 noprint out=work.compareresult outdif outnoequal; ; id name; run;
By default Proc Compare is extremely verbose with some summary tables and then details for each variable/observation difference. The NOPRINT option suppresses the output to the result window with that text. The OUT= option names a data set to write results. Outdif is looking to show differences, by itself for numeric values it shows the actual numeric difference or 0 when compared or a X for character values that differ for every observation and variable. Adding the OUTNOEQUAL removes any observations where all of the values are equal. That also changes the output a bit to show E for the values that are the same for numeric. Character values will show a missing result of . for equal or and X for a difference. All variables will be shown.
Additional information of _type_ shows that the result is a DIF comparison and _OBS_ the observation in the set.
The E is actually a "special missing" not a letter and is a small topic all its own.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you. I'll try this.
Although proc compare seem OK procedure, it does not create working table.
I can export it but it's difficult to use those result for any further analysis.
Thank you very much for your effort.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Batta wrote:
Thank you. I'll try this.
Although proc compare seem OK procedure, it does not create working table.
I can export it but it's difficult to use those result for any further analysis.
Thank you very much for your effort.
What analysis do you want?
As shown proc compare will create data set output, just a bit complex to figure out all of the answers. Run proc freq on the output data set in my example and you get counts of non-missing values pretty easily and what the difference is for each numeric variable.
There is also and OUTSTATS option you can explore.