I have to validate the contents of a great number of datasets (>1,000) against the source to ensure data quality. Proc compare seems to be a very effective tool in identifying defects but I am trying to streamline the output. I have been looking at the options for output data and I am not finding the solution I need. I am looking for the information "outstats" provides but I need to have a row if a character value is not matching. “Outstats” doesn’t print this information from what I have read.
Have you tried printall/listall options?
Can you post the code you say isn't working?
How are you capturing tables, using output statements or ODS tables?
Have you tried printall/listall options?
Can you post the code you say isn't working?
proc compare
base=data
compare=data2
outnoequal outbase outcomp outdif LISTALL PRINTALL novalues outstats=PC
out=PCD ;run;
How are you capturing tables, using output statements or ODS tables?
I am writing the data out to unix
I created an example of what I am looking at.
Data
Character_VAR | Number_VAR |
Y | 132 |
Y | 456 |
Y | 46 |
Y | 2 |
75 | |
Y | 75 |
Y | 42 |
Y | 8585 |
N | 21572 |
N | |
N | 42 |
N | 452 |
N | 42 |
N | 42 |
N | 42 |
Data2
Character_VAR | Number_VAR | Extra_VAR |
Y | 132 | 1 |
Y | 456 | 1 |
Y | 46 | 1 |
N | 13 | 1 |
75 | 1 | |
Y | 75 | 1 |
Y | 42 | 1 |
Y | 8585 | 1 |
N | 21572 | 1 |
N | 1 | |
N | 42 | 11 |
Y | 452 | 1 |
N | 145 | 0 |
N | 42 | 0 |
N | 1 | 0 |
Results:
PC
_VAR_ | _TYPE_ | _BASE_ | _COMP_ | _DIF_ | _PCTDIF_ |
Number_VAR | N | 14 | 14 | 14 | 14 |
Number_VAR | MEAN | 2257.5 | 2262.714286 | 5.214285714 | 49.82993197 |
Number_VAR | STD | 6001.006169 | 5999.004166 | 30.42058658 | 161.1040973 |
Number_VAR | MAX | 21572 | 21572 | 103 | 550 |
Number_VAR | MIN | 2 | 1 | -41 | -97.61904762 |
Number_VAR | STDERR | 1603.836361 | 1603.301304 | 8.130243749 | 43.05688112 |
Number_VAR | T | 1.407562551 | 1.411284504 | 0.641344328 | 1.157304725 |
Number_VAR | PROBT | 0.182720142 | 0.181643492 | 0.532443888 | 0.267970401 |
Number_VAR | NDIF | 3 | 20 | ||
Number_VAR | DIFMEANS | 0.230976111 | 0.230443841 | 5.214285714 | |
Number_VAR | R,RSQ | 0.999987203 | 0.999974406 |
PCD
_TYPE_ | _OBS_ | Character_VAR | Number_VAR |
BASE | 4 | Y | 2 |
COMPARE | 4 | N | 13 |
DIF | 4 | X | 11 |
BASE | 12 | N | 452 |
COMPARE | 12 | Y | 452 |
DIF | 12 | X | |
BASE | 13 | N | 42 |
COMPARE | 13 | N | 145 |
DIF | 13 | . | 103 |
BASE | 15 | N | 42 |
COMPARE | 15 | N | 1 |
DIF | 15 | . | -41 |
My goal is to run a query of the PC dataset from the outstats to obtain a list of variables that did not match. Keep in mind I have to do this 1,000 times and concat the results for reporting and sizing magnitude.
Is all you want a list of variables that doesn't match rather than the observations?
You can sort of get it from here, but its also in the log warning, which may be an easier place to catch it:
data class;
set sashelp.class;
if name="Alfred" then age=20;
if name="Jane" then weight=30;
run;
ods trace on;
ods table comparesummary=test1;
proc compare data=class compare=sashelp.class printall;
run;
ods trace off;
I see what you are saying but because I am repeating this effort on >1,000 tables I cannot read the log or reports on each table and was hoping there was a way to output the character mismatches in outstats. If I can could do that I can capture in by table which variables do not match and for how many times in a dataset that I can code around to report out.
I don't think you should physically read it, but you can output the log to a file and parse it out, basically its data.
The output from the code I suggest, in Test1 has the output displayed, so you can parse that, but I agree it's not ideal at all.
I wonder if SAS creates an automatic variable somewhere that has the variables listed in WARNING that you would be able to capture.
BTW thank you for the quick responses!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.