BookmarkSubscribeRSS Feed
eclipse0rah
Calcite | Level 5

I have to validate the contents of a great number of datasets (>1,000) against the source to ensure data quality.  Proc compare seems to be a very effective tool in identifying defects but I am trying to streamline the output.  I have been looking at the options for output data and I am not finding the solution I need.  I am looking for the information "outstats" provides but I need to have a row if a character value is not matching.  “Outstats” doesn’t print this information from what I have read.

6 REPLIES 6
Reeza
Super User

Have you tried printall/listall options?

Can you post the code you say isn't working?

How are you capturing tables, using output statements or ODS tables?

eclipse0rah
Calcite | Level 5

Have you tried printall/listall options?

  • PRINTALL seems to be specific to the report and not the outstats dataset
  • LISTALL doesnt seem to have an impact on my outstats dataset

Can you post the code you say isn't working?

proc compare

base=data

compare=data2

outnoequal outbase outcomp outdif LISTALL PRINTALL novalues outstats=PC

out=PCD ;run;

How are you capturing tables, using output statements or ODS tables?
I am writing the data out to unix

I created an example of what I am looking at.

Data

Character_VARNumber_VAR
Y132
Y456
Y46
Y2
75
Y75
Y42
Y8585
N21572
N
N42
N452
N42
N42
N42

Data2

Character_VARNumber_VARExtra_VAR
Y1321
Y4561
Y461
N131
751
Y751
Y421
Y85851
N215721
N1
N4211
Y4521
N1450
N420
N10

Results:

PC

_VAR__TYPE__BASE__COMP__DIF__PCTDIF_
Number_VARN14141414
Number_VARMEAN2257.52262.7142865.21428571449.82993197
Number_VARSTD6001.0061695999.00416630.42058658161.1040973
Number_VARMAX2157221572103550
Number_VARMIN21-41-97.61904762
Number_VARSTDERR1603.8363611603.3013048.13024374943.05688112
Number_VART1.4075625511.4112845040.6413443281.157304725
Number_VARPROBT0.1827201420.1816434920.5324438880.267970401
Number_VARNDIF320
Number_VARDIFMEANS0.2309761110.2304438415.214285714
Number_VARR,RSQ0.9999872030.999974406

PCD

_TYPE__OBS_Character_VARNumber_VAR
BASE4Y2
COMPARE4N13
DIF4X11
BASE12N452
COMPARE12Y452
DIF12X
BASE13N42
COMPARE13N145
DIF13.103
BASE15N42
COMPARE15N1
DIF15.-41

My goal is to run a query of the PC dataset from the outstats to obtain a list of variables that did not match.  Keep in mind I have to do this 1,000 times and concat the results for reporting and sizing magnitude.

Reeza
Super User

Is all you want a list of variables that doesn't match rather than the observations?

You can sort of get it from here, but its also in the log warning, which may be an easier place to catch it:

data class;

    set sashelp.class;

    if name="Alfred" then age=20;

    if name="Jane" then weight=30;

run;

ods trace on;

ods table comparesummary=test1;

proc compare data=class compare=sashelp.class printall;

run;

ods trace off;

eclipse0rah
Calcite | Level 5

I see what you are saying but because I am repeating this effort on >1,000 tables I cannot read the log or reports on each table and was hoping there was a way to output the character mismatches in outstats.  If I can could do that I can capture in by table which variables do not match and for how many times in a dataset that I can code around to report out.

Reeza
Super User

I don't think you should physically read it, but you can output the log to a file and parse it out, basically its data.

The output from the code I suggest, in Test1 has the output displayed, so you can parse that, but I agree it's not ideal at all. 

I wonder if SAS creates an automatic variable somewhere that has the variables listed in WARNING that you would be able to capture.

eclipse0rah
Calcite | Level 5

BTW thank you for the quick responses!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2074 views
  • 0 likes
  • 2 in conversation