BookmarkSubscribeRSS Feed
crk419
Calcite | Level 5

Hello!

 

I did a proc compare statement to find out the percent difference between a variable in CB_DB3 and CBHSOB, to try to validate the new data against data from our database. There are many comparisons that are less than a 1% difference, due to rounding errors, so I want to change those to 0's. I have attached a screenshot of the output I am referring to.

 

I am trying to figure out a way to say "if -1 < * < 1 then *=0". Basically, I want to look at every observation in the dataset and if the observation is between -1 and 1 then change it to 0, so that I can see which observations and variables I need to take a closer look at. 

 

I tried to figure out how to do a macro or a do loop to have it look at all variables, since I have over 100 variables and would like to try to avoid writing them all out. 

 

I have been researching this issue for hours, and the closest example I came across is below, however, I would still have to type out every 100+ variable name plus &variablename. Is there any better way of doing this?

 


%macro mac1(inc=,firm=); /* Small, useless macro for illustrative purposes */
%put in macro mac1 inc= &inc firm= &firm;
%mend mac1;
/* Step 1: copy values of INCOME and COMPANY to macro variables for all
observations in which INCOME is greater than 0 */
data _null_;
set two end=last;
where income gt 0;
call symput('income'|| compress(put(_n_,3.)), compress(put(income,11.2)));
call symput('company'|| compress(put(_n_,3.)), trim(left(company)));
if last then call symput('num_values', compress(put(_n_,3.)));
run;
/* Step 2: call macro MAC1 once for each set of values of INCOME and COMPANY
that were written to macro variables */
%macro macloop1;
%local j;
%do j = 1 %to &num_values;

 

 

Thank you!!

 

9 REPLIES 9
ballardw
Super User

Did you examine the criterion= option of Proc Compare? You can set a limit of how different values must be to be considered "not equal" for the report summary.

 

data work.one;
   input x;
datalines;
1.234
23.567
123.45678
;

data work.two;
   input x;
datalines;
1.235
23.564
123.4538
;

title "Without criterion";
proc compare base=work.one
     compare=work.two;
run;
title "With criterion";
proc compare base=work.one
     compare=work.two
     criterion=0.01
;
run;title;

It is moderately obvious that large values of criterion are not a good idea. You didn't give an actual range of values involved but hopefully your rounding differences are not producing really big differences. Only one criterion value is available per procedure call which means that many differently ranged variables may not work well but it sounds like you are comparing only a few variables so this might be appropriate.

 

Note that you do still get a "not EXACTLY equal" but the number reported as unequal for the comparison changes.

crk419
Calcite | Level 5

I had already tried the criterion option, and just tried it again. It does not change my results.

ballardw
Super User

@crk419 wrote:

I had already tried the criterion option, and just tried it again. It does not change my results.


The perhaps you can provide 1) some example data sets, just a few observations with the issue you observe for those variables, 2) the code you attempted with criterion (hopefully not a bunch of macro calls) and 3) the expected/desired output for that example input.

 

Your picture of a data set doesn't really clear up which variable(s) may be involved or which values you would expect to be 0 in the output. From some of the pictured values criterion of 0.1 might work such as NETINBM or ILNDOM but where a visible range of results is 65, such as IFREPO or 33 for IGLSEC I would completely clueless what you expect. As I mentioned, a single value for criterion is extremely unlikely to work for all variables.  A few with similar behavior likely.

 

You macro really doesn't make a lot of sense to me. It appears that you are trying to create a different rule from each company.

Placing many values from a data set into macro variables and then attempting to manipulate the data is seldom the most productive approach.

 

Also you don't show any of the Proc compare code you may have attempted. Since you are showing an attempt at a per company rule with the macro code I must ask if you used a BY COMPANY approach to the proc compare code? In general it is a very good idea to sort both data sets prior to proc compare to get the data as close to matching as possible and the BY statement would likely be needed to help address any possible issue of differing number of records between two sets.

ChrisNZ
Tourmaline | Level 20

There are many comparisons that are less than a 1% difference, due to rounding errors, so I want to change those to 0's.

 

> I had already tried the criterion option, and just tried it again. It does not change my results.

 

Well it should. You are not using it properly it seems.

Note that the CRITERION option depends on the METHOD option.

 

crk419
Calcite | Level 5

@ballardw  @ChrisNZ 

 

I have a situation in which both datasets being compared have the same exact variables. I would like to get the percent difference for each variable between the two datasets to validate the data. I am trying to change all of the outcomes where the percent difference is in between -1% and 1% to 0 so that I can more easily view only the differences that are not due to rounding errors.

 

Please see attached for the different methods that I tried for the proc compare statement and their sample outputs.

 

Thank you!!

ballardw
Super User

Small examples of your data , maybe 5 variables and 10 records each of the input data sets might allow us to show code. Ensue that the values of the ID variables match in the t

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

There is no way I can test code against a picture of an output data set. Especially where it would likely be better to have the proc compare run properly.

 

I do note that that the code you show in the attachment does not include any criterion option.

crk419
Calcite | Level 5

For some reason when I try method=percent, it does not give me percents. Below the first example, I showed the method (outpercent) that does give me percent differences.

 

proc compare base=carlos.cb_db3 compare=carlos.cbhsob2 out=carlos.compareCBtest2 method=percent criterion=0.01 nosummary;
id CALLYM STNAME;
run;

 

{i}

data WORK.COMPARECBTEST2(label='Comparison of CARLOS.CB_DB3 with CARLOS.CBHSOB2');
infile datalines dsd truncover;

input _TYPE_:$8. _OBS_:32. CALLYM:BEST12. STNAME:$2. TABLEKEY:BEST12. NETINC:BEST12. ASSET:BEST12. DEP:BEST12. INTINC:BEST12. EINTEXP:BEST12. NIM:BEST12. NONII:BEST12. NONIX:BEST12. ELNATR:BEST12. IGLSEC:BEST12. ITAX:BEST12. EXTRA:BEST12. NETIMIN:BEST12. N
ETINBM:BEST12. ILNDOM:BEST12. ILNFOR:BEST12. ILS:BEST12. ILNLS:BEST12. ISC:BEST12. ITRADE:BEST12. IFREPO:BEST12. ICHBAL:BEST12. EDEPDOM:BEST12. EDEPFOR:BEST12. EDEP:BEST12. EFREPP:BEST12. ESUBND:BEST12. ISERCHG:BEST12. ESAL:BEST12. EPREMAGG:BEST12. EOTHNIN
T:BEST12. DRLNLS:BEST12. CRLNLS:BEST12. NTLNLS:BEST12. EQCDIVC:BEST12. EQCDIVP:BEST12. EQCDIV:BEST12. NUMEMP:BEST12. CHBAL:BEST12. SC:BEST12. LNLSGR:BEST12. LNATRES:BEST12. LNLSNET:BEST12. BKPREM:BEST12. ORE:BEST12. INTAN:BEST12. SCUST:BEST12. SCAGE:BEST12
. SCMUNI:BEST12. SCEQ:BEST12. SCMV:BEST12. LNRE:BEST12. LNDEP:BEST12. LNAG:BEST12. LNCI:BEST12. LNCON:BEST12. LNMUNI:BEST12. LS:BEST12. LNLS:BEST12. UNINC:BEST12. LNRECONS:BEST12. LNRERES:BEST12. LNREMULT:BEST12. LNREAG:BEST12. LNRENRES:BEST12. LNREDOM:BES
T12. LNREFOR:BEST12. LNCRCD:BEST12. LNCONOTH:BEST12. LNAUTO:BEST12. SUBND:BEST12. EQPP:BEST12. EQCS:BEST12. EQSUR:BEST12. EQCONSUB:BEST12. DEPNI:BEST12. DDT:BEST12. NTRTIME:BEST12. TRN:BEST12. NTR:BEST12. CHBALI:BEST12. FREPO:BEST12. TRADE:BEST12. FREPP:BE
ST12. DEPNIFOR:BEST12. DEPIFOR:BEST12. DEPFOR:BEST12. P3LNLS:BEST12. P9LNLS:BEST12. NALNLS:BEST12. NCLNLS:BEST12.;

format CALLYM BEST12. TABLEKEY BEST12. NETINC BEST12. ASSET BEST12. DEP BEST12. INTINC BEST12. EINTEXP BEST12. NIM BEST12. NONII BEST12. NONIX BEST12. ELNATR BEST12. IGLSEC BEST12. ITAX BEST12. EXTRA BEST12. NETIMIN BEST12. NETINBM BEST12. ILNDOM BEST12. I
LNFOR BEST12. ILS BEST12. ILNLS BEST12. ISC BEST12. ITRADE BEST12. IFREPO BEST12. ICHBAL BEST12. EDEPDOM BEST12. EDEPFOR BEST12. EDEP BEST12. EFREPP BEST12. ESUBND BEST12. ISERCHG BEST12. ESAL BEST12. EPREMAGG BEST12. EOTHNINT BEST12. DRLNLS BEST12. CRLNLS
BEST12. NTLNLS BEST12. EQCDIVC BEST12. EQCDIVP BEST12. EQCDIV BEST12. NUMEMP BEST12. CHBAL BEST12. SC BEST12. LNLSGR BEST12. LNATRES BEST12. LNLSNET BEST12. BKPREM BEST12. ORE BEST12. INTAN BEST12. SCUST BEST12. SCAGE BEST12. SCMUNI BEST12. SCEQ BEST12. S
CMV BEST12. LNRE BEST12. LNDEP BEST12. LNAG BEST12. LNCI BEST12. LNCON BEST12. LNMUNI BEST12. LS BEST12. LNLS BEST12. UNINC BEST12. LNRECONS BEST12. LNRERES BEST12. LNREMULT BEST12. LNREAG BEST12. LNRENRES BEST12. LNREDOM BEST12. LNREFOR BEST12. LNCRCD BES
T12. LNCONOTH BEST12. LNAUTO BEST12. SUBND BEST12. EQPP BEST12. EQCS BEST12. EQSUR BEST12. EQCONSUB BEST12. DEPNI BEST12. DDT BEST12. NTRTIME BEST12. TRN BEST12. NTR BEST12. CHBALI BEST12. FREPO BEST12. TRADE BEST12. FREPP BEST12. DEPNIFOR BEST12. DEPIFOR
BEST12. DEPFOR BEST12. P3LNLS BEST12. P9LNLS BEST12. NALNLS BEST12. NCLNLS BEST12.;
label _TYPE_="Type of Observation" _OBS_="Observation Number";
datalines;
DIF 1 200012 AK 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DIF 2 200012 AL 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
DIF 3 200012 AR 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 . 0 0 0 0 0 0 0 . 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 . . . 0 0 0 0
DIF 4 200012 AS 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 . 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 . . . 0 0 0 0
DIF 5 200012 AZ 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
;;;;

{i}

 

 

 

 

 

 

 

 

 

proc compare base=carlos.cb_db3 compare=carlos.cbhsob2 out=carlos.compareCB outpercent nosummary;
id CALLYM STNAME;
run;

 

{i}

data WORK.COMPARECB(label='Comparison of CARLOS.CB_DB3 with CARLOS.CBHSOB2');
infile datalines dsd truncover;

input _TYPE_:$8. _OBS_:32. CALLYM:BEST12. STNAME:$2. TABLEKEY:BEST12. NETINC:BEST12. ASSET:BEST12. DEP:BEST12. INTINC:BEST12. EINTEXP:BEST12. NIM:BEST12. NONII:BEST12. NONIX:BEST12. ELNATR:BEST12. IGLSEC:BEST12. ITAX:BEST12. EXTRA:BEST12. NETIMIN:BEST12. N
ETINBM:BEST12. ILNDOM:BEST12. ILNFOR:BEST12. ILS:BEST12. ILNLS:BEST12. ISC:BEST12. ITRADE:BEST12. IFREPO:BEST12. ICHBAL:BEST12. EDEPDOM:BEST12. EDEPFOR:BEST12. EDEP:BEST12. EFREPP:BEST12. ESUBND:BEST12. ISERCHG:BEST12. ESAL:BEST12. EPREMAGG:BEST12. EOTHNIN
T:BEST12. DRLNLS:BEST12. CRLNLS:BEST12. NTLNLS:BEST12. EQCDIVC:BEST12. EQCDIVP:BEST12. EQCDIV:BEST12. NUMEMP:BEST12. CHBAL:BEST12. SC:BEST12. LNLSGR:BEST12. LNATRES:BEST12. LNLSNET:BEST12. BKPREM:BEST12. ORE:BEST12. INTAN:BEST12. SCUST:BEST12. SCAGE:BEST12
. SCMUNI:BEST12. SCEQ:BEST12. SCMV:BEST12. LNRE:BEST12. LNDEP:BEST12. LNAG:BEST12. LNCI:BEST12. LNCON:BEST12. LNMUNI:BEST12. LS:BEST12. LNLS:BEST12. UNINC:BEST12. LNRECONS:BEST12. LNRERES:BEST12. LNREMULT:BEST12. LNREAG:BEST12. LNRENRES:BEST12. LNREDOM:BES
T12. LNREFOR:BEST12. LNCRCD:BEST12. LNCONOTH:BEST12. LNAUTO:BEST12. SUBND:BEST12. EQPP:BEST12. EQCS:BEST12. EQSUR:BEST12. EQCONSUB:BEST12. DEPNI:BEST12. DDT:BEST12. NTRTIME:BEST12. TRN:BEST12. NTR:BEST12. CHBALI:BEST12. FREPO:BEST12. TRADE:BEST12. FREPP:BE
ST12. DEPNIFOR:BEST12. DEPIFOR:BEST12. DEPFOR:BEST12. P3LNLS:BEST12. P9LNLS:BEST12. NALNLS:BEST12. NCLNLS:BEST12.;

format CALLYM BEST12. TABLEKEY BEST12. NETINC BEST12. ASSET BEST12. DEP BEST12. INTINC BEST12. EINTEXP BEST12. NIM BEST12. NONII BEST12. NONIX BEST12. ELNATR BEST12. IGLSEC BEST12. ITAX BEST12. EXTRA BEST12. NETIMIN BEST12. NETINBM BEST12. ILNDOM BEST12. I
LNFOR BEST12. ILS BEST12. ILNLS BEST12. ISC BEST12. ITRADE BEST12. IFREPO BEST12. ICHBAL BEST12. EDEPDOM BEST12. EDEPFOR BEST12. EDEP BEST12. EFREPP BEST12. ESUBND BEST12. ISERCHG BEST12. ESAL BEST12. EPREMAGG BEST12. EOTHNINT BEST12. DRLNLS BEST12. CRLNLS
BEST12. NTLNLS BEST12. EQCDIVC BEST12. EQCDIVP BEST12. EQCDIV BEST12. NUMEMP BEST12. CHBAL BEST12. SC BEST12. LNLSGR BEST12. LNATRES BEST12. LNLSNET BEST12. BKPREM BEST12. ORE BEST12. INTAN BEST12. SCUST BEST12. SCAGE BEST12. SCMUNI BEST12. SCEQ BEST12. S
CMV BEST12. LNRE BEST12. LNDEP BEST12. LNAG BEST12. LNCI BEST12. LNCON BEST12. LNMUNI BEST12. LS BEST12. LNLS BEST12. UNINC BEST12. LNRECONS BEST12. LNRERES BEST12. LNREMULT BEST12. LNREAG BEST12. LNRENRES BEST12. LNREDOM BEST12. LNREFOR BEST12. LNCRCD BES
T12. LNCONOTH BEST12. LNAUTO BEST12. SUBND BEST12. EQPP BEST12. EQCS BEST12. EQSUR BEST12. EQCONSUB BEST12. DEPNI BEST12. DDT BEST12. NTRTIME BEST12. TRN BEST12. NTR BEST12. CHBALI BEST12. FREPO BEST12. TRADE BEST12. FREPP BEST12. DEPNIFOR BEST12. DEPIFOR
BEST12. DEPFOR BEST12. P3LNLS BEST12. P9LNLS BEST12. NALNLS BEST12. NCLNLS BEST12.;
label _TYPE_="Type of Observation" _OBS_="Observation Number";
datalines;
PERCENT 1 200012 AK 0 0 0 0 0 0 0 0 0 0 0 0 . . . 0 . 0 0 0 . 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 . 0 0 . 0 0 0 0 0 0 0 . 0 . . . 0 0 0 0
PERCENT 2 200012 AL 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PERCENT 3 200012 AR 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 . 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0.0024536465 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 0 0 0 . 0 0 0 0 0 0 0 0 0 . . . 0 0 0 0
PERCENT 4 200012 AS 0 0 0 0 0 0 0 0 0 0 . 0 . . . 0 . . 0 0 . 0 0 0 . 0 . . 0 0 0 0 0 0 0 . . . 0 0 0 0 0 0 0 0 . . 0 . 0 0 0 . . 0 0 . . 0 0 0 0 0 . 0 0 . 0 0 . . . 0 0 . 0 0 0 0 0 0 0 . . . . . 0 0 0 0
PERCENT 5 200012 AZ 0 0 0 0 0 0 0 0 0 0 0 0 0 . . 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . 0 0 . 0 . 0 0 . 0 0 0 0 0 0 0 0 0 . 0 0 0 0 0 0
;;;;

{i}

 

 

 

 

ChrisNZ
Tourmaline | Level 20

> For some reason when I try method=percent, it does not give me percents. Below the first example, I showed the method (outpercent) that does give me percent differences.

 

Option method=percent is for selecting non-matching values in the output

 

Option outpercent controls the output data set, which is created for every value.

 

 

Tom
Super User Tom
Super User

Sound like you just want to use a custom format to display the variables that contain the calculated percent difference.

proc format ;
 value mypercent 
   -0.01 - 0.01 = 'Zero'
  other=_same_
  ;
run;
proc print data=my_compare_results;
  format var1 var2 .... varn mypercent.;
run;

Note, again, that pasting images of your data does not help as much as actual text with the names of the variables and example values.   So you will need to modify the above code to use the name of your dataset and the names of your variables.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1738 views
  • 0 likes
  • 4 in conversation