Hello,
I have two character variables that I am comparing (Base and Compare). In one of them there are values that are only integers while the other has integers with a comma. i.e. 123456 vs 123,456. Since these represent the same number, I want to remove them from my discrepancy report - i.e I want to compare them and determine that they are not a discrepancy. Base and Compare have other values that are only text or combination of text and integers. These I want to leave as is. Any help with this is greatly appreciated.
Thanks.
You should store such data as numbers, then you can use INT() to discard the decimals or use a proper option in PROC COMPARE.
Why do you have character variables for numbers?
They are character because I transpose the output of the proc compare. There are all different types of data that I am comparing - dates, numbers, text etc. That is why the variables Base and Compare have both numbers (including sometimes dates) as well as text.
I actually did see that when I did an ODS excel output, all the numbers with commas were read into excel without the commas. This means that I can theoretically now write an "if" condition in excel to take these out (since now they are the same). But I would much rather do this up front in SAS. Any suggestions?
@nsns wrote:
They are character because I transpose the output of the proc compare. There are all different types of data that I am comparing - dates, numbers, text etc. That is why the variables Base and Compare have both numbers (including sometimes dates) as well as text.
I actually did see that when I did an ODS excel output, all the numbers with commas were read into excel without the commas. This means that I can theoretically now write an "if" condition in excel to take these out (since now they are the same). But I would much rather do this up front in SAS. Any suggestions?
This to me sort of begs the question. I have never found a good reason to transpose numeric and character values in a single transpose. And if the there are dates and other numeric variables I would likely hesitate, Much less the output of Proc Compare in general.
Perhaps the Proc Compare should be reduced to only compare the variables of like types (as a minimum).
It might have been easier to fix the data before comparing. Given that you woulod like to fix the output data set instead, what do you know about it? Do you know the name of the output data set, and the variables within? You will need to write a program, since that's the usual starting point to be able to write a program.
There are a few ways to do this:
** determine whether a variable (BASE) contains only digits ;
all_digits=(anydigit(BASE)>0 * notdigit(BASE)=0);
** remove commas, if any, from BASE and check whether it matches COMPARE ;
if compress(BASE,',') = COMPARE then.... ;
** check whether, after removing any commas, BASE is a valid number ;
if compress(BASE,',') * 1 > . then.... ;
You can do that like this (for the first version of the date you've got there, which is in 'DATE9.' format:
data test;
dtc='07May2022';
dt=input(dtc,date9.);
format dt date9.;
run;
I don't know offhand what the other format is, but I think it's something like DDMMYYYY - same principle, though.
Once you've done that, the underlying value of dt will just be an integer representing the number of days since Jan 1st, 1960. Any date, regardless of the format applied, will have that property. So if for ex. you have dt2=input(dtc2, ddmmyyyy.); ... you can simply check wither dt = dt2.
Thanks, That doesn't quite work with my dataset.
For example:
data aa;
input name $ Base $ Compare $ ;
cards;
x 123,456 123456
y 987654 987,654
date 16MAY2024 16/05/2024
match yes yes
;
run;
The first two observations are working with the all_digits format. But the dates are different. The dates are values in the base and compare variables.
(you are correct the format of the date is day month year).
If I understand right then you're after something like below.
/* Create sample data */
data have;
input Base :$20. Compare :$20.;
datalines;
123456 123,456
ABC123 ABC123
HelloWorld Hello,World
789012 789012
987654 987,654
Test123 Test123
SASCode SAS,Code
987654 9876,54
;
run;
data compare;
set have;
if 0 then
do;
/* define new vars with same length than orig vars */
base2 =base;
compare2=compare;
end;
/* define regex to remove thousand separator */
prxid=prxparse('s/,(\d{3})/$1/');
base2=prxchange(prxid,-1,strip(Base));
compare2=prxchange(prxid,-1,strip(Compare));
diff_flg= (base2 ne compare2);
run;
proc print data=compare;
run;
Wouldn't it have been easier to compare the values before converting them into strings??
You can use the ? or ?? modifier on the informat used with INPUT() function to ignore errors.
You can use the COMMA informat to read string with and without commas the same (it basically removes all of the commas and dollar signs and then uses normal numeric informat.)
You can use the ANYDT... series of informats to guess automatically which style the string is using.
if input(base,??comma32.) = input(compare,??comma32.)
and not missing(input(base,??comma32.)) then match=1;
else if input(base,??anydtdtm40.) = input(compare,??anydtdtm40.)
and not missing(input(base,??anydtdtm40.) then match=1;
else if base=compare then match=1;
else match=0;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.