Would like to do this in SAS EG 6.1 Query builder but if Proc SQL or BASE SAS makes more sense, I'll go in either direction.
I have two datasets:
data one;
input id country $ revenue commission;
datalines;
1 Germany 500 100
2 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
12 Mexico 500 100
14 Mexico 500 100
17 Mexico 500 100
run;
data two;
input id country $ revenue commission;
datalines;
2 Germany 500 100
3 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
13 Mexico 500 100
15 Mexico 500 100
17 Mexico 500 100
run;
Expected Output
1 Germany 500 100
3 Germany 500 100
12 Mexico 500 100
13 Mexico 500 100
14 Mexico 500 100
15 Mexico 500 100
Have tried various combinations of inner and left joins with new variables added without success, any help is much appreciated
Questions to get started ...
(1) Are you matching on ID and COUNTRY only, or are you also matching on the last two variables as well? (What if ID and COUNTRY match, but REVENUE doesn't? Would that be part of the output?)
(2) Why is 14 Mexico not part of the expected output?
I am matching on id, for simplicity just listing two countries would prefer not to match on country as I would need to clean up spelling and abbreviations accoss the tables.
Although I do not need in the output I can use other variables that are unique to each table for filtering.
Division(int) for data one, and Risk_Profile $ for data two.
Tried various combinations of is not missing, in a list etc. using the unique variables without success.
sorry 14 Mexico 500 100 should be listed in expected output
For clean data sorted by ID country this works pretty good:
proc compare base=one compare=two out=comp outnoequal outbase outcomp noprint ; id id country; run;
But if your country variable has as much issue with correct values as you imply then how do we know not to consider these as "matches"?
11 Mexico 500 100
11 Mecixo 500 100
You raise a good point Ballard, the data is less than perfect, what I can say for sure is the id is accurate in both data sets and will either match or not match, this exercise is ultimately trying to verify the amount of $$$ variation when there is a match and when there is a non match. When there is a non match, identifying the unique data set of origination.
OK, here's a standard way to find mismatches. It assumes your data sets are sorted by ID:
data mismatch;
merge one (in=in1) two (in=in2);
by id;
if in1=0 or in2=0;
run;
Note that this will find mismatches on ID. However, it will not find disagreements (for example, same ID in both data sets, but COUNTRY is different). That's a different issue entirely.
This worked like a charm for my problem. Thank you.
data one;
input id country $ revenue commission;
datalines;
1 Germany 500 100
2 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
12 Mexico 500 100
14 Mexico 500 100
17 Mexico 500 100
run;
 
data two;
input id country $ revenue commission;
datalines;
2 Germany 500 100
3 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
13 Mexico 500 100
15 Mexico 500 100
17 Mexico 500 100
run;
proc sql;
(
select *
 from one
except
select *
 from two
)
 
union
(
select *
 from two
except
select *
 from one
)
;
quit;
Thinking this through in my head, would need to standardize the variables in the tables and labels for variables across the tables to do the union based on wildcard or just select and standardize the variables I need.
You don't need standardize variable or the same variable name .
Only need the the same position and same type variable.
If you use UNION CORR then columns in your source tables don't need to be in the same order. All you need is to ensure that same named variable in different source tables are of the same data type (Char or Num).
You can also use a SELECT *. The result will then contain all the variables which exist in all source tables.
And another nice feature of a SAS SQL UNION:
If lengths for character variables in source tables differ then the resulting column will have the max length from all source tables (=no truncation or warning/error messages).
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
