BookmarkSubscribeRSS Feed
RickyS
Quartz | Level 8

Would like to do this in SAS EG 6.1 Query builder but if Proc SQL or BASE SAS makes more sense, I'll go in either direction. 

I have two datasets:

 

data one;

input id country $ revenue commission;

datalines;

1 Germany 500 100

2 Germany 500 100

4 Germany 500 100

9 Germany 500 100

11 Mexico 500 100

12 Mexico 500 100

14 Mexico 500 100

17 Mexico 500 100

run;

 

data two;

input id country $ revenue commission;

datalines;

2 Germany 500 100

3 Germany 500 100

4 Germany 500 100

9 Germany 500 100

11 Mexico 500 100

13 Mexico 500 100

15 Mexico 500 100

17 Mexico 500 100

run;

 

Expected Output

1 Germany 500 100

3 Germany 500 100

12 Mexico 500 100

13 Mexico 500 100

14 Mexico 500 100

15 Mexico 500 100

 

Have tried various combinations of inner and left joins with new variables added without success, any help is much appreciated

11 REPLIES 11
Astounding
PROC Star

Questions to get started ...

 

(1) Are you matching on ID and COUNTRY only, or are you also matching on the last two variables as well?  (What if ID and COUNTRY match, but REVENUE doesn't?  Would that be part of the output?)

 

(2) Why is 14 Mexico not part of the expected output?

RickyS
Quartz | Level 8

I am matching on id, for simplicity just listing two countries would prefer not to match on country as I would need to clean up spelling and abbreviations accoss the tables.

 

Although I do not need in the output I can use other variables that are unique to each table for filtering.

Division(int) for data one, and Risk_Profile $ for data two. 

 

Tried various combinations of is not missing, in a list etc. using the unique variables without success.

RickyS
Quartz | Level 8

sorry 14 Mexico 500 100 should be listed in expected output 

ballardw
Super User

For clean data sorted by ID country this works pretty good:

proc compare base=one compare=two out=comp outnoequal outbase outcomp  noprint ;
id id country;
run;

But if your country variable has as much issue with correct values as you imply then how do we know not to consider these as "matches"?

 

 

11 Mexico 500 100

11 Mecixo 500 100

RickyS
Quartz | Level 8

You raise a good point Ballard, the data is less than perfect, what I can say for sure is the id is accurate in both data sets and will either match or not match, this exercise is ultimately trying to verify the amount of $$$ variation when there is a match and when there is a non match.  When there is a non match, identifying the unique data set of origination.

 

Astounding
PROC Star

OK, here's a standard way to find mismatches.  It assumes your data sets are sorted by ID:

 

data mismatch;

merge one (in=in1) two (in=in2);

by id;

if in1=0 or in2=0;

run;

 

Note that this will find mismatches on ID.  However, it will not find disagreements (for example, same ID in both data sets, but COUNTRY is different).  That's a different issue entirely.

Abbie
Calcite | Level 5

This worked like a charm for my problem. Thank you.

Ksharp
Super User

data one;
input id country $ revenue commission;
datalines;
1 Germany 500 100
2 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
12 Mexico 500 100
14 Mexico 500 100
17 Mexico 500 100
run;
 
data two;
input id country $ revenue commission;
datalines;
2 Germany 500 100
3 Germany 500 100
4 Germany 500 100
9 Germany 500 100
11 Mexico 500 100
13 Mexico 500 100
15 Mexico 500 100
17 Mexico 500 100
run;

proc sql;
(
select *
 from one
except
select *
 from two
)
 
union

(
select *
 from two
except
select *
 from one
)
;
quit;
RickyS
Quartz | Level 8

Thinking this through in my head, would need to standardize the variables in the tables and labels for variables across the tables to do the union based on wildcard or just select and standardize the variables I need.

Ksharp
Super User

You don't need standardize variable or the same variable name .

Only need the the same position and same type variable.

Patrick
Opal | Level 21

@RickyS 

If you use UNION CORR then columns in your source tables don't need to be in the same order. All you need is to ensure that same named variable in different source tables are of the same data type (Char or Num).

You can also use a SELECT *. The result will then contain all the variables which exist in all source tables.

 

And another nice feature of a SAS SQL UNION:

If lengths for character variables in source tables differ then the resulting column will have the max length from all source tables (=no truncation or warning/error messages).

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 5453 views
  • 2 likes
  • 6 in conversation