DATA Step, Macro, Functions and more

Full join - proc sql

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 102
Accepted Solution

Full join - proc sql

Dear all,

I want to join the following two datasets, I need to keep all records including non-matching ones.

My question to you is how could I create some identification variables, that will show whether a observation is coming from dataset one or dataset two.

thanks,

proc sql;
  create table full (drop=cusip2 year2) as
  select  a.*, b.cusip as cusip2, b.year as year2, b.bb
  from one as a full join two as b
on a.cusip=b.cusip and a.year=b.year;
quit;


Data one
Cusip Year aa
100 97 10
101 98 2
102 97 10
103 98 2

Data two
Cusip Year bb
101 98 34
102 97 2
103 98 3
104 99 5


Accepted Solutions
Solution
‎07-05-2013 09:18 AM
Regular Contributor
Posts: 195

Re: Full join - proc sql

Hi,

How about the following one...

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                         when bb = . then "Two"

                         when aa = . then "One"

                         else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

View solution in original post


All Replies
Super User
Posts: 19,873

Re: Full join - proc sql

You can add in some case statements that checks if the variables are empty to identify which is the data source.

Data one;

input Cusip Year aa;

cards;

100 97 10

101 98 2

102 97 10

103 98 2

;

Data two;

input Cusip Year bb;

cards;

101 98 34

102 97 2

103 98 3

104 99 5

;

proc sql;

  create table full as

  select  a.*, b.cusip as cusip2, b.year as year2, b.bb

    ,case when b.cusip = a.cusip and a.year=b.year then 'Both'

       when a.cusip=. then 'B'

     else "A" end as source

  from one as a

full outer join two as b

    on a.cusip=b.cusip and a.year=b.year;

quit;

Frequent Contributor
Posts: 102

Re: Full join - proc sql


Thank you,

I used your code as below, and sas log produced an error, please help!

ERROR: Expression using equals (=) has components that are of different data types.

proc sql;

  create table full as

  select  a.*, b.cusip as cusip2, b.year as year2,  b.avgHORI, b.avgSPEC, b.avgcount

    ,case when b.cusip = a.cusip and a.year=b.year then 'Both'

       when a.cusip=. then 'B'

     else 'A' end as source

  from private2 as a full outer join forecast as b

    on a.cusip=b.cusip and a.year=b.year;

quit;

Super User
Super User
Posts: 7,078

Re: Full join - proc sql

Sounds like CUSIP is a character variable in your data.

Instead of comparing it to numeric missing use the IS NULL syntax of SQL or the CMISS() function of SAS.

when a.cusip is null then 'B'


when cmiss(a.cusip) then 'B'

Super User
Super User
Posts: 7,078

Re: Full join - proc sql

Try using NATURAL join.

Compare the results of these two queries.

proc sql ;

    select * from one natural join two ;

    select * from one natural full join two ;

quit;

Super User
Super User
Posts: 7,078

Re: Full join - proc sql

You can add an extra variable by using sub query.

select *

  from (select *,1 as in1 from one) a

    natural full join

        ( select *,1 as in2 from two) b

;

Super User
Posts: 10,046

Re: Full join - proc sql

How about:

Data one;
input Cusip Year aa;
cards;
100 97 10
101 98 2
102 97 10
103 98 2
;
Data two;
input Cusip Year bb;
cards;
101 98 34
102 97 2
103 98 3
104 99 5
;

proc sql;
  create table full  as
  select  a.*,bb,'Both' as source
   from one as a inner join two as b
    on a.cusip=b.cusip and a.year=b.year
outer union corresponding
  select  a.*,'A' as source
   from one as a 
    where cats(a.cusip,a.year) not in ( select cats(cusip,year) from two )
outer union corresponding
  select  b.*,'B' as source
   from two as b 
    where cats(b.cusip,b.year) not in ( select cats(cusip,year) from one )
;
quit;


Ksharp

Solution
‎07-05-2013 09:18 AM
Regular Contributor
Posts: 195

Re: Full join - proc sql

Hi,

How about the following one...

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                         when bb = . then "Two"

                         when aa = . then "One"

                         else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

Frequent Contributor
Posts: 102

Re: Full join - proc sql

Posted in reply to UrvishShah

thanks to all of you who offered your help.

I like UrvishShah's answer the best , because it is easier for a beginner like me to use.

One more question,

my original post shows

data one has 3 variables: cusip year aa

data two has 3 variables: cusip year bb

but my actual data has

data one has many variables: cusip year aa1 aa2 aa3 .....

data two has many variables: cusip year bb1, bb2, .....

to Urvish or anyone who reads this,

do I just list on line 5 (see Urvish posting above).

aa1,aa2,aa3,....

or can I do

a.*, b.*,

I ask because I do not want to type so many variable names.

Lan

Super User
Super User
Posts: 7,078

Re: Full join - proc sql

Two choices to make it easier to use * in select statement when joining tables.

1) Use the NATURAL keyword.  SAS will identify the variables with the same names and use them all for joining.  Common variables will automatically be coalesced.

create table want as

  select * from one natural full join two

;

2) Use RENAME and DROP dataset options so that key variables from different datasets have different names.

create table want (drop=CUSIP1 YEAR1 CUSIP2 YEAR2) as

  select coalesce(cusip1,cusip2) as cusip

       , coalesce(year1,year2) as year

        , *

   from one (rename=(cusip=cusip1 year=year1))

     full join

        two (rename=(cusip=cusip2 year=year2))

   on cusip1 = cusip2 and year1=year2

;

      

Note that if you want to be positive about source of each observations you cannot depend on testing for missing values of variables from the source datasets unless you are positive that those variables can never have missing values.

create table want as

  select case when (in1=1 and in2=1) then 'BOTH'

              when (in1=1) then 'ONE'

              else 'TWO'

         end as source

       , *

  from (select *,1 as in1 from one)

    natural full join

       (select *,1 as in2 from two)

;

;

Frequent Contributor
Posts: 102

Re: Full join - proc sql

Thanks a lot, Tom .

Regular Contributor
Posts: 195

Re: Full join - proc sql

Use the %DO Iterative Loops within the macro to refrence many variables at one time...If you are not cleared about no of variables in your dataset then count the no of variables first of all and then save it in one macro variable...I have assumed that there are 20 variables in your dataset...

I have not tested it but should use...Hope it works...

%macro obs_indentify(no_of_vars =);

   proc sql;

      create table both as

     select coalesce(a.cusip,b.cusip) as cusip,

            coalesce(a.year,b.year) as year,

            aa,bb,

            case

                                 %do i = 1 %to &no_of_vars.;

                   when bb&i. = . then "Two"

                   when aa&i. = . then "One"                

               %end; else "Match"

          end as source

       from one as a

                  full join

        two as b

       on a.cusip = b.cusip;

    quit;

%mend;

%obs_identify(no_of_vars = 20);

-Urvish

Frequent Contributor
Posts: 102

Re: Full join - proc sql

Posted in reply to UrvishShah

Urvish,

In your macro, you specify 20 variables, does it count the common variables in both data one and data two.

that is,

e.g. data one has variables: cusip, year, aa, aa1, aa2. (5 vars)

e.g. data two has variables: cusip, year, bb, bb1, bb2. (5 vars)

in this case is no_of_vars =10, or 8 ?

2nd question, in your macro line 5, do we still need to list aa, bb, Or should we remove it,

thanks!

Super User
Posts: 10,046

Re: Full join - proc sql

One problem, if aa or bb has some missing value , Urvish's code will not work any more.

Data one;
input Cusip Year aa;
cards;
100 97 10
101 98 2
102 97 .
103 98 2
;
Data two;
input Cusip Year bb;
cards;
101 98 34
102 97 .
103 98 3
104 99 5
;

Ksharp

Regular Contributor
Posts: 195

Re: Full join - proc sql

Hi

Data one;

input Cusip Year aa;

cards;

100 97 10

101 98 2

102 97 .

103 98 2

;

Data two;

input Cusip Year bb;

cards;

101 98 .

102 97 .

103 98 3

104 99 5

;

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                 when a.cusip = b.cusip then "Match"

                 when aa = . and bb = . then "Match"

             when bb = . then "Two"

             when aa = . then "One"

             else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 489 views
  • 0 likes
  • 5 in conversation