BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LanMin
Fluorite | Level 6

Dear all,

I want to join the following two datasets, I need to keep all records including non-matching ones.

My question to you is how could I create some identification variables, that will show whether a observation is coming from dataset one or dataset two.

thanks,

proc sql;
  create table full (drop=cusip2 year2) as
  select  a.*, b.cusip as cusip2, b.year as year2, b.bb
  from one as a full join two as b
on a.cusip=b.cusip and a.year=b.year;
quit;


Data one
Cusip Year aa
100 97 10
101 98 2
102 97 10
103 98 2

Data two
Cusip Year bb
101 98 34
102 97 2
103 98 3
104 99 5

1 ACCEPTED SOLUTION

Accepted Solutions
UrvishShah
Fluorite | Level 6

Hi,

How about the following one...

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                         when bb = . then "Two"

                         when aa = . then "One"

                         else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

View solution in original post

17 REPLIES 17
Reeza
Super User

You can add in some case statements that checks if the variables are empty to identify which is the data source.

Data one;

input Cusip Year aa;

cards;

100 97 10

101 98 2

102 97 10

103 98 2

;

Data two;

input Cusip Year bb;

cards;

101 98 34

102 97 2

103 98 3

104 99 5

;

proc sql;

  create table full as

  select  a.*, b.cusip as cusip2, b.year as year2, b.bb

    ,case when b.cusip = a.cusip and a.year=b.year then 'Both'

       when a.cusip=. then 'B'

     else "A" end as source

  from one as a

full outer join two as b

    on a.cusip=b.cusip and a.year=b.year;

quit;

LanMin
Fluorite | Level 6


Thank you,

I used your code as below, and sas log produced an error, please help!

ERROR: Expression using equals (=) has components that are of different data types.

proc sql;

  create table full as

  select  a.*, b.cusip as cusip2, b.year as year2,  b.avgHORI, b.avgSPEC, b.avgcount

    ,case when b.cusip = a.cusip and a.year=b.year then 'Both'

       when a.cusip=. then 'B'

     else 'A' end as source

  from private2 as a full outer join forecast as b

    on a.cusip=b.cusip and a.year=b.year;

quit;

Tom
Super User Tom
Super User

Sounds like CUSIP is a character variable in your data.

Instead of comparing it to numeric missing use the IS NULL syntax of SQL or the CMISS() function of SAS.

when a.cusip is null then 'B'


when cmiss(a.cusip) then 'B'

Tom
Super User Tom
Super User

Try using NATURAL join.

Compare the results of these two queries.

proc sql ;

    select * from one natural join two ;

    select * from one natural full join two ;

quit;

Tom
Super User Tom
Super User

You can add an extra variable by using sub query.

select *

  from (select *,1 as in1 from one) a

    natural full join

        ( select *,1 as in2 from two) b

;

Ksharp
Super User

How about:

Data one;
input Cusip Year aa;
cards;
100 97 10
101 98 2
102 97 10
103 98 2
;
Data two;
input Cusip Year bb;
cards;
101 98 34
102 97 2
103 98 3
104 99 5
;

proc sql;
  create table full  as
  select  a.*,bb,'Both' as source
   from one as a inner join two as b
    on a.cusip=b.cusip and a.year=b.year
outer union corresponding
  select  a.*,'A' as source
   from one as a 
    where cats(a.cusip,a.year) not in ( select cats(cusip,year) from two )
outer union corresponding
  select  b.*,'B' as source
   from two as b 
    where cats(b.cusip,b.year) not in ( select cats(cusip,year) from one )
;
quit;


Ksharp

UrvishShah
Fluorite | Level 6

Hi,

How about the following one...

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                         when bb = . then "Two"

                         when aa = . then "One"

                         else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

LanMin
Fluorite | Level 6

thanks to all of you who offered your help.

I like UrvishShah's answer the best , because it is easier for a beginner like me to use.

One more question,

my original post shows

data one has 3 variables: cusip year aa

data two has 3 variables: cusip year bb

but my actual data has

data one has many variables: cusip year aa1 aa2 aa3 .....

data two has many variables: cusip year bb1, bb2, .....

to Urvish or anyone who reads this,

do I just list on line 5 (see Urvish posting above).

aa1,aa2,aa3,....

or can I do

a.*, b.*,

I ask because I do not want to type so many variable names.

Lan

Tom
Super User Tom
Super User

Two choices to make it easier to use * in select statement when joining tables.

1) Use the NATURAL keyword.  SAS will identify the variables with the same names and use them all for joining.  Common variables will automatically be coalesced.

create table want as

  select * from one natural full join two

;

2) Use RENAME and DROP dataset options so that key variables from different datasets have different names.

create table want (drop=CUSIP1 YEAR1 CUSIP2 YEAR2) as

  select coalesce(cusip1,cusip2) as cusip

       , coalesce(year1,year2) as year

        , *

   from one (rename=(cusip=cusip1 year=year1))

     full join

        two (rename=(cusip=cusip2 year=year2))

   on cusip1 = cusip2 and year1=year2

;

      

Note that if you want to be positive about source of each observations you cannot depend on testing for missing values of variables from the source datasets unless you are positive that those variables can never have missing values.

create table want as

  select case when (in1=1 and in2=1) then 'BOTH'

              when (in1=1) then 'ONE'

              else 'TWO'

         end as source

       , *

  from (select *,1 as in1 from one)

    natural full join

       (select *,1 as in2 from two)

;

;

LanMin
Fluorite | Level 6

Thanks a lot, Tom .

UrvishShah
Fluorite | Level 6

Use the %DO Iterative Loops within the macro to refrence many variables at one time...If you are not cleared about no of variables in your dataset then count the no of variables first of all and then save it in one macro variable...I have assumed that there are 20 variables in your dataset...

I have not tested it but should use...Hope it works...

%macro obs_indentify(no_of_vars =);

   proc sql;

      create table both as

     select coalesce(a.cusip,b.cusip) as cusip,

            coalesce(a.year,b.year) as year,

            aa,bb,

            case

                                 %do i = 1 %to &no_of_vars.;

                   when bb&i. = . then "Two"

                   when aa&i. = . then "One"                

               %end; else "Match"

          end as source

       from one as a

                  full join

        two as b

       on a.cusip = b.cusip;

    quit;

%mend;

%obs_identify(no_of_vars = 20);

-Urvish

LanMin
Fluorite | Level 6

Urvish,

In your macro, you specify 20 variables, does it count the common variables in both data one and data two.

that is,

e.g. data one has variables: cusip, year, aa, aa1, aa2. (5 vars)

e.g. data two has variables: cusip, year, bb, bb1, bb2. (5 vars)

in this case is no_of_vars =10, or 8 ?

2nd question, in your macro line 5, do we still need to list aa, bb, Or should we remove it,

thanks!

Ksharp
Super User

One problem, if aa or bb has some missing value , Urvish's code will not work any more.

Data one;
input Cusip Year aa;
cards;
100 97 10
101 98 2
102 97 .
103 98 2
;
Data two;
input Cusip Year bb;
cards;
101 98 34
102 97 .
103 98 3
104 99 5
;

Ksharp

UrvishShah
Fluorite | Level 6

Hi

Data one;

input Cusip Year aa;

cards;

100 97 10

101 98 2

102 97 .

103 98 2

;

Data two;

input Cusip Year bb;

cards;

101 98 .

102 97 .

103 98 3

104 99 5

;

proc sql;

   create table both as

   select coalesce(a.cusip,b.cusip) as cusip,

         coalesce(a.year,b.year) as year,

         aa,bb,

         case

                 when a.cusip = b.cusip then "Match"

                 when aa = . and bb = . then "Match"

             when bb = . then "Two"

             when aa = . then "One"

             else "Match"

         end as source

   from one as a

       full join

       two as b

   on a.cusip = b.cusip;

quit;

-Urvish

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 2006 views
  • 0 likes
  • 5 in conversation