DATA Step, Macro, Functions and more

left join procedure when the commun variable include more values in the second file than in the first file

Reply
Contributor
Posts: 38

left join procedure when the commun variable include more values in the second file than in the first file

Hello, does the  "left join" procedure also work if the common variable include different values? Meaning that the second file include all the values that appear in the first file and more?

For example in the follow procedure:

proc sql;

  create table reg as

    select a.*,b.company_price

     from invest1 a

       left join invest2 b

       on put(a. b_s_d_w, date.) eq put   (b.DATADATE, date.)

           order by GVKEY, b_s_d_w;

quit;

run;

the files look like that:

invest1:

date                     

31/03/2012  

30/06/2012

Invest2:

Date                             company_price

.

.

.28/03/2012                       10

29/03/2012                        11

30/03/2012                        14

31/03/2012                        13                       

01/04/2012                        17

.

.

.

29/06/2012                      16

30/06/2012                      14

01/07/2012                      18

The file that I want to create should look like that:

Date                     company_price

31/03/2012                     13

30/06/2012                     14

For some reason , when I run this procedure I receive multiple results, the report look like that:

Date                   company_price

31/03/2012                       10

31/03/2012                      11

31/03/2012                      14

31/03/2012                      13

30/06/2012                      16

30/06/2012                      14

30/06/2012                      18

Any ideas what cause this problem?

Respected Advisor
Posts: 4,646

Re: left join procedure when the commun variable include more values in the second file than in the first file

What happens if you do:

proc sql;

  create table reg as

    select a.*, b.company_price

     from invest1 a

       left join invest2 b

       on a.b_s_d_w = b.DATADATE

           order by GVKEY, b_s_d_w;

quit;


PG

PG
Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

still receiving the same results...

Super User
Posts: 17,819

Re: left join procedure when the commun variable include more values in the second file than in the first file

Date2 has multiple matches for date variable in the second table. You've only shown us a subset but if you look closely at your second dataset you'll see that the date 31/03/2012 would occur multiple times in your second dataset.

Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

as I understand it the file "invest2" dont has multiple matches it just contain more values than needed in order to match the values at file "invest1".

at invest1 there is only two values: 31/03/2012 and 30/06/2012 and in the file invest2 there is daily values (28/03/2012, 29/03/2012...) that include the values 31/03/2012 and 30/06/2012 but also other values.

Super User
Posts: 17,819

Re: left join procedure when the commun variable include more values in the second file than in the first file

I'm confused, is your problem resolved, do you understand why you're receiving the results you're receiving?

If you want a single returned value you'll have to limit the query somehow and we don't have enough information to say how. My guess is you need to join on another field that's not shown.

Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

The result that I'm getting are meaningless, I don't understand why Im getting them.

In my original files I have in the first file 400 values of quarterly dates (31/03/2012, 30/06/2012, 31/12/2012, 21/03/2013, 30/06/2013....) and in the second file I have around 20,000 value of daily dates (and stock proce for each date).

the daily dates includ םf course the quarterly dates but many other dates as well.

I want the created table to take from the second file only the querterly dates and assign their stock price to the first file that contain quearterly dates.

hope its more clear now...

Respected Advisor
Posts: 4,646

Re: left join procedure when the commun variable include more values in the second file than in the first file

Could it be that the data tables do not contain exactly the values that you intended and that you are showing us? What does the following query return?

proc sql;

select *

from invest2

where DATADATE = "31MAR2012"d;

quit;

PG

PG
Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

In my original files I have in the first file 400 values of quarterly dates (31/03/2012, 30/06/2012, 31/12/2012, 21/03/2013, 30/06/2013....) and in the second file I have around 20,000 value of daily dates (and stock proce for each date).

the daily dates includ םf course the quarterly dates but many other dates as well.

I want the created table to take from the second file only the querterly dates and assign their stock price to the first file that contain quearterly dates.

I can't enter where statement because I have too many quarterly figures.

Respected Advisor
Posts: 4,646

Re: left join procedure when the commun variable include more values in the second file than in the first file

The simple query above is just a test. It should return one and only one observation from dataset invest2. What does it return?

PG
Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

Its return all the prices of all the daily prices and its attached to the querterly dates, and because there are more daily values than querterly values its dublicate the querterly values.

Super User
Posts: 17,819

Re: left join procedure when the commun variable include more values in the second file than in the first file

Although a left join only keeps records that are in the left table, if there are multiple matches it takes all the matches.

Here's an example to illustrate your problem and shows how to diagnose it:

data table1;

format date date9.;

input rec date anydtdte.;

cards;

1 31Mar2012

2 30Jun2012

;

run;

data table2;

format date date9.;

do date="01Mar2012"d to "31Dec2012"d;

price=date/365;

output;

end;

date="31Mar2012"d; price=220; output;

date="30Jun2012"d; price=200; output;

run;

proc sql;

  create table want as

  select a.*, b.price

  from table1 as a

  left join table2 as b

  on a.date=b.date;

quit;

*diagnose it;

proc sort data=table2;

by date;

run;

data dup_recs;

  set table2;

  by date;

  if not (first.date and last.date) then output;

run;

Contributor
Posts: 38

Re: left join procedure when the commun variable include more values in the second file than in the first file

I will try this although I don't have multiple matches I just have Needless values in the second file that the statement need to agnore them and for some resone it doesn't agnore. and because it don't agnore them its present the querterly figures several times.

Super User
Posts: 6,936

Re: left join procedure when the commun variable include more values in the second file than in the first file

You obviously have multiple matches for the same date in invest2. Therefore you have to set up a rule which one of those matches you want.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 13 replies
  • 327 views
  • 1 like
  • 4 in conversation