Hello, does the "left join" procedure also work if the common variable include different values? Meaning that the second file include all the values that appear in the first file and more?
For example in the follow procedure:
proc sql;
create table reg as
select a.*,b.company_price
from invest1 a
left join invest2 b
on put(a. b_s_d_w, date.) eq put (b.DATADATE, date.)
order by GVKEY, b_s_d_w;
quit;
run;
the files look like that:
invest1:
date
31/03/2012
30/06/2012
Invest2:
Date company_price
.
.
.28/03/2012 10
29/03/2012 11
30/03/2012 14
31/03/2012 13
01/04/2012 17
.
.
.
29/06/2012 16
30/06/2012 14
01/07/2012 18
The file that I want to create should look like that:
Date company_price
31/03/2012 13
30/06/2012 14
For some reason , when I run this procedure I receive multiple results, the report look like that:
Date company_price
31/03/2012 10
31/03/2012 11
31/03/2012 14
31/03/2012 13
30/06/2012 16
30/06/2012 14
30/06/2012 18
Any ideas what cause this problem?
What happens if you do:
proc sql;
create table reg as
select a.*, b.company_price
from invest1 a
left join invest2 b
on a.b_s_d_w = b.DATADATE
order by GVKEY, b_s_d_w;
quit;
PG
still receiving the same results...
Date2 has multiple matches for date variable in the second table. You've only shown us a subset but if you look closely at your second dataset you'll see that the date 31/03/2012 would occur multiple times in your second dataset.
as I understand it the file "invest2" dont has multiple matches it just contain more values than needed in order to match the values at file "invest1".
at invest1 there is only two values: 31/03/2012 and 30/06/2012 and in the file invest2 there is daily values (28/03/2012, 29/03/2012...) that include the values 31/03/2012 and 30/06/2012 but also other values.
I'm confused, is your problem resolved, do you understand why you're receiving the results you're receiving?
If you want a single returned value you'll have to limit the query somehow and we don't have enough information to say how. My guess is you need to join on another field that's not shown.
The result that I'm getting are meaningless, I don't understand why Im getting them.
In my original files I have in the first file 400 values of quarterly dates (31/03/2012, 30/06/2012, 31/12/2012, 21/03/2013, 30/06/2013....) and in the second file I have around 20,000 value of daily dates (and stock proce for each date).
the daily dates includ םf course the quarterly dates but many other dates as well.
I want the created table to take from the second file only the querterly dates and assign their stock price to the first file that contain quearterly dates.
hope its more clear now...
Could it be that the data tables do not contain exactly the values that you intended and that you are showing us? What does the following query return?
proc sql;
select *
from invest2
where DATADATE = "31MAR2012"d;
quit;
PG
In my original files I have in the first file 400 values of quarterly dates (31/03/2012, 30/06/2012, 31/12/2012, 21/03/2013, 30/06/2013....) and in the second file I have around 20,000 value of daily dates (and stock proce for each date).
the daily dates includ םf course the quarterly dates but many other dates as well.
I want the created table to take from the second file only the querterly dates and assign their stock price to the first file that contain quearterly dates.
I can't enter where statement because I have too many quarterly figures.
The simple query above is just a test. It should return one and only one observation from dataset invest2. What does it return?
Its return all the prices of all the daily prices and its attached to the querterly dates, and because there are more daily values than querterly values its dublicate the querterly values.
Although a left join only keeps records that are in the left table, if there are multiple matches it takes all the matches.
Here's an example to illustrate your problem and shows how to diagnose it:
data table1;
format date date9.;
input rec date anydtdte.;
cards;
1 31Mar2012
2 30Jun2012
;
run;
data table2;
format date date9.;
do date="01Mar2012"d to "31Dec2012"d;
price=date/365;
output;
end;
date="31Mar2012"d; price=220; output;
date="30Jun2012"d; price=200; output;
run;
proc sql;
create table want as
select a.*, b.price
from table1 as a
left join table2 as b
on a.date=b.date;
quit;
*diagnose it;
proc sort data=table2;
by date;
run;
data dup_recs;
set table2;
by date;
if not (first.date and last.date) then output;
run;
I will try this although I don't have multiple matches I just have Needless values in the second file that the statement need to agnore them and for some resone it doesn't agnore. and because it don't agnore them its present the querterly figures several times.
You obviously have multiple matches for the same date in invest2. Therefore you have to set up a rule which one of those matches you want.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.