BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasphd
Lapis Lazuli | Level 10

Hello, 

I want to join this two tables

Table 1 

id  offer            caldt
1 1997-02-01 1997-12-31
1 1999-02-01 1998-12-31

 

table 2 

id  mret            caldt

1   0.09          1997-12-31

 1   0.87         1998-01-31

...........

1    0.89           1998-12-31

 1     0.34              1999-01-31

 

result table 

 

id  mret            caldt              offer

1   0.09          1997-12-31       1997-02-01

 1   0.87         1998-01-31       1997-02-01

...........

1    0.89           1998-12-31        1999-02-01

1     0.34              1999-01-31     1999-02-01

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

So it's not always the last of the month. In this case, our lookup table best covers all days.

data table1;
input rank_no crsp_fundno mgr_dt :yymmdd10.;
format mgr_dt yymmddd10.;
cards;
1 53 1991-01-01
1 53 1999-01-01
2 105 1985-01-01
2 105 1985-06-01
;
run;

data table2;
input rank_no crsp_fundno caldt :yymmdd10. mret;
format caldt yymmddd10.;
cards;
1 53 1990-11-30 1.0459
1 53 1990-12-31 1.0250
1 53 1991-01-31 1.0435
1 53 1991-02-28 1.0499
1 53 1991-03-28 1.0078
1 53 1999-01-29 0.9810
1 53 1999-02-26 1.0002
1 53 1999-03-31 1.0238
2 105 1984-12-31 1.0238
2 105 1985-01-31 1.0238
2 105 1985-02-28 1.0238
2 105 1985-03-29 1.0238
2 105 1985-04-30 1.0238
2 105 1985-05-31 1.0238
2 105 1985-06-28 1.0238
2 105 1985-07-31 0.9990
;
run;

%let maxdate=%sysfunc(date());

data t1_int (keep=rank_no crsp_fundno oldmgrdt caldt rename=(oldmgrdt=mgr_dt));
set table1 (keep=rank_no crsp_fundno mgr_dt);
by rank_no crsp_fundno;
format oldmgrdt caldt yymmddd10.;
oldmgrdt = intnx('month',lag(mgr_dt),0,'end');
if not first.crsp_fundno
then do;
  do caldt = oldmgrdt to mgr_dt - 1;
    output;
  end;
end;
if last.crsp_fundno
then do;
  oldmgrdt = mgr_dt;
  do caldt = mgr_dt to &maxdate;
    output;
  end;
end;
run;

data table3;
merge
  t1_int (in=a)
  table2 (in=b)
;
by rank_no crsp_fundno caldt;
if b;
run;

proc print data=table3 noobs;
run;

And note, once again, how to provide test data so that your possible helpers do not have to waste time typing those steps themselves. Its not rocket science and just a matter of basic forum courtesy.

 

This is the result:

            crsp_
rank_no    fundno         caldt     mret         mgr_dt

   1          53     1990-11-30    1.0459             .
   1          53     1990-12-31    1.0250             .
   1          53     1991-01-31    1.0435    1991-01-31
   1          53     1991-02-28    1.0499    1991-01-31
   1          53     1991-03-28    1.0078    1991-01-31
   1          53     1999-01-29    0.9810    1999-01-01
   1          53     1999-02-26    1.0002    1999-01-01
   1          53     1999-03-31    1.0238    1999-01-01
   2         105     1984-12-31    1.0238             .
   2         105     1985-01-31    1.0238    1985-01-31
   2         105     1985-02-28    1.0238    1985-01-31
   2         105     1985-03-29    1.0238    1985-01-31
   2         105     1985-04-30    1.0238    1985-01-31
   2         105     1985-05-31    1.0238    1985-01-31
   2         105     1985-06-28    1.0238    1985-06-01
   2         105     1985-07-31    0.9990    1985-06-01

View solution in original post

13 REPLIES 13
ballardw
Super User

1) Are those apparent data variables SAS date values or character?

2) Is the implied rule that the table 2 caldt fall between the table 1 offer and caldt within the ID?

   2a) if between the dates what happens when there are two or more caldt in table 2 within the range of offer - caldt in table 1? It looks like each one is in the result but want to clarify that is the case.

sasphd
Lapis Lazuli | Level 10

1)the date here are sas variables YYMMDDN8.

 

2) in reality you can ignore the caldt in table 1. I want that in the joint table the offer begin when it equal the caldt and continue so until a new value of offer begin 

 

 

 

Kurt_Bremser
Super User

@sasphd wrote:

1)the date here are sas variables YYMMDDN8.

 

2) in reality you can ignore the caldt in table 1. I want that in the joint table the offer begin when it equal the caldt and continue so until a new value of offer begin 

 

 

 


Then your desired result does not match your criteria, as the second offer date is larger than all caldt's in table 2.

Kurt_Bremser
Super User

On the assumption that your example table 2 is simply not complete, and that caldt in table 2 always falls on the last day of a month, I did this:

data t1;
input id offer :yymmdd10. caldt :yymmdd10.;
format offer caldt yymmddd10.;
cards;
1 1997-02-01 1997-12-31
1 1999-02-01 1998-12-31
;
run;

data t2;
input id mret caldt :yymmdd10.;
format caldt yymmddd10.;
cards;
1 0.09 1997-12-31
1 0.87 1998-01-31
1 0.89 1998-12-31
1 0.34 1999-01-31
1 0.35 1999-02-28
1 0.40 1999-03-31
;
run;

%let maxdate=%sysfunc(date());

data t1_int (keep=id oldoffer caldt rename=(oldoffer=offer));
set t1 (keep=id offer);
by id;
format oldoffer caldt yymmddd10.;
oldoffer = lag(offer);
if not first.id
then do;
  caldt = intnx('month',oldoffer,0,'end');
  do until (caldt > offer);
    output;
    caldt = intnx('month',caldt,1,'end');
  end;
end;
if last.id
then do;
  caldt = intnx('month',offer,0,'end');
  oldoffer = offer;
  do until (caldt > &maxdate);
    output;
    caldt = intnx('month',caldt,1,'end');
  end;
end;
run;

data want;
merge
  t1_int (in=a)
  t2 (in=b)
;
by id caldt;
if b;
run;

Basically, I create a lookup table for all possible end-of-month dates from the offers, and merged that with table 2.

If caldt could fall on other days than end-of-month, one could use

 do caldt = oldoffer to offer - 1;

and create a lookup table for all dates, but that would of course be quite large.

Feragon42
Fluorite | Level 6

If the IDs are differents (You use in the example only ID 1) use a PROC SQL, but watch out with what's your primary table, because the observations of table 2 with an ID that don't appear in table 1 will be ignored.

 

proc sql;
   create table table_3 as
   select t1.*,
          t2.mret
   from table_1 t1
   left join table_2 t2 on (t1.id = t2.id)
quit;

 

sasphd
Lapis Lazuli | Level 10

your solution give me this resut (real data). however in table 3 the caldt is repeted as number of MGR_DT. 

How can I eliminate this!!!!!!!!!!!!!!!!!

this is the program and the data tables

proc sql;
create table table_3 as
select t2.*,
t1.mret, t1.caldt
from table2 t1
left join table1 t2 on (t1.crsp_fundno = t2.crsp_fundno)
order by t1.crsp_fundno, t1.Caldt;
quit;

sasphd
Lapis Lazuli | Level 10

your solution give me this resut (real data). however in table 3 the caldt is repeted as number of MGR_DT. I want distinct caldt for each id

How can I eliminate this!!!!!!!!!!!!!!!!!

this is the program and the data tables

proc sql;
create table table_3 as
select t2.*,
t1.mret, t1.caldt
from table2 t1
left join table1 t2 on (t1.crsp_fundno = t2.crsp_fundno)
order by t1.crsp_fundno, t1.Caldt;
quit;

Feragon42
Fluorite | Level 6

Hello!

When you use the Left Join, you will use all observations of the declared table 1 (t1) and merge with the observations that match with key (in your case crsp_fundno) of table 2 (t2), and ignore the rest.

 img_leftjoin.gif

 

How I say in my first comment, it's better to implement this method when you have an unique key (and ID or something), but you hasn't it, because crsp_fundno repeat itself a great amount of times. 

The thinks what your codes do is the next:

Your t1 is table2, where do you take only caldt and mret. Then, as table2 is the t1, the program put all the results of this table.

Meanwhile, table1 is t2, so, it will put the result that coincide with the key of t1 (crsp_fundno), and, if t1 have less results (as it is), it will begin to repeat this results, as you see in the graphic bellow.

The crsp_fundno 53 has a lot of results in table2 (t1), and only two in table1 (t2), one with the mgr_dt equal to 01/01/1999 and the other with 01/01/1991. Because of that, the program begin two repeat this result until it fulfill with the t1 results amount. And that is because you have this problem.

Captura.PNG

Now, if you use this software in other way, with table1 as t1, and table2 as t2. It will take only two results of t2 and ignore the rest (that is a lot). There is a grapich of more joins that sql do:

SQL Joins.jpg

 

And with your question. You want differents caidt for each crsp_fundno, but you have a lot of equals crsp_fundno. You want to eliminate duplicate or something?

Sorry for the long text, but I want to explain the code that you do, and why you have that result. But really I don't understand what you want with the crsp_fundno. 

sasphd
Lapis Lazuli | Level 10

thanks a lot 

this is the table 3 that I want. 

I want that MGR_DT will be repeat for each crsp_fundno for the all period and it began when year and month of MGR_DT equal the caldt. 

 

ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Post example data in the form of data step code. You can find instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... to create datastep code from a SAS data set. Post the results as text in the {i} code boxes or attach as TXT files.

sasphd
Lapis Lazuli | Level 10

able 1

 

Rank_NO crsp_fundno MGR_DT
1                      53           1991-01-01
1                      53            1999-01-01
2                    105             1985-01-01
2                    105             1985-06-01

 

 

table 2

Rank_NO crsp_fundno caldt mret
1 53 1990-11-30 1,0459
1 53 1990-12-31 1,0250
1 53 1991-01-31 1,0435
1 53 1991-02-28 1,0499
1 53 1991-03-28 1,0078
1 53 1999-01-29 0,9810
1 53 1999-02-26 1,0002
1 53 1999-03-31 1,0238
2 105 1984-12-31 1,0238
2 105 1985-01-31 1,0238
2 105 1985-02-28 1,0238
2 105 1985-03-29 1,0238
2 105 1985-04-30 1,0238
2 105 1985-05-31 1,0238
2 105 1985-06-28 1,0238
2 105 1985-07-31 0,9990

 

 

I want table 3 

Rank_NO crsp_fundno caldt mret MGR_DT
1 53 1990-11-30 1,0459  
1 53 1990-12-31 1,0250  
1 53 1991-01-31 1,0435 1991-01-01
1 53 1991-02-28 1,0499 1991-01-01
1 53 1991-03-28 1,0078 1991-01-01
1 53 1999-01-29 0,9810 1999-01-01
1 53 1999-02-26 1,0002 1999-01-01
1 53 1999-03-31 1,0238 1999-01-01
2 105 1984-12-31 1,0238  
2 105 1985-01-31 1,0238 1985-01-01
2 105 1985-02-28 1,0238 1985-01-01
2 105 1985-03-29 1,0238 1985-01-01
2 105 1985-04-30 1,0238 1985-01-01
2 105 1985-05-31 1,0238 1985-01-01
2 105 1985-06-28 1,0238 1985-06-01
2 105 1985-07-31 0,999 1985-06-01
Kurt_Bremser
Super User

So it's not always the last of the month. In this case, our lookup table best covers all days.

data table1;
input rank_no crsp_fundno mgr_dt :yymmdd10.;
format mgr_dt yymmddd10.;
cards;
1 53 1991-01-01
1 53 1999-01-01
2 105 1985-01-01
2 105 1985-06-01
;
run;

data table2;
input rank_no crsp_fundno caldt :yymmdd10. mret;
format caldt yymmddd10.;
cards;
1 53 1990-11-30 1.0459
1 53 1990-12-31 1.0250
1 53 1991-01-31 1.0435
1 53 1991-02-28 1.0499
1 53 1991-03-28 1.0078
1 53 1999-01-29 0.9810
1 53 1999-02-26 1.0002
1 53 1999-03-31 1.0238
2 105 1984-12-31 1.0238
2 105 1985-01-31 1.0238
2 105 1985-02-28 1.0238
2 105 1985-03-29 1.0238
2 105 1985-04-30 1.0238
2 105 1985-05-31 1.0238
2 105 1985-06-28 1.0238
2 105 1985-07-31 0.9990
;
run;

%let maxdate=%sysfunc(date());

data t1_int (keep=rank_no crsp_fundno oldmgrdt caldt rename=(oldmgrdt=mgr_dt));
set table1 (keep=rank_no crsp_fundno mgr_dt);
by rank_no crsp_fundno;
format oldmgrdt caldt yymmddd10.;
oldmgrdt = intnx('month',lag(mgr_dt),0,'end');
if not first.crsp_fundno
then do;
  do caldt = oldmgrdt to mgr_dt - 1;
    output;
  end;
end;
if last.crsp_fundno
then do;
  oldmgrdt = mgr_dt;
  do caldt = mgr_dt to &maxdate;
    output;
  end;
end;
run;

data table3;
merge
  t1_int (in=a)
  table2 (in=b)
;
by rank_no crsp_fundno caldt;
if b;
run;

proc print data=table3 noobs;
run;

And note, once again, how to provide test data so that your possible helpers do not have to waste time typing those steps themselves. Its not rocket science and just a matter of basic forum courtesy.

 

This is the result:

            crsp_
rank_no    fundno         caldt     mret         mgr_dt

   1          53     1990-11-30    1.0459             .
   1          53     1990-12-31    1.0250             .
   1          53     1991-01-31    1.0435    1991-01-31
   1          53     1991-02-28    1.0499    1991-01-31
   1          53     1991-03-28    1.0078    1991-01-31
   1          53     1999-01-29    0.9810    1999-01-01
   1          53     1999-02-26    1.0002    1999-01-01
   1          53     1999-03-31    1.0238    1999-01-01
   2         105     1984-12-31    1.0238             .
   2         105     1985-01-31    1.0238    1985-01-31
   2         105     1985-02-28    1.0238    1985-01-31
   2         105     1985-03-29    1.0238    1985-01-31
   2         105     1985-04-30    1.0238    1985-01-31
   2         105     1985-05-31    1.0238    1985-01-31
   2         105     1985-06-28    1.0238    1985-06-01
   2         105     1985-07-31    0.9990    1985-06-01

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4650 views
  • 2 likes
  • 4 in conversation