BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
trich12
Calcite | Level 5

Hi everyone, I have a merging question.  I have two datasets that I want to merge using an identifier and year.  But I need to include a condition that if the year is not present in the second dataset, I would like to use the previous year's data entry.  Here are examples of the data:

Data 1:

Firm      Year

1004     1996

1160     1996

1235     1996

Data 2:

Firm     Year     RepMeasure

1004     1995     0.36

1004     1996     0.42

1160     1995     0.11

1235     1995     0.78

1235     1996     0.69

For Firms 1004 and 1235, merging by firm and year is fine. For firm 1160 though, I don't have 1996 data so I need to use the 1995 RepMeasure.  Let me know if I need to provide any more info. Thanks for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Performancewise, data step hash probably better, but Proc SQL can offer a simple syntax: 

Data _1;

input Firm$      Year;

cards;

1004     1996

1160     1996

1235     1996

;

Data _2;

input Firm$     Year     RepMeasure;

cards;

1004     1995 0.36

1004     1996 0.42

1160     1995 0.11

1235     1995 0.78

1235     1996 0.69

;

proc sql;

  create table _merge as

     select a.*, repmeasure

       from _1 a

         left join _2 b

                   on a.firm=b.firm

                   and a.year - b.year in (1,0)

         group by a.firm

     having a.year - b.year = min(a.year-b.year)

      ;

      quit;

                   

        

Haikuo

View solution in original post

6 REPLIES 6
snoopy369
Barite | Level 11

Easiest way might be just to take the Data2 dataset and add rows for missing years, then merge.

Haikuo
Onyx | Level 15

Performancewise, data step hash probably better, but Proc SQL can offer a simple syntax: 

Data _1;

input Firm$      Year;

cards;

1004     1996

1160     1996

1235     1996

;

Data _2;

input Firm$     Year     RepMeasure;

cards;

1004     1995 0.36

1004     1996 0.42

1160     1995 0.11

1235     1995 0.78

1235     1996 0.69

;

proc sql;

  create table _merge as

     select a.*, repmeasure

       from _1 a

         left join _2 b

                   on a.firm=b.firm

                   and a.year - b.year in (1,0)

         group by a.firm

     having a.year - b.year = min(a.year-b.year)

      ;

      quit;

                   

        

Haikuo

NaveenSrinivasan
Calcite | Level 5

Hi Sir,

I would like to seek clarification on the logic /*and a.year - b.year in (1,0)*/ as it seems the program returns the same results without the condition. So is that condition written to make sure to avoid other possible values? I'd appreciate if you can correct my understanding

sql;


create table _merge as


select a.*, repmeasure


from _1 a


left join _2 b


on a.firm=b.firm


/* and a.year - b.year in (1,0)*/


group by a.firm


a.year - b.year = min(a.year-b.year);


;



Haikuo
Onyx | Level 15

Yes, it is there to rule out other possibilities.  In your initial question, you asked to use "previous year" in case of no-match. Consider the following slightly modified data, run the code with and without the additional condition.

Data _2;

input Firm$ Year RepMeasure;

cards;

1004 1995 0.36

1004 1996 0.42

1160 1995 0.11

1160 1997 100

1235 1995 0.78

1235 1996 0.69

;

Haikuo

Astounding
PROC Star

Two key questions ...

1. Can we rely on this being the full set of all variables in the two incoming data sets?

2. Which observations do you want in the output?  There are at least two ways to imagine what the results should look like.

trich12
Calcite | Level 5

Hai.kuo that worked well. Thanks!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1056 views
  • 0 likes
  • 5 in conversation