DATA Step, Macro, Functions and more

How to merge a dataset with duplicate identifiers to a panel data?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

How to merge a dataset with duplicate identifiers to a panel data?

I have two datasets, call them dataset1 and dataset2. The first is a list of identifiers with character variable name "symbol", and there are duplicates in it. I use another variable "id" to order these symbols. The second is a panel data set with symbol and "year". For example,

 

data dataset1;
input id symbol $;
format id 1. symbol $2.;
datalines;
1 A
2 B
3 B
4 C
5 D
6 D
7 D
8 E
;
run;

and 

data dataset2;
length symbol $2 date 8;
informat symbol $2. date yymmdd.;
format symbol $2. date yymmddn8.;
input symbol $ date;
datalines;
A 20120101
A 20120102
A 20120103
B 20120101
B 20120102
C 20120102
C 20120103
C 20120104
D 20120101
D 20120102
D 20120103
D 20120104
E 20120102
E 20120103
;
run;

My purpose is to merge the two datasets by "symbol", and each "id" in dataset1 should be merged with all corresponding symbol-date observations in dataset2. The merged data set should contain 24 observations. However, if I use

data want;
	merge dataset1 (in = a) dataset2 (in = b);
	by symbol;
	if a and b;
run;

I got only 14 observations. The reason is that, duplicate symbols in dataset1 are not treated as different IDs. How can I achieve the result I want?


Accepted Solutions
Solution
‎01-20-2018 06:15 PM
Super User
Posts: 6,921

Re: How to merge a dataset with duplicate identifiers to a panel data?

SQL does this fairly easily:

 

proc sql;

create table want as

select dataset1.*, dataset2.date

from dataset1, dataset2

where dataset1.symbol = dataset2.symbol;

quit;

View solution in original post


All Replies
Solution
‎01-20-2018 06:15 PM
Super User
Posts: 6,921

Re: How to merge a dataset with duplicate identifiers to a panel data?

SQL does this fairly easily:

 

proc sql;

create table want as

select dataset1.*, dataset2.date

from dataset1, dataset2

where dataset1.symbol = dataset2.symbol;

quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 211 views
  • 1 like
  • 2 in conversation