Solved: Merging data sets

Amanda_Lemon · Posted 09-11-2019 11:57 PM

Hello,

I am trying to merge two data sets and so far unsuccessfully... I would appreciate any help!

Have: Data set #1 (students are nested within classes, which are in turn nested within teachers)

StudentID ClassID TeacherID

1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2

Data set #2: For this data set, 1 class per teacher was randomly selected from Data set #1, and the procedure was repeated 3 times.

ClassID TeacherID Replicate

1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3

Want: To add student data to Data set #2. So I need to repeat each row as many times as there are students in class, and add students' IDs.

StudentID ClassID TeacherID Replicate

1 1 1 1
2 1 1 1
1 1 1 2
2 1 1 2
3 2 1 3
4 2 1 3
5 3 2 1
6 4 2 2
7 4 2 2
8 4 2 2
6 4 2 3
7 4 2 3
8 4 2 3

Reeza · Posted 09-12-2019 01:13 AM

I think this works

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
full join new as n
on r.classId=n.classId
and r.teacherID=n.teacherID
order by 4, 1, 2, 3
;
quit;

View solution in original post

Reeza · Posted 09-12-2019 12:02 AM

A standard merge didn't work? Can you show what you tried please?

Assuming you merge by classid and teacherID it should work.

Amanda_Lemon · Posted 09-12-2019 12:10 AM

Nope...

Here is what I tried:

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sort data = new;
by ClassID TeacherID;
run;
proc sort data = resampled;
by ClassID TeacherID;
run;

data merged;
merge new resampled;
by ClassID TeacherID;
run;

proc print data = merged;
run;

Here is what I got:

Obs	Stu_ID	ClassID	TeacherID	Replicate
1	1	1	1	1
2	2	1	1	2
3	3	2	1	3
4	4	2	1	3
5	5	3	2	1
6	6	4	2	2
7	7	4	2	3
8	8	4	2	3

Reeza · Posted 09-12-2019 12:20 AM

Is that output exactly what you want? Or an example of it?

proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
cross join new as n
;
quit;

Amanda_Lemon · Posted 09-12-2019 12:41 AM

Your code doesn't produce what I need... The output in the starting post is exactly want I need, yes.

Your code produced this output:

Obs	ClassID	TeacherID	Replicate	Stu_ID
1	1	1	1	1
2	1	1	2	1
3	2	1	3	1
4	3	2	1	1
5	4	2	2	1
6	4	2	3	1
7	1	1	1	2
8	1	1	2	2
9	2	1	3	2
10	3	2	1	2
11	4	2	2	2
12	4	2	3	2
13	1	1	1	3
14	1	1	2	3
15	2	1	3	3
16	3	2	1	3
17	4	2	2	3
18	4	2	3	3
19	1	1	1	4
20	1	1	2	4
21	2	1	3	4
22	3	2	1	4
23	4	2	2	4
24	4	2	3	4
25	1	1	1	5
26	1	1	2	5
27	2	1	3	5
28	3	2	1	5
29	4	2	2	5
30	4	2	3	5
31	1	1	1	6
32	1	1	2	6
33	2	1	3	6
34	3	2	1	6
35	4	2	2	6
36	4	2	3	6
37	1	1	1	7
38	1	1	2	7
39	2	1	3	7
40	3	2	1	7
41	4	2	2	7
42	4	2	3	7
43	1	1	1	8
44	1	1	2	8
45	2	1	3	8
46	3	2	1	8
47	4	2	2	8
48	4	2	3	8

But what I need is to add StudentID to each case (line) in Data set #2. So, if, say, a class had 3 students, then instead of one line from Data set #2 there will be three, and they will differ only in StudentID.

Thank you so much for helping!

Reeza · Posted 09-12-2019 01:13 AM

I think this works

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
full join new as n
on r.classId=n.classId
and r.teacherID=n.teacherID
order by 4, 1, 2, 3
;
quit;

Amanda_Lemon · Posted 09-12-2019 01:22 AM

Yes! It works! Thank you!!!

Just one question to make sure I understand the code -- what do these mean?

n.
r.
order by 4, 1, 2, 3

Reeza · Posted 09-12-2019 01:27 AM

n/r are table alias so you don't have to type out the whole table name. This allow you to reference columns from a specific data set, ie n.classID refers to the classID from the new table.

If you don't do that you need to type out the full name, new.table

"ORDER BY" sorts the data, 4, 1, 2, 3 refer to the columns in the data set but I was too lazy to type them out.

I usually never use that except on here or for testing to be honest, I prefer to name the columns for sorting but it's late and I'm lazy 🙂

Amanda_Lemon · Posted 09-12-2019 01:31 AM

I see! Thank you very much!!

Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Re: Merging data sets

Registration is open

SAS Training: Just a Click Away