BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Amanda_Lemon
Quartz | Level 8

Hello, 

 

I am trying to merge two data sets and so far unsuccessfully... I would appreciate any help!

 

Have: Data set #1 (students are nested within classes, which are in turn nested within teachers)

 

StudentID ClassID TeacherID 

1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2

 

Data set #2: For this data set, 1 class per teacher was randomly selected from Data set #1, and the procedure was repeated 3 times. 

 

ClassID TeacherID Replicate

1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3

 

Want: To add student data to Data set #2. So I need to repeat each row as many times as there are students in class, and add students' IDs.

 

StudentID ClassID TeacherID Replicate

1 1 1 1
2 1 1 1
1 1 1 2
2 1 1 2
3 2 1 3
4 2 1 3
5 3 2 1
6 4 2 2
7 4 2 2
8 4 2 2
6 4 2 3
7 4 2 3
8 4 2 3

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I think this works

 

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
full join new as n
on r.classId=n.classId
and r.teacherID=n.teacherID
order by 4, 1, 2, 3
;
quit;

View solution in original post

8 REPLIES 8
Reeza
Super User
A standard merge didn't work? Can you show what you tried please?

Assuming you merge by classid and teacherID it should work.
Amanda_Lemon
Quartz | Level 8

Nope...

 

Here is what I tried: 

 

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sort data = new;
by ClassID TeacherID;
run;
proc sort data = resampled;
by ClassID TeacherID;
run;

data merged;
merge new resampled;
by ClassID TeacherID;
run;

proc print data = merged;
run;

Here is what I got: 

 

Obs

Stu_ID

ClassID

TeacherID

Replicate

1

1

1

1

1

2

2

1

1

2

3

3

2

1

3

4

4

2

1

3

5

5

3

2

1

6

6

4

2

2

7

7

4

2

3

8

8

4

2

3

Reeza
Super User

Is that output exactly what you want? Or an example of it?

 

proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
cross join new as n
;
quit;
Amanda_Lemon
Quartz | Level 8

Your code doesn't produce what I need... The output in the starting post is exactly want I need, yes.  

 

Your code produced this output: 

 

Obs

ClassID

TeacherID

Replicate

Stu_ID

1

1

1

1

1

2

1

1

2

1

3

2

1

3

1

4

3

2

1

1

5

4

2

2

1

6

4

2

3

1

7

1

1

1

2

8

1

1

2

2

9

2

1

3

2

10

3

2

1

2

11

4

2

2

2

12

4

2

3

2

13

1

1

1

3

14

1

1

2

3

15

2

1

3

3

16

3

2

1

3

17

4

2

2

3

18

4

2

3

3

19

1

1

1

4

20

1

1

2

4

21

2

1

3

4

22

3

2

1

4

23

4

2

2

4

24

4

2

3

4

25

1

1

1

5

26

1

1

2

5

27

2

1

3

5

28

3

2

1

5

29

4

2

2

5

30

4

2

3

5

31

1

1

1

6

32

1

1

2

6

33

2

1

3

6

34

3

2

1

6

35

4

2

2

6

36

4

2

3

6

37

1

1

1

7

38

1

1

2

7

39

2

1

3

7

40

3

2

1

7

41

4

2

2

7

42

4

2

3

7

43

1

1

1

8

44

1

1

2

8

45

2

1

3

8

46

3

2

1

8

47

4

2

2

8

48

4

2

3

8

 

But what I need is to add StudentID to each case (line) in Data set #2. So, if, say, a class had 3 students, then instead of one line from Data set #2 there will be three, and they will differ only in StudentID. 

 

Thank you so much for helping!

Reeza
Super User

I think this works

 

data new;
input Stu_ID ClassID TeacherID;
datalines;
1 1 1
2 1 1
3 2 1
4 2 1
5 3 2
6 4 2
7 4 2
8 4 2
;

data resampled;
input ClassID TeacherID Replicate;
datalines; 
1 1 1
1 1 2
2 1 3
3 2 1
4 2 2
4 2 3
;


proc sql;
create table want as
select r.*, n.Stu_id
from resampled as r
full join new as n
on r.classId=n.classId
and r.teacherID=n.teacherID
order by 4, 1, 2, 3
;
quit;
Amanda_Lemon
Quartz | Level 8
Yes! It works! Thank you!!!

Just one question to make sure I understand the code -- what do these mean?

n.
r.
order by 4, 1, 2, 3
Reeza
Super User
n/r are table alias so you don't have to type out the whole table name. This allow you to reference columns from a specific data set, ie n.classID refers to the classID from the new table.

If you don't do that you need to type out the full name, new.table

"ORDER BY" sorts the data, 4, 1, 2, 3 refer to the columns in the data set but I was too lazy to type them out.

I usually never use that except on here or for testing to be honest, I prefer to name the columns for sorting but it's late and I'm lazy 🙂
Amanda_Lemon
Quartz | Level 8
I see! Thank you very much!!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1133 views
  • 2 likes
  • 2 in conversation