Solved: what is the best - merge or set

Stalk · Posted 08-25-2021 10:51 PM

What is the best statement SET or Merge to get the desired PE output. I need to join the tables by PID and retain only the records from table P. Also need to have a field called source in PE to identify if the record is in both or P.

Thank you

Table P			Table E
PID	SID1		PID	SID2
1	123		1	124
2	24		2	24
3	43		4	33
5	26		5	23
			6	15
Table PE
PID	SID1	SID2
1	123	124
2	24	24
3	43
5	26	23

Stalk · Posted 08-26-2021 07:08 AM

I did the exact same thing. But I am getting Cartesian product.
Suppose my Table P has 17K records and table E has 16500 records. My merged dataset has 20684 records, even after keeping the records from P.
I want my output to have 17K records with SID from both tables. How?

View solution in original post

ballardw · Posted 08-25-2021 11:17 PM

@Stalk wrote:

What is the best statement SET or Merge to get the desired PE output. I need to join the tables by PID and retain only the records from table P. Also need to have a field called source in PE to identify if the record is in both or P.

Thank you

Table P Table E

PID SID1 PID SID2

1 123 1 124

2 24 2 24

3 43 4 33

5 26 5 23

6 15

Table PE

PID SID1 SID2

1 123 124

2 24 24

3 43

5 26 23

When you want to MATCH records from two (or more) data sets based on the value of a common variable then you want a Merge with a By;

data p;
  input Pid sid1;
datalines;
1 1
2 2
;

data e;
  input enrolid sid2;
datalines;
1  11
3  33
;

data merged;
  merge p (in=in1)
        e (in=in2)
  ;
  /* the first IF keeps records that match the id field
     in the first set
  */
  if in1; 
  If in1 and in2 then source='Both';
  else if in1 then source='P';
run;

The data set option IN= creates a temporary variable valued 1 or 0 if the data set contributes to the current record. SAS uses 1 for true and 0 for false so can be used in If as above.

Stalk · Posted 08-26-2021 07:08 AM

I did the exact same thing. But I am getting Cartesian product.
Suppose my Table P has 17K records and table E has 16500 records. My merged dataset has 20684 records, even after keeping the records from P.
I want my output to have 17K records with SID from both tables. How?

Stalk · Posted 08-26-2021 07:11 AM

Actually never mind. I know my mistake now. I merged by PID.

AndreaVianello · Posted 08-26-2021 08:26 AM

what is the best - merge or set?

NO, SQL Left join !

data p;
input Pid sid1;
datalines;
1 123
2 24
3 43
5 26
;

data e;
input pid sid2;
datalines;
1 124
2 24
4 33
5 23
6 15
;
run;

proc sql;
create table PE as
select p.pid, sid1, sid2
from p left join e
on p.pid=e.pid;
quit;

what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Re: what is the best - merge or set

Click image to register for webinar

Classroom Training Available!