Re: merging

Dhana18 · Posted 06-13-2019 11:04 AM

Good morning,

I have 2 data sets like this:

Data set 1:

id name age sex
1 sima 23 f
2 shyam 34 m
3 ana 35 f
4 jacob 26 m
5 chris 34 m

Data set 2:

id
1
3
5

How can I get output like this?

1 sima 23 f
3 ana 35 f
5 chris 34 m

ballardw · Posted 06-13-2019 11:30 AM

There are several ways.

Data step merge would require sorting the data on the matching variable (id) and use of a BY statement.

proc sort data=set1;
  by id;
run;

proc sort data=set2;
  by id;
run;

/* merge*/
data want;
   merge set1 (in=in1) 
         set2 (in=in2)
   ;
   by id;
   if in1 and in2;
run;

The IN= data set option creates temporary variables that indicate if the current record has values from that data set, 1 when true and 0 when false.

So if both are true you have the matching values.
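To see what the IN= flags contain, they can be copied into ordinary variables, because the IN= variables themselves are not written to the output data set. The following is a minimal sketch that assumes the sample data from the question; the names flags, from_set1 and from_set2 are made up for illustration.

```sas
/* Hypothetical sample data mirroring the example in the question */
data set1;
   input id name $ age sex $;
   datalines;
1 sima 23 f
2 shyam 34 m
3 ana 35 f
4 jacob 26 m
5 chris 34 m
;
run;

data set2;
   input id;
   datalines;
1
3
5
;
run;

/* Both data sets are already in id order, so no PROC SORT is needed here */
data flags;
   merge set1 (in=in1)
         set2 (in=in2);
   by id;
   /* IN= variables are dropped automatically; copy them to keep them */
   from_set1 = in1;
   from_set2 = in2;
run;

proc print data=flags noobs;
run;
```

With this data, ids 2 and 4 should show from_set2=0, which is why the subsetting IF in1 and in2 removes them.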

Or proc sql:

proc sql;
   create table want as
   select b.* 
   from set2 as a
        left join
        set1 as b
        on a.id=b.id
   ;
quit;

If you have repeats of the ID in one or both data sets then more consideration may be needed.
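One way to check for repeated IDs before merging is sketched below. It assumes a hypothetical lookup table named set2_dups in which id 3 appears twice; whether dropping the extra rows is appropriate depends on the data.

```sas
/* Hypothetical lookup table with id 3 listed twice */
data set2_dups;
   input id;
   datalines;
1
3
3
5
;
run;

/* List any id that occurs more than once */
proc sql;
   select id, count(*) as n_rows
   from set2_dups
   group by id
   having count(*) > 1;
quit;

/* One way to reduce the lookup table to one row per id */
proc sort data=set2_dups out=set2_unique nodupkey;
   by id;
run;
```

If the repeats are left in place, an SQL join returns one output row for every pairing of matching rows, so a duplicated ID in the lookup table duplicates the corresponding rows in the result.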

Dhana18 · Posted 06-13-2019 12:11 PM

I used the first code you sent me. The log says this:

NOTE: There were 9346 observations read from the data set WORK.CHECKING_A.
NOTE: There were 2576 observations read from the data set WORK.CHRIS_CHECKING_A.
NOTE: The data set WORK.WANT has 0 observations and 58 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

And the result contains only the variable names, with no values.

Reeza · Posted 06-13-2019 12:21 PM

This means your data isn't matching at all for some reason. For example, the case of the values may differ, i.e. VARA is not the same as varA. This is a data problem, though, not a code problem.
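A quick way to test whether case or stray blanks are behind a failed match is to compare the keys before and after normalizing them. This is only a sketch on made-up data; the table names left_side and right_side and the variable key are hypothetical.

```sas
/* Hypothetical data: same keys, but different case and leading blanks */
data left_side;
   length key $8;
   key = 'abc'; output;
   key = 'DEF'; output;
run;

data right_side;
   length key $8;
   key = 'ABC';   output;
   key = '  def'; output;
run;

proc sql;
   /* Exact comparison: case and leading blanks both prevent a match */
   select count(*) as exact_matches
   from left_side as a inner join right_side as b
   on a.key = b.key;

   /* Comparison after removing blanks and upper-casing both sides */
   select count(*) as normalized_matches
   from left_side as a inner join right_side as b
   on upcase(strip(a.key)) = upcase(strip(b.key));
quit;
```

If the second count is higher than the first, the keys differ only in case or padding, and cleaning them before the merge should fix the match.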

Dhana18 · Posted 06-13-2019 12:31 PM

Do all of the variable names have to match in both data sets, even if one of the data sets has only IDs?

Reeza · Posted 06-13-2019 12:49 PM

Post your code and log.

Dhana18 · Posted 06-13-2019 01:07 PM

data checking_A;
   set checking;
   /* converting numeric CL_eventid to character EventID as it is in the other dataset */
   ID=put(event_id, best11.);
run;

proc sort data=checking_a;
   by ID;
run;

data chris_checking_A(keep=EventID);
   length EventID $11.;
   set chris_checking;
   where CL_Facility_location="MIL-01";
run;

data chris_checking_B;
   set chris_checking_A;
   rename eventid=id;
   label eventid=id;
run;

proc sort data=chris_checking_B;
   by ID;
run;

proc sql;
   create table CheckingAA as
   select id.*
   from checking_A as a
        left join
        chris_checking_B as b
        on a.id=b.id
   ;
quit;

547 proc sql;
548 create table CheckingAA as
549 select id.*
550 from checking_A as a
551 left join
552 chris_checking_B as b
553 on a.id=b.id
554 ;
ERROR: Could not expand id.*, correlation name not found.
ERROR: Ambiguous reference, column id is in more than one table.
555 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds

Reeza · Posted 06-13-2019 01:18 PM

ID.* does not refer to a table. Fix that error and that step will run, but I have no idea whether that's the step with the mistake, because you didn't include the full log.

The SQL syntax to select all variables is tableAlias.* or tableName.*. You have no table alias or table name called ID. Replace ID with either A or B, depending on which data set you want data from. Or remove the ID prefix if you want columns from both tables, but if the tables have variables with the same name, that won't work either.
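Following that advice, a corrected query might look like the sketch below. It uses small hypothetical stand-ins for checking and chris_checking_B, selects a.*, and uses an inner join on the assumption that only matching IDs are wanted. It also left-aligns the converted ID with the -L format modifier, because PUT with a numeric format right-justifies its result and the leading blanks would not match a left-aligned character ID. Whether that is what caused the 0-observation merge earlier in the thread is an assumption, since the data were not posted.

```sas
/* Hypothetical stand-in for checking: numeric event_id */
data checking;
   input event_id amount;
   datalines;
101 10
102 20
103 30
;
run;

/* Hypothetical stand-in for chris_checking_B: left-aligned character id */
data chris_checking_B;
   length id $11;
   input id $;
   datalines;
101
103
;
run;

data checking_A;
   set checking;
   length id $11;
   /* -L left-aligns the result; plain best11. would pad with leading blanks */
   id = put(event_id, best11. -L);
run;

proc sql;
   create table CheckingAA as
   select a.*              /* alias.* selects every column from checking_A */
   from checking_A as a
        inner join
        chris_checking_B as b
        on a.id = b.id
   ;
quit;
```

With this made-up data, CheckingAA should contain the rows for event_id 101 and 103 only.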

Ksharp · Posted 06-14-2019 08:57 AM

proc sql;
   create table want as
   select * 
   from set1 
   where id in (select id from set2)
   ;
quit;
