Merge two datasets to delete a new variable

buddha_d · Posted 08-14-2021 10:59 PM

Dear community,

I am trying to delete a companyid (compid) if one of the employee last and first name matches with another employee last and first name in two different dataset. Of course, these are sample datasets. Even if an employee first and last name matches in both datasets then I want to delete that company id (compid field), there are hundreds of employees in real data. I have sample code which I tried and I am getting error. Please give me any suggestions.

/* sample dataset1 birthday*/
data birthday;
   input compid birthdat date7.
         lastname $18.
         firstnam $15.
         phone $4.;
		 format birthdat mmddyy10.;
         datalines;
459287 05JAN39 RODRIGUES        JUAN           5879
127845 25DEC48 MEDER            VLADIMIR       6231
254896 06APR54 TAYLOR-HUNYADI   ITO            0231
;
run;

/*Sample dataset2 tempbirthday*/
proc sql;
create table WORK.TEMPBIRTHDAY( bufsize=65536 )
  (
   compid num,
   birthdat num format=MMDDYY10.,
   lastname char(18),
   firstnam char(15),
   phone char(4)
  );
insert into WORK.TEMPBIRTHDAY
      values(459287,'04MAY98'd, 'NISHIMATSU-LYNCH','RICHARD', '6778')
      values(765112,'04MAY98'd, 'SMITH','ROBERT', '')
      values(219776,'15APR98'd, 'PASTORELLI','ZORA','')
      values(245233,'10APR98'd, 'ALI','SADIQ','')
      values(245234,'10APR98'd, 'MEHAILESCU','NADIA','')
      values(326721,'01MAY98'd, 'CALHOUN','WILLIS','')
	  values(459287,'05JAN39'd, 'RODRIGUES', 'JUAN', '5879')
	  values(127845,'25DEC48'd, 'MEDER', 'VLADIMIR', '6231')
	  values(254896,'06APR54'd, 'TAYLOR-HUNYADI', 'ITO', '0231')
;

quit;

/*sort both datasets by compid */
proc sort data=birthday ;
	by compid;
proc sort data=tempbirthday (rename=( birthdat=birthdat1 lastname=lastname1 firstnam=firstnam1 phone=phone1));
	by compid;
run;

/* delete the employees who is from same compid (companyid) */
data total1;
	merge birthday (in=ina) tempbirthday(in=inb);
	by compid;
	if ina and inb then do;
	delete compid;
	end;
run;

Thanks in advance

Patrick · Posted 08-14-2021 11:23 PM

@buddha_d

The DELETE statement deletes a row of data not just a single variable. If this is what you want then just remove the variable name from the delete statement and your code will work.

You can't delete a variable (the column) from a specific row (observation). You can set the variable in a specific row ("the cell") to missing or change it but not delete a "cell". SAS tables are rectangular.

buddha_d · Posted 08-15-2021 12:06 AM

Yes I agree Patrick. In the sample data case, I want to remove company id (compid) '459287' as it meets the criteria of present in both datasets. Rest of the data after merging, could be the same. The output dataset should look like this

Thanks,

Patrick · Posted 08-15-2021 12:28 AM

Given the required logic you describe not sure why compid 127845 would remain in your desired result.

Below the code for the logic as I understand it.

data total1;
  merge 
    birthday (in=ina)
    tempbirthday(in=inb);
  by compid;
  if ina and inb and lastname=lastname1 and firstnam=firstnam1 then delete;
run;

buddha_d · Posted 08-15-2021 12:58 AM

Patrick, you are correct. Compid 127845 shouldn't be there in the dataset. In fact, all the compids 459287, 127845, 254896 shouldn't be seen in the output dataset as there is at least one matching observation (compid on both datasets). The highlighted record should be deleted as well (though the first and last names are different, I don't want to see if the two different employees work for the same company)

which will be accomplished by using this code

/* delete the employees who is from same compid */
data total1;
merge
birthday (in=ina)
tempbirthday(in=inb);
by compid;
if not ina and inb ;
run;

Tom · Posted 08-15-2021 11:35 AM

(NOT INA AND INB) is different than NOT (INA and INB).

This first just finds the observations that exist in B that are not in A.

The second also includes the cases that exist in A that are not in B.

Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Registration is open

Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Re: Merge two datasets to delete a new variable

Registration is open

SAS Training: Just a Click Away