Dear community,
I am trying to delete a companyid (compid) if one of the employee last and first name matches with another employee last and first name in two different dataset. Of course, these are sample datasets. Even if an employee first and last name matches in both datasets then I want to delete that company id (compid field), there are hundreds of employees in real data. I have sample code which I tried and I am getting error. Please give me any suggestions.
/* sample dataset1 birthday*/
data birthday;
input compid birthdat date7.
lastname $18.
firstnam $15.
phone $4.;
format birthdat mmddyy10.;
datalines;
459287 05JAN39 RODRIGUES JUAN 5879
127845 25DEC48 MEDER VLADIMIR 6231
254896 06APR54 TAYLOR-HUNYADI ITO 0231
;
run;
/*Sample dataset2 tempbirthday*/
proc sql;
create table WORK.TEMPBIRTHDAY( bufsize=65536 )
(
compid num,
birthdat num format=MMDDYY10.,
lastname char(18),
firstnam char(15),
phone char(4)
);
insert into WORK.TEMPBIRTHDAY
values(459287,'04MAY98'd, 'NISHIMATSU-LYNCH','RICHARD', '6778')
values(765112,'04MAY98'd, 'SMITH','ROBERT', '')
values(219776,'15APR98'd, 'PASTORELLI','ZORA','')
values(245233,'10APR98'd, 'ALI','SADIQ','')
values(245234,'10APR98'd, 'MEHAILESCU','NADIA','')
values(326721,'01MAY98'd, 'CALHOUN','WILLIS','')
values(459287,'05JAN39'd, 'RODRIGUES', 'JUAN', '5879')
values(127845,'25DEC48'd, 'MEDER', 'VLADIMIR', '6231')
values(254896,'06APR54'd, 'TAYLOR-HUNYADI', 'ITO', '0231')
;
quit;
/*sort both datasets by compid */
proc sort data=birthday ;
by compid;
proc sort data=tempbirthday (rename=( birthdat=birthdat1 lastname=lastname1 firstnam=firstnam1 phone=phone1));
by compid;
run;
/* delete the employees who is from same compid (companyid) */
data total1;
merge birthday (in=ina) tempbirthday(in=inb);
by compid;
if ina and inb then do;
delete compid;
end;
run;
Thanks in advance
The DELETE statement deletes a row of data not just a single variable. If this is what you want then just remove the variable name from the delete statement and your code will work.
You can't delete a variable (the column) from a specific row (observation). You can set the variable in a specific row ("the cell") to missing or change it but not delete a "cell". SAS tables are rectangular.
Yes I agree Patrick. In the sample data case, I want to remove company id (compid) '459287' as it meets the criteria of present in both datasets. Rest of the data after merging, could be the same. The output dataset should look like this
Thanks,
Given the required logic you describe not sure why compid 127845 would remain in your desired result.
Below the code for the logic as I understand it.
data total1;
merge
birthday (in=ina)
tempbirthday(in=inb);
by compid;
if ina and inb and lastname=lastname1 and firstnam=firstnam1 then delete;
run;
Patrick, you are correct. Compid 127845 shouldn't be there in the dataset. In fact, all the compids 459287, 127845, 254896 shouldn't be seen in the output dataset as there is at least one matching observation (compid on both datasets). The highlighted record should be deleted as well (though the first and last names are different, I don't want to see if the two different employees work for the same company)
which will be accomplished by using this code
/* delete the employees who is from same compid */
data total1;
merge
birthday (in=ina)
tempbirthday(in=inb);
by compid;
if not ina and inb ;
run;
(NOT INA AND INB) is different than NOT (INA and INB).
This first just finds the observations that exist in B that are not in A.
The second also includes the cases that exist in A that are not in B.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.