My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and
most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable
The list of Employee ID's to be deleted will be provided in a separate dataset every month.
This dataset has one variable Emp_ID which is $9. and 9 length.
The issue is in the library Emp which has different name and lengths for the same variable in different datasets.
Example:
Emp.Address_mon dataset has Employee_ID.
Emp.Trans_mon has EmpID
Emp.Contact_mon has Emp_ID ans so on.
How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.
Any suggestions
1) Remember to have backups you can return to.
2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.
3) It generates proc sql statements for each table, at the moment they are just create table and select statements.
4) Run the code produced in the 'full-file-path.sas'.
5) If you are happy with the results then you can change the sql to use a delete statement instead.
/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
file 'full-file-path.sas';
set sashelp.vcolumn;
where upcase(libname) eq 'EMP'
and upcase(name) contains ('EMP')
and upcase(name) contains ('ID')
;
put 'proc sql;';
put 'create table work.' memname 'as';
put 'select * from emp.' memname;
put 'where ' name 'in (select emp_id from monthly_deletes);';
put 'quit;';
run;
Regards,
Amir.
1) Remember to have backups you can return to.
2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.
3) It generates proc sql statements for each table, at the moment they are just create table and select statements.
4) Run the code produced in the 'full-file-path.sas'.
5) If you are happy with the results then you can change the sql to use a delete statement instead.
/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
file 'full-file-path.sas';
set sashelp.vcolumn;
where upcase(libname) eq 'EMP'
and upcase(name) contains ('EMP')
and upcase(name) contains ('ID')
;
put 'proc sql;';
put 'create table work.' memname 'as';
put 'select * from emp.' memname;
put 'where ' name 'in (select emp_id from monthly_deletes);';
put 'quit;';
run;
Regards,
Amir.
@anandrc wrote:
My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and
most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable
The list of Employee ID's to be deleted will be provided in a separate dataset every month.
This dataset has one variable Emp_ID which is $9. and 9 length.
The issue is in the library Emp which has different name and lengths for the same variable in different datasets.
Example:
Emp.Address_mon dataset has Employee_ID.
Emp.Trans_mon has EmpID
Emp.Contact_mon has Emp_ID ans so on.
How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.
Any suggestions
I would say the first step is to go back and 1) change everything creating /using different names to the same variable name and 2) make sure the lengths are consistent.
Otherwise you will have to create an maintain an additional table of which data set uses which name. If you have thousands of data sets I suspect that you may have more with different variable names appearing frequently.
If the length differences are actually using different values such as 12-345 in one set and 12-345-7896 in a different one then you have a whole bunch of other rules to maintain to come close to this. If the values aren't different that should not be an issue unless some are actually numeric and others are actually character. That will cause another different class of headaches as you will need somewhat different code to handle those as two different approaches.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.