Solved: How do i Loop through the library and delete records

anandrc · Posted 04-17-2019 07:07 AM

My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and

most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable

The list of Employee ID's to be deleted will be provided in a separate dataset every month.

This dataset has one variable Emp_ID which is $9. and 9 length.

The issue is in the library Emp which has different name and lengths for the same variable in different datasets.

Example:

Emp.Address_mon dataset has Employee_ID.

Emp.Trans_mon has EmpID

Emp.Contact_mon has Emp_ID ans so on.

How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.

Any suggestions

Amir · Posted 04-17-2019 08:24 AM

1) Remember to have backups you can return to.

2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.

3) It generates proc sql statements for each table, at the moment they are just create table and select statements.

4) Run the code produced in the 'full-file-path.sas'.

5) If you are happy with the results then you can change the sql to use a delete statement instead.

/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
   file 'full-file-path.sas';

   set sashelp.vcolumn;

   where     upcase(libname) eq       'EMP'
         and upcase(name)    contains ('EMP')
         and upcase(name)    contains ('ID')  
   ;

   put 'proc sql;';
   put 'create table work.' memname 'as';
   put 'select * from emp.' memname;
   put 'where ' name 'in (select emp_id from monthly_deletes);';
   put 'quit;';
run;

Regards,

Amir.

View solution in original post

Amir · Posted 04-17-2019 08:24 AM

1) Remember to have backups you can return to.

2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.

3) It generates proc sql statements for each table, at the moment they are just create table and select statements.

4) Run the code produced in the 'full-file-path.sas'.

5) If you are happy with the results then you can change the sql to use a delete statement instead.

/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
   file 'full-file-path.sas';

   set sashelp.vcolumn;

   where     upcase(libname) eq       'EMP'
         and upcase(name)    contains ('EMP')
         and upcase(name)    contains ('ID')  
   ;

   put 'proc sql;';
   put 'create table work.' memname 'as';
   put 'select * from emp.' memname;
   put 'where ' name 'in (select emp_id from monthly_deletes);';
   put 'quit;';
run;

Regards,

Amir.

anandrc · Posted 04-17-2019 11:21 AM

Thanks Amir. I did test this on a sample library of 10 datasets and it works. Will need to test this on the entire library and take a not on the processing time

ballardw · Posted 04-17-2019 10:43 AM

@anandrc wrote:

My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and

most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable

The list of Employee ID's to be deleted will be provided in a separate dataset every month.

This dataset has one variable Emp_ID which is $9. and 9 length.

The issue is in the library Emp which has different name and lengths for the same variable in different datasets.

Example:

Emp.Address_mon dataset has Employee_ID.

Emp.Trans_mon has EmpID

Emp.Contact_mon has Emp_ID ans so on.

　

How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.

Any suggestions

I would say the first step is to go back and 1) change everything creating /using different names to the same variable name and 2) make sure the lengths are consistent.

Otherwise you will have to create an maintain an additional table of which data set uses which name. If you have thousands of data sets I suspect that you may have more with different variable names appearing frequently.

If the length differences are actually using different values such as 12-345 in one set and 12-345-7896 in a different one then you have a whole bunch of other rules to maintain to come close to this. If the values aren't different that should not be an issue unless some are actually numeric and others are actually character. That will cause another different class of headaches as you will need somewhat different code to handle those as two different approaches.

How do i Loop through the library and delete records

Re: How do i Loop through the library and delete records

Re: How do i Loop through the library and delete records

Re: How do i Loop through the library and delete records

Re: How do i Loop through the library and delete records