BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anandrc
Obsidian | Level 7

My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and

most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable

 

 

The list of Employee ID's to be deleted will be provided in a separate dataset every month.

 

This dataset has one variable Emp_ID which is $9. and 9 length.

The issue is in the library Emp which has different name and lengths for the same variable in different datasets.

Example:

Emp.Address_mon dataset has Employee_ID.

Emp.Trans_mon has EmpID

Emp.Contact_mon has Emp_ID ans so on.

 

How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.

Any suggestions

1 ACCEPTED SOLUTION

Accepted Solutions
Amir
PROC Star

1) Remember to have backups you can return to.

 

2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.

 

3) It generates proc sql statements for each table, at the moment they are just create table and select statements.

 

4) Run the code produced in the 'full-file-path.sas'.

 

5) If you are happy with the results then you can change the sql to use a delete statement instead.

 

/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
   file 'full-file-path.sas';

   set sashelp.vcolumn;

   where     upcase(libname) eq       'EMP'
         and upcase(name)    contains ('EMP')
         and upcase(name)    contains ('ID')  
   ;

   put 'proc sql;';
   put 'create table work.' memname 'as';
   put 'select * from emp.' memname;
   put 'where ' name 'in (select emp_id from monthly_deletes);';
   put 'quit;';
run;

 

Regards,

Amir.

View solution in original post

3 REPLIES 3
Amir
PROC Star

1) Remember to have backups you can return to.

 

2) The following code assumes the employee id column in each emp table can be identified as the only one containing 'emp' and 'id'.

 

3) It generates proc sql statements for each table, at the moment they are just create table and select statements.

 

4) Run the code produced in the 'full-file-path.sas'.

 

5) If you are happy with the results then you can change the sql to use a delete statement instead.

 

/* replace 'full-file-path.sas' and 'monthly_deletes' */
data _null_;
   file 'full-file-path.sas';

   set sashelp.vcolumn;

   where     upcase(libname) eq       'EMP'
         and upcase(name)    contains ('EMP')
         and upcase(name)    contains ('ID')  
   ;

   put 'proc sql;';
   put 'create table work.' memname 'as';
   put 'select * from emp.' memname;
   put 'where ' name 'in (select emp_id from monthly_deletes);';
   put 'quit;';
run;

 

Regards,

Amir.

anandrc
Obsidian | Level 7
Thanks Amir. I did test this on a sample library of 10 datasets and it works. Will need to test this on the entire library and take a not on the processing time
ballardw
Super User

@anandrc wrote:

My requirement is to delete employee records from datasets in library (Emp) by scanning all the datasets (more than 2000 datasets and

most of them are huge datasets with 1 million plus observations). Please note that not all datasets has Employee ID variable

 

 

The list of Employee ID's to be deleted will be provided in a separate dataset every month.

 

This dataset has one variable Emp_ID which is $9. and 9 length.

The issue is in the library Emp which has different name and lengths for the same variable in different datasets.

Example:

Emp.Address_mon dataset has Employee_ID.

Emp.Trans_mon has EmpID

Emp.Contact_mon has Emp_ID ans so on.

 

How do I loop through and scan the Emp_ID in the library using macros and delete the record in a efficient way.

Any suggestions


I would say the first step is to go back and 1) change everything creating /using different names to the same variable name and 2) make sure the lengths are consistent.

 

Otherwise you will have to create an maintain an additional table of which data set uses which name. If you have thousands of data sets I suspect that you may have more with different variable names appearing frequently.

If the length differences are actually using different values such as 12-345 in one set and 12-345-7896 in a different one then you have a whole bunch of other rules to maintain to come close to this. If the values aren't different that should not be an issue unless some are actually numeric and others are actually character. That will cause another different class of headaches as you will need somewhat different code to handle those as two different approaches.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1894 views
  • 2 likes
  • 3 in conversation