Re: Active duplicate records

ren2010 · Posted 11-23-2010 01:58 PM

Hi,

I am facing a problem,where I have an employee file;this file is a combination of employee data from different employers.So the chances are that an employee number can be shared between multiple employers.I am trying to set a flag for those employee numbers who are only actively working in the company.

I did the following:
sorted the file on emp_no and active_dt field.

Then if first.emp_no ne last.emp_no(only for employees whose active_dt >= today())) then set the multi_emp to "Y".

But I am facing the following issue is like this:
for eg:
the emp_no 11111 is active in company ABC
and emp_no 1111 is non-active in company ZZZ.

emp_NO:11111 active_dt:12/31/2099 company name:xxxx
emp_NO:11111active_dt: 12/31/2001 company name:vvvvv

as per the above logic these records are getting flagged as multi_emp = 'Y'.

I only want employees who are active and share same emp_nos;
like the following:

emp_NO:22222 active_dt:12/31/2099 company name:xxxx
emp_NO:22222 active_dt: 12/31/2020 company name:vvvvv
emp_NO:22222 active_dt:12/31/2099 company name:KKKK

Please share your thoughts.
Thanks in advance.

SPR · Posted 11-23-2010 02:08 PM

Hello Ren2010,

If I've understood you correctly it is necessary to add a company name to your sorting sequence, i.e. company_name emp_no active_dt.

Sincerely,
SPR

sbb · Posted 11-23-2010 02:13 PM

Suggest you share whatever code piece you may have tried that is not working?

And, suggesting also that you try adding this code line so you can tell when you have FIRST. and/or LAST. (where "nn" can be unique and you can insert multiples at various DATA step locations to see effect):

PUTLOG '>DIAG-nn>' / _ALL_;

For your interest, I believe that BY GROUP processing, while having the proper sorted file and also having the BY statement variable optimized, then you can identify unique conditions / characteristics for your data and set a flag accordingly.

It will work-out as long as you find a suitable combination and sequence for your BY variables, then test those conditions while using the PUTLOG example.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

by group processing site:sas.com

ren2010 · Posted 11-23-2010 04:06 PM

My code is below:

proc sort data=combine_mbr
out=combine_mbr_SORT;
by emp_no actv_dt;
run;

data mbr_error_undup ;

set combine_hics;
by emp_no actv_dt;

if emp_no ne ' '
then do;
if first.emp_no ne last.emp_no
then Multi_mbrs='Y';
end;

if last.emp_no then output;

run;

sbb · Posted 11-23-2010 04:09 PM

Your FIRST. and LAST. processing logic is flawed (you may be thinking of LAG instead?) -- the condition is either a 1 or a 0, which leads back to the idea of adding the PUTLOG statements for self-diagnosis. Also, suggest reading more about BY GROUP PROCESSING in the SAS support website references and SAS-hosted DOC.

Scott Barry
SBBWorks, Inc.

Active duplicate records