Hi all,
I have a dataset which has date_of_birth of the customer. As seen in the data, few icustomerid are same and they have a different date_of_birth. I want the output data to be considering the odd date_of_birth as bad dates. For e.g., For icustomerid 479864, two DOB is 01DEC67 and 1 is 02DEC68. So I want to mark the 02DEC68 as bad date as it is not meeting the consistency with other DOBs. Also icustomerid 374534 has one odd DOB i.e., 12SEP56 so that should be considered as bad date. I want to identify which dates of birth (DOB) appear the most in a list, and mark any DOBs that appear fewer times than the most common one with a 1. For example, if the list contains the following DOBs: 1/1/2000 2/2/2000 1/1/2000 The output should be: 1/1/2000 0 2/2/2000 1. Thanks
Sample Dataset Data Bad_Dates; infile cards expandtabs; input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth $ Num_of_accs ;
datalines ; 355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2 355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2 375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1 479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3 479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3 479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3 374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4 374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4 374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4 374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4 ; run;
DATA bad_dates;
SET Linked_acc;
IF icustomerid=lag(icustomerid) and date_of_birth ne lag(date_of_birth) THEN
bad_date=1;
ELSE
bad_date=0;
RUN;
... View more