Hi all,
I have a dataset which has date_of_birth of the customer. As seen in the data, few icustomerid are same and they have a different date_of_birth. I want the output data to be considering the odd date_of_birth as bad dates. For e.g., For icustomerid 479864, two DOB is 01DEC67 and 1 is 02DEC68. So I want to mark the 02DEC68 as bad date as it is not meeting the consistency with other DOBs. Also icustomerid 374534 has one odd DOB i.e., 12SEP56 so that should be considered as bad date. I want to identify which dates of birth (DOB) appear the most in a list, and mark any DOBs that appear fewer times than the most common one with a 1. For example, if the list contains the following DOBs: 1/1/2000 2/2/2000 1/1/2000 The output should be: 1/1/2000 0 2/2/2000 1. Thanks
Sample Dataset
Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth $ Num_of_accs ;
datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;
run;
DATA bad_dates;
SET Linked_acc;
IF icustomerid=lag(icustomerid) and date_of_birth ne lag(date_of_birth) THEN
bad_date=1;
ELSE
bad_date=0;
RUN;
I have modified your data so that it includes the case where some dates occur twice and another date occurs twice.
Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;
proc sort data=bad_dates;
by icustomerid date_of_birth;
run;
proc freq data=bad_dates ;
by icustomerid;
tables date_of_birth/noprint out=list;
run;
data bad_dates1;
merge list(keep=icustomerid date_of_birth count) bad_dates;
by icustomerid date_of_birth;
run;
proc sql;
create table want as select * from bad_dates1
group by icustomerid
having count=max(count) and date_of_birth=min(date_of_birth);
quit;
I have done order by icustomerid and date_of_birth and then ran bad_dates code. It works but in case where there are more than one bad date then it gives 1 to only the first bad date and 0 to the rest of the bad dates. For e.g.,
icustomerid | debt_code | dr_inits | dr_name | dr_address1 | dr_address2 | dr_postcode | Date_of_birth | Num_of_accs | bad_date |
809079 | 354138000 | Martyn | Barrett | 26Bognor Drive | CT68QP | 29-Mar-70 | 9 | 0 | |
809079 | 340791011 | Martyn | Barrett | 26Bognor Drive | CT68QP | 29-Mar-70 | 9 | 0 | |
809079 | 97611547 | Martyn | Barrett | 26Bognor Drive | CT68QP | 29-Mar-70 | 9 | 0 | |
809079 | 227681087 | Martyn | Barrett | 26Bognor Drive | CT68QP | 29-Mar-70 | 9 | 0 | |
809079 | 261116552 | Martyn | Barrett | 26Bognor Drive | CT68QP | 29-Mar-70 | 9 | 0 | |
809079 | 354086019 | Martyn | Barrett | 26Bognor Drive | CT68QP | 30-Mar-70 | 9 | 1 | |
809079 | 354103962 | Martyn | Barrett | 26Bognor Drive | CT68QP | 30-Mar-70 | 9 | 0 | |
809079 | 355073420 | Martyn | Barrett | 26Bognor Drive | CT68QP | 30-Mar-70 | 9 | 0 | |
809079 | 355068768 | Martyn | Barrett | 26Bognor Drive | CT68QP | 30-Mar-70 | 9 | 0 |
I want all the 30 Mar 70 should be as 1. Is it possible to do this way?
For icustomerid 479864, two DOB is 01DEC67 and 1 is 02DEC68. So I want to mark the 02DEC68 as bad date as it is not meeting the consistency with other DOBs.
What happens if you have a case where 01DEC67 appears twice and 02DEC68 appears twice?
Advice ... do not (as in NEVER) read date of birth (or any date or time or datetime) as character. Read it in as an actual date, in this case using informat DATE9.
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
Thank you. In case when date of birth has appeared equally then I would like to keep the one that comes first. Secondly, I tried using date9 for Date_of_birth but it shows me just number like -2755, 1166, 3258, etc. That's why I used the character.
:date9. gives the correct date in "SAS terms", the number of days since 01JAN1960. If you don't like the way it appears, you can also assign the date9. format (or any other date format you would like) to it, and then it will appear to humans as an actual date.
Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
format date_of_birth date9.;
datalines ;
I will have to think about how to get the earliest date in this case.
I have modified your data so that it includes the case where some dates occur twice and another date occurs twice.
Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;
proc sort data=bad_dates;
by icustomerid date_of_birth;
run;
proc freq data=bad_dates ;
by icustomerid;
tables date_of_birth/noprint out=list;
run;
data bad_dates1;
merge list(keep=icustomerid date_of_birth count) bad_dates;
by icustomerid date_of_birth;
run;
proc sql;
create table want as select * from bad_dates1
group by icustomerid
having count=max(count) and date_of_birth=min(date_of_birth);
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.