BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi all,

I have a dataset which has date_of_birth of the customer. As seen in the data, few icustomerid are same and they have a different date_of_birth. I want the output data to be considering the odd date_of_birth as bad dates. For e.g., For icustomerid 479864, two DOB is 01DEC67 and 1 is 02DEC68. So I want to mark the 02DEC68 as bad date as it is not meeting the consistency with other DOBs. Also icustomerid 374534 has one odd DOB i.e., 12SEP56 so that should be considered as bad date. I want to identify which dates of birth (DOB) appear the most in a list, and mark any DOBs that appear fewer times than the most common one with a 1. For example, if the list contains the following DOBs: 1/1/2000 2/2/2000 1/1/2000 The output should be: 1/1/2000 0 2/2/2000 1. Thanks

Sample Dataset
Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth $ Num_of_accs ;

datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;
run;

DATA bad_dates;
    SET Linked_acc;
    IF icustomerid=lag(icustomerid) and date_of_birth ne lag(date_of_birth) THEN
        bad_date=1;
    ELSE
        bad_date=0;
    RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I have modified your data so that it includes the case where some dates occur twice and another date occurs twice.

 

Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;

proc sort data=bad_dates;
    by icustomerid date_of_birth;
run;
proc freq data=bad_dates ;
    by icustomerid;
    tables date_of_birth/noprint out=list;
run;
data bad_dates1;
    merge list(keep=icustomerid date_of_birth count) bad_dates;
    by icustomerid date_of_birth;
run;
proc sql;
    create table want as select * from bad_dates1
    group by icustomerid
    having count=max(count) and date_of_birth=min(date_of_birth);
quit;

 

--
Paige Miller

View solution in original post

7 REPLIES 7
Sandeep77
Lapis Lazuli | Level 10

I have done order by icustomerid and date_of_birth and then ran bad_dates code. It works but in case where there are more than one bad date then it gives 1 to only the first bad date and 0 to the rest of the bad dates. For e.g.,

icustomerid debt_code dr_inits dr_name dr_address1 dr_address2 dr_postcode Date_of_birth Num_of_accs bad_date
809079 354138000 Martyn Barrett 26Bognor Drive   CT68QP 29-Mar-70 9 0
809079 340791011 Martyn Barrett 26Bognor Drive   CT68QP 29-Mar-70 9 0
809079 97611547 Martyn Barrett 26Bognor Drive   CT68QP 29-Mar-70 9 0
809079 227681087 Martyn Barrett 26Bognor Drive   CT68QP 29-Mar-70 9 0
809079 261116552 Martyn Barrett 26Bognor Drive   CT68QP 29-Mar-70 9 0
809079 354086019 Martyn Barrett 26Bognor Drive   CT68QP 30-Mar-70 9 1
809079 354103962 Martyn Barrett 26Bognor Drive   CT68QP 30-Mar-70 9 0
809079 355073420 Martyn Barrett 26Bognor Drive   CT68QP 30-Mar-70 9 0
809079 355068768 Martyn Barrett 26Bognor Drive   CT68QP 30-Mar-70 9 0

I want all the 30 Mar 70 should be as 1. Is it possible to do this way?

PaigeMiller
Diamond | Level 26

For icustomerid 479864, two DOB is 01DEC67 and 1 is 02DEC68. So I want to mark the 02DEC68 as bad date as it is not meeting the consistency with other DOBs.

 

What happens if you have a case where 01DEC67 appears twice and 02DEC68 appears twice?

 

Advice ... do not (as in NEVER) read date of birth (or any date or time or datetime) as character. Read it in as an actual date, in this case using informat DATE9.

 

input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;

 

 

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10

Thank you. In case when date of birth has appeared equally then I would like to keep the one that comes first. Secondly, I tried using date9 for Date_of_birth but it shows me just number like -2755, 1166, 3258, etc. That's why I used the character.

PaigeMiller
Diamond | Level 26

:date9. gives the correct date in "SAS terms", the number of days since 01JAN1960. If you don't like the way it appears, you can also assign the date9. format (or any other date format you would like) to it, and then it will appear to humans as an actual date.

 

Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
format date_of_birth date9.;
datalines ;

 

I will have to think about how to get the earliest date in this case. 

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10
Thanks, it worked perfectly! It would be great if you could suggest me to mark all the bad dates as 1 instead of just the first one when it changes like I have shown in the 2nd part where only one date 30Mar70 is marked as (bad_date) instead of all the 30Mar70.
PaigeMiller
Diamond | Level 26

I have modified your data so that it includes the case where some dates occur twice and another date occurs twice.

 

Data Bad_Dates;
infile cards expandtabs;
input icustomerid debt_code dr_inits $ dr_name $ dr_address1 $ dr_address2 $ dr_postcode $ Date_of_birth :date9. Num_of_accs ;
datalines ;
355774 316997113 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
355774 320478910 Richard Williams 69MainwaringDrive SaltneyFerry CH40AZ 16JUN52 2
375818 379299084 Lynn Summerhayes 6Crossways Jaywick CO152NB 12MAR63 1
479864 328107651 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 261721781 Garry Wyett 79 CadewellLane TQ27HP 02DEC68 3
479864 328908108 Garry Wyatt 79 CadewellLane TQ27HP 01DEC67 3
374534 157389842 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389826 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 157389800 Sandra Motram 12PughRoad Bradley WV148PT 09SEP56 4
374534 211701685 Sandra Motram 12PughRoad Bradley WV148PT 12SEP56 4
;

proc sort data=bad_dates;
    by icustomerid date_of_birth;
run;
proc freq data=bad_dates ;
    by icustomerid;
    tables date_of_birth/noprint out=list;
run;
data bad_dates1;
    merge list(keep=icustomerid date_of_birth count) bad_dates;
    by icustomerid date_of_birth;
run;
proc sql;
    create table want as select * from bad_dates1
    group by icustomerid
    having count=max(count) and date_of_birth=min(date_of_birth);
quit;

 

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10
Thank you. That really helped!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 467 views
  • 2 likes
  • 2 in conversation