BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sandeep77
Lapis Lazuli | Level 10

Hi Experts,

I have a dataset which has few common icustomerid. I want to find how many of the common icustomerid have same date of birth and how many common icustomerid have different date of birth. Can you please suggest? For e.g. output data shows me that common icustomerid have the same date_of_birth then 1.

Here is the sample dateset

Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC1963:00:00:00.000
4642984 100167774 891V 02JAN1964:00:00:00.000
4642984 100167915 899 02JAN1964:00:00:00.000
13622248 10086855 398M 10OCT1969:00:00:00.000
19939961 101134922 2H12 14DEC1970:00:00:00.000
9400191 105616411 503 28MAR1984:00:00:00.000
9400191 105616429 503 28MAR1984:00:00:00.000
;
run;

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc sort data=confirm_dob;
    by icustomerid date_of_birth;
run;

proc freq data=confirm_dob;
    tables icustomerid*date_of_birth/noprint out=_counts_;
run;

data want;
    merge confirm_dob _counts_(keep=icustomerID date_of_birth count);
    by icustomerid date_of_birth;
    if count>=2 then new_variable=1;
    else new_variable=0;
run;

 

This should work. You should not be treating dates as character strings, they should be numeric as a general rule, and making dates numeric is simply a good practice and a good habit to get into. In this case, I have left them as character because it doesn't matter in this problem; in most problems it will matter.

--
Paige Miller

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Your code doesn't work properly, the dates seem to have a character missing. Could you please fix it?

 

PaigeMiller_0-1685539578882.png

 

 

Also, what is the desired output of this analysis? What does it mean "how many common icustomerid have different date of birth" — different than what?

 

 

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10

I am looking to find that all the common icustomerid have same date of birth. We can see in the sample data that icustomerid 4642984 and 9400191 have multiple records. I want to check if those multiple records have same date of birth. E.g. icustomerid 4642984 have two debt_code and they have same date of birth then flag it as 1 and if any other icustomer have two debt_code and they have different date of birth then flag it as 0. Hope that makes sense.

PaigeMiller
Diamond | Level 26

What if there are three records for icustomerid, two have the same date of birth and one has a different date of birth?

 

Can you please fix your code?

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10

If there are three records then two with same date to be marked as 1 and the other odd date of birth to be 0. 

Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC63
4642984 100167774 891V 02JAN64
4642984 100167915 899 02JAN1964
13622248 10086855 398M 10OCT69
19939961 101134922 2H12 14DEC70
9400191 105616411 503 28MAR84
9400191 105616429 503 28MAR84
;
run;

Thanks

PaigeMiller
Diamond | Level 26

Still not correct

 

PaigeMiller_0-1685541289563.png

 

--
Paige Miller
Sandeep77
Lapis Lazuli | Level 10

Apologies for that.

Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC63
4642984 100167774 891V 02JAN64
4642984 100167915 899 02JAN64
13622248 10086855 398M 10OCT69
19939961 101134922 2H12 14DEC70
9400191 105616411 503 28MAR84
9400191 105616429 503 28MAR84
;
run;

PaigeMiller
Diamond | Level 26
proc sort data=confirm_dob;
    by icustomerid date_of_birth;
run;

proc freq data=confirm_dob;
    tables icustomerid*date_of_birth/noprint out=_counts_;
run;

data want;
    merge confirm_dob _counts_(keep=icustomerID date_of_birth count);
    by icustomerid date_of_birth;
    if count>=2 then new_variable=1;
    else new_variable=0;
run;

 

This should work. You should not be treating dates as character strings, they should be numeric as a general rule, and making dates numeric is simply a good practice and a good habit to get into. In this case, I have left them as character because it doesn't matter in this problem; in most problems it will matter.

--
Paige Miller
ballardw
Super User

Why do you treat dates, or datetime values as character?

 

When you provide a data step to read character data and expect to get longer than the default 8 characters either provide a LENGTH statement to set the expected length or use an informat long enough to read the expected results such as:

Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth :$22.;
datalines ;
11902164 100079300 899 12DEC1963:00:00:00.000
4642984 100167774 891V 02JAN1964:00:00:00.000
4642984 100167915 899 02JAN1964:00:00:00.000
13622248 10086855 398M 10OCT1969:00:00:00.000
19939961 101134922 2H12 14DEC1970:00:00:00.000
9400191 105616411 503 28MAR1984:00:00:00.000
9400191 105616429 503 28MAR1984:00:00:00.000
; 

IMHO anyone that provides dates with :00:00:00.000 is just wasting electrons and storage bytes somewhere. Lazy programmer no cookie.

 

One reason not to use character dates is that you don't have a clue when the value is not actually a valid date. Consider this minor modification has no concern about the 42d day of December.

Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth :$22.;
datalines ;
11902164 100079300 899 42DEC1963:00:00:00.000
4642984 100167774 891V 02JAN1964:00:00:00.000
4642984 100167915 899 02JAN1964:00:00:00.000
13622248 10086855 398M 10OCT1969:00:00:00.000
19939961 101134922 2H12 14DEC1970:00:00:00.000
9400191 105616411 503 28MAR1984:00:00:00.000
9400191 105616429 503 28MAR1984:00:00:00.000
; 

If you use an actual date informat SAS will tell you that you invalid data:

210  Data Confirm_DOB;
211  infile cards expandtabs;
212  input icustomerid debt_code rep_code $ Date_of_birth :date9.;
213  format date_of_birth date9.;
214  datalines ;

NOTE: Invalid data for Date_of_birth in line 215 24-45.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
215        11902164 100079300 899 42DEC1963:00:00:00.000
icustomerid=11902164 debt_code=100079300 rep_code=899 Date_of_birth=. _ERROR_=1 _N_=1
NOTE: The data set WORK.CONFIRM_DOB has 7 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


So you can fix, maybe, the values before proceeding to something that actually attempts to use those dates.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 738 views
  • 2 likes
  • 3 in conversation