Hi Experts,
I have a dataset which has few common icustomerid. I want to find how many of the common icustomerid have same date of birth and how many common icustomerid have different date of birth. Can you please suggest? For e.g. output data shows me that common icustomerid have the same date_of_birth then 1.
Here is the sample dateset
Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC1963:00:00:00.000
4642984 100167774 891V 02JAN1964:00:00:00.000
4642984 100167915 899 02JAN1964:00:00:00.000
13622248 10086855 398M 10OCT1969:00:00:00.000
19939961 101134922 2H12 14DEC1970:00:00:00.000
9400191 105616411 503 28MAR1984:00:00:00.000
9400191 105616429 503 28MAR1984:00:00:00.000
;
run;
Thanks
proc sort data=confirm_dob;
by icustomerid date_of_birth;
run;
proc freq data=confirm_dob;
tables icustomerid*date_of_birth/noprint out=_counts_;
run;
data want;
merge confirm_dob _counts_(keep=icustomerID date_of_birth count);
by icustomerid date_of_birth;
if count>=2 then new_variable=1;
else new_variable=0;
run;
This should work. You should not be treating dates as character strings, they should be numeric as a general rule, and making dates numeric is simply a good practice and a good habit to get into. In this case, I have left them as character because it doesn't matter in this problem; in most problems it will matter.
Your code doesn't work properly, the dates seem to have a character missing. Could you please fix it?
Also, what is the desired output of this analysis? What does it mean "how many common icustomerid have different date of birth" — different than what?
I am looking to find that all the common icustomerid have same date of birth. We can see in the sample data that icustomerid 4642984 and 9400191 have multiple records. I want to check if those multiple records have same date of birth. E.g. icustomerid 4642984 have two debt_code and they have same date of birth then flag it as 1 and if any other icustomer have two debt_code and they have different date of birth then flag it as 0. Hope that makes sense.
What if there are three records for icustomerid, two have the same date of birth and one has a different date of birth?
Can you please fix your code?
If there are three records then two with same date to be marked as 1 and the other odd date of birth to be 0.
Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC63
4642984 100167774 891V 02JAN64
4642984 100167915 899 02JAN1964
13622248 10086855 398M 10OCT69
19939961 101134922 2H12 14DEC70
9400191 105616411 503 28MAR84
9400191 105616429 503 28MAR84
;
run;
Thanks
Still not correct
Apologies for that.
Data Confirm_DOB;
infile cards expandtabs;
input icustomerid debt_code rep_code $ Date_of_birth $;
datalines ;
11902164 100079300 899 12DEC63
4642984 100167774 891V 02JAN64
4642984 100167915 899 02JAN64
13622248 10086855 398M 10OCT69
19939961 101134922 2H12 14DEC70
9400191 105616411 503 28MAR84
9400191 105616429 503 28MAR84
;
run;
proc sort data=confirm_dob;
by icustomerid date_of_birth;
run;
proc freq data=confirm_dob;
tables icustomerid*date_of_birth/noprint out=_counts_;
run;
data want;
merge confirm_dob _counts_(keep=icustomerID date_of_birth count);
by icustomerid date_of_birth;
if count>=2 then new_variable=1;
else new_variable=0;
run;
This should work. You should not be treating dates as character strings, they should be numeric as a general rule, and making dates numeric is simply a good practice and a good habit to get into. In this case, I have left them as character because it doesn't matter in this problem; in most problems it will matter.
Why do you treat dates, or datetime values as character?
When you provide a data step to read character data and expect to get longer than the default 8 characters either provide a LENGTH statement to set the expected length or use an informat long enough to read the expected results such as:
Data Confirm_DOB; infile cards expandtabs; input icustomerid debt_code rep_code $ Date_of_birth :$22.; datalines ; 11902164 100079300 899 12DEC1963:00:00:00.000 4642984 100167774 891V 02JAN1964:00:00:00.000 4642984 100167915 899 02JAN1964:00:00:00.000 13622248 10086855 398M 10OCT1969:00:00:00.000 19939961 101134922 2H12 14DEC1970:00:00:00.000 9400191 105616411 503 28MAR1984:00:00:00.000 9400191 105616429 503 28MAR1984:00:00:00.000 ;
IMHO anyone that provides dates with :00:00:00.000 is just wasting electrons and storage bytes somewhere. Lazy programmer no cookie.
One reason not to use character dates is that you don't have a clue when the value is not actually a valid date. Consider this minor modification has no concern about the 42d day of December.
Data Confirm_DOB; infile cards expandtabs; input icustomerid debt_code rep_code $ Date_of_birth :$22.; datalines ; 11902164 100079300 899 42DEC1963:00:00:00.000 4642984 100167774 891V 02JAN1964:00:00:00.000 4642984 100167915 899 02JAN1964:00:00:00.000 13622248 10086855 398M 10OCT1969:00:00:00.000 19939961 101134922 2H12 14DEC1970:00:00:00.000 9400191 105616411 503 28MAR1984:00:00:00.000 9400191 105616429 503 28MAR1984:00:00:00.000 ;
If you use an actual date informat SAS will tell you that you invalid data:
210 Data Confirm_DOB; 211 infile cards expandtabs; 212 input icustomerid debt_code rep_code $ Date_of_birth :date9.; 213 format date_of_birth date9.; 214 datalines ; NOTE: Invalid data for Date_of_birth in line 215 24-45. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+-- 215 11902164 100079300 899 42DEC1963:00:00:00.000 icustomerid=11902164 debt_code=100079300 rep_code=899 Date_of_birth=. _ERROR_=1 _N_=1 NOTE: The data set WORK.CONFIRM_DOB has 7 observations and 4 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
So you can fix, maybe, the values before proceeding to something that actually attempts to use those dates.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.