Hi SAS Pros,
I have a dataset like showed below.
data Have;
input ID first_date :mmddyy10. last_date :mmddyy10. edu class;
format first_date mmddyy10.;
format last_date mmddyy10.;
datalines;
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-12-2015 3 1
1 02-01-2015 02-16-2015 3 3
1 02-01-2015 03-01-2015 3 3
1 08-20-2015 09-01-2015 3 3
1 10-06-2015 10-09-2015 2 1
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-21-2017 3 1
2 10-19-2017 11-05-2017 2 2
2 10-19-2017 11-11-2017 1 1
2 10-19-2017 12-22-2017 1 1
2 12-25-2017 12-25-2017 3 2
;
run;
I would like to recode the last dates if the records under the same ID match on ID, first_date, edu, and class, with only last dates are different. One of these set of records has a first date and last date are the same (i.e. the first record for each ID). The I would like to enter the date of the record with same first and last dates (i.e. 02-01-2015 for ID=1 and 10-19-2017 for ID=2) for the last dates for those records match on ID, first_date, edu, and class under the same ID.
If the records under the same ID are different in any of ID, first_date, edu, and class, then just leave as them own, regardless first date is different or the same as the last date.
The Want data is also post below.
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-12-2015 3 1
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-01-2015 3 3
1 08-20-2015 09-01-2015 3 3
1 10-06-2015 10-09-2015 2 1
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-21-2017 3 1
2 10-19-2017 11-05-2017 2 2
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-19-2017 1 1
2 12-25-2017 12-25-2017 3 2
Thank you very much for any help!
Best regards,
C
If you just want to use the first last_date per key based on sort order of your source data then below one way to go.
data Have;
input ID first_date :mmddyy10. last_date :mmddyy10. edu class;
format first_date mmddyy10.;
format last_date mmddyy10.;
datalines;
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-12-2015 3 1
1 02-01-2015 02-16-2015 3 3
1 02-01-2015 03-01-2015 3 3
1 08-20-2015 09-01-2015 3 3
1 10-06-2015 10-09-2015 2 1
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-21-2017 3 1
2 10-19-2017 11-05-2017 2 2
2 10-19-2017 11-11-2017 1 1
2 10-19-2017 12-22-2017 1 1
2 12-25-2017 12-25-2017 3 2
;
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(keep=id first_date edu class last_date)');
h1.defineKey('id','first_date','edu','class');
h1.defineData('last_date');
h1.defineDone();
end;
set have;
h1.find();
run;
data expected;
input ID first_date :mmddyy10. last_date :mmddyy10. edu class;
format first_date mmddyy10.;
format last_date mmddyy10.;
datalines;
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-12-2015 3 1
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-01-2015 3 3
1 08-20-2015 09-01-2015 3 3
1 10-06-2015 10-09-2015 2 1
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-21-2017 3 1
2 10-19-2017 11-05-2017 2 2
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-19-2017 1 1
2 12-25-2017 12-25-2017 3 2
;
proc compare
data=want
comp=expected
;
run;
While loading the SAS hash table without option "multidata" SAS will only pick the first instance of a unique key combination and that's what the lookup using the find() method will return.
Hi,
Thanks for the codes. However, they just happened to be the first records. It is not necessary to be the first record with the same first and last date. I just list them like that.
What I want is to meet the conditions I mentioned in the first thread.
Thanks,
C
The condition for changing last_date is met and the code returns the desired data as posted by you.
The last_date used will be taken from the first occurrence of the key combination encountered in the data.
I understand your task as follows: if a given combination of id*first_date*edu*class has one record do nothing, i.e. leave the last_date value as is. But if that combination has more than one record, then assign last_date=first_date.
This can be done by passing through each ID twice. The first pass to count the frequency of a given id*first_date*edu*class, and the second pass to output the record with last_date modified if needed:
data Have;
input ID first_date :mmddyy10. last_date :mmddyy10. edu class;
format first_date mmddyy10.;
format last_date mmddyy10.;
datalines;
1 02-01-2015 02-01-2015 3 3
1 02-01-2015 02-12-2015 3 1
1 02-01-2015 02-16-2015 3 3
1 02-01-2015 03-01-2015 3 3
1 08-20-2015 09-01-2015 3 3
1 10-06-2015 10-09-2015 2 1
2 10-19-2017 10-19-2017 1 1
2 10-19-2017 10-21-2017 3 1
2 10-19-2017 11-05-2017 2 2
2 10-19-2017 11-11-2017 1 1
2 10-19-2017 12-22-2017 1 1
2 12-25-2017 12-25-2017 3 2
;
data want;
set have (in=firstpass) have (in=secondpass);
by id;
array _n_fdates {%sysevalf("01jan2015"d):%sysevalf("31dec2017"d),3,3} _temporary_;
if first.id then call missing(of _n_fdates{*});
if firstpass then _n_fdates{first_date,edu,class}+1;
if secondpass;
if _n_fdates{first_date,edu,class}>1 then last_date=first_date;
run;
The array _N_FDATES is a 3-dimensional array. The first dimension has a lower bound of "01jan2015"d and an upper bound of "31dec2017"d, to cover the expected range of first_date values. Change it to cover the actual domain of your data. The second dimension goes from 1 to 3 to cover EDU values, and the third also goes from 1 to 3 to cover CLASS values. They can be changed to other integer ranges too, if needed.
This program assumes the data are sorted by ID, but can be any order within ID.
Hi,
Thank you so much for the code! I was not able to work on this project in the past weeks so sorry for my late reply.
I think your understanding is quite close to my task. I think there is one criterion needs to add here: if a given combination of id*first_date*edu*class has one record do nothing, i.e. leave the last_date value as is. But if that combination has more than one record and one record has a date difference of 0, then assign last_date=first_date for the rest. So, if a given combination of id*first_date*edu*class has more than one record but each record has a date difference >0 and none of the records has a date difference of 0 within this ID, then leave them. Examples like follows:(there is no record like "5 8-19-2017 8-19-2017 1 1" for this person)
ID First_date Last_date EDU Class
5 8-19-2017 11-11-2017 1 1
5 8-19-2017 12-22-2017 1 1
Could you please advice me what the code look like when incorporating the above situation. I apologize for not specifying this before.
And if the variables that are going to be involved in a given combination are varA, varB, varC, varD, varE, varF and varG, I just need to put all of them in the "{}" in the if then statement, right?
Thank you so much!
Best regards,
@CynthiaWei wrote:
Hi,
Thank you so much for the code! I was not able to work on this project in the past weeks so sorry for my late reply.
I think your understanding is quite close to my task. I think there is one criterion needs to add here: if a given combination of id*first_date*edu*class has one record do nothing, i.e. leave the last_date value as is. But if that combination has more than one record and one record has a date difference of 0, then assign last_date=first_date for the rest. So, if a given combination of id*first_date*edu*class has more than one record but each record has a date difference >0 and none of the records has a date difference of 0 within this ID, then leave them. Examples like follows:(there is no record like "5 8-19-2017 8-19-2017 1 1" for this person)
ID First_date Last_date EDU Class
5 8-19-2017 11-11-2017 1 1
5 8-19-2017 12-22-2017 1 1
Could you please advice me what the code look like when incorporating the above situation. I apologize for not specifying this before.
So instead of assigning last_date=first_date for all combinations that have multiple records, as I had programmed before, just do it for all combinations that have multiple records, of which at least one already has last_date=first_date:
In that case then during the first pass, the program should still count records by combination. But in addition, if any record has first_date=last_date, then add (say) 1,000 to the count. I choose 1,000 because I assume no combination is likely to have near 1,000 records.
This means:
data want;
set have (in=firstpass) have (in=secondpass);
by id;
array _n_fdates {%sysevalf("01jan2015"d):%sysevalf("31dec2017"d),3,3} _temporary_;
if first.id then call missing(of _n_fdates{*});
if firstpass then do;
_n_fdates{first_date,edu,class}+1;
if first_date=last_date then _n_fdates{first_date,edu,class}+1000;
end;
if secondpass;
if _n_fdates{first_date,edu,class}>1001 then last_date=first_date;
run;
And if the variables that are going to be involved in a given combination are varA, varB, varC, varD, varE, varF and varG, I just need to put all of them in the "{}" in the if then statement, right?
In short yes. Make provision for each variable in the array statement, and then use those variable names as the array indexes in the subsequent statements.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.