Hello everyone,
I have a two part question and hope you can help me with some SAS codes.
I have a table which contains the following (have):
ID | Code | date |
1 | abc | 26.10.2017 |
1 | abc | 17.02.2018 |
1 | def | 23.02.2018 |
2 | ghi | 04.05.2018 |
2 | def | 16.04.2019 |
3 | abc | 07.08.2019 |
4 | abc | 28.01.2016 |
4 | ghi | 03.11.2018 |
4 | def | 17.12.2019 |
4 | def | 17.12.2019 |
I would only need the last entry for each ID and month. So if a have two codes within the same month I only need the entry with the last date. If an ID has two same codes at the same day, it doesnt matter which entry I take.
The data should not have different codes on the same day, but maybe you have an idea how I can check for that before reducing the table?
So my needed tabe would look like this (want):
ID | Code | date |
1 | abc | 26.10.2017 |
1 | def | 23.02.2018 |
2 | ghi | 04.05.2018 |
2 | def | 16.04.2019 |
3 | abc | 07.08.2019 |
4 | abc | 28.01.2016 |
4 | ghi | 03.11.2018 |
4 | def | 17.12.2019 |
In this example I only have maximum two entries for the same month, but theoretically it could also be more than two, relevant for me would only be the very last per each month.
I very much appreciate your help.
I too found the problem description a bit confusing. What I think you want is, for each ID/MONTH take the latest date. If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.
If that is correct, and if your data are already sorted by ID/DATE, then :
data have;
input ID Code :$3. date ddmmyy10.;
format date date9.;
datalines;
1 abc 26.10.2017
1 abc 17.02.2018
1 def 23.02.2018
2 ghi 04.05.2018
2 def 16.04.2019
3 abc 07.08.2019
4 abc 28.01.2016
4 ghi 03.11.2018
4 def 17.12.2019
4 def 17.12.2019
run;
data want;
set have end=end_of_have;
by id ;
if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
if last.id=1 or intck('month',date,nxt_date)>0);
run;
This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).
If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.
Your example output does not match your description. You say "So if a have two codes within the same month I only need the entry with the last date". Your data for ID=1 and Code=abc is from two different months, not within the same month.
If you meant that you want the last Date within each ID and code that would be
Proc sort data=have; by id code date; run; data want; set have; by id code; if last.code; run;
Or provide a more consistent start data set and matching output.
@Jay_Aguilar wrote:
Thank you for the input.
Maybe I did not explain it correctly, I try again:
What I meant was that I would need for each month the last dated entry and not for each ID only the last entry which would be the result of the code you provided.
Is it clear now?
Not understanding "each month" without a better example. WHY is the first row of
1 | abc | 26.10.2017 |
1 | abc | 17.02.2018 |
not in your output? It is a different month. So that should be there if Month is considered at all, in fact those aren't even in the same year. So All is see is the LAST DATE, not month. Provide an example where "last month" is easier to see and point it out.
And my code does not have the last ID but the last date for each Id and Code combination.
You know your problem.
You are not describing it well to others.
So is February to be considered across ALL Id values? You did not say so very clearly
. Or All Code values? Or does ID and Code have no role at all?
hmm ok, I try it differently: The table I need at the end which should look like the want table above will be ued to join to a table which looks like this:
ID startdate enddate
1 01.10.2017 31.10.2017
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017
…
1 01.02.2018 28.02.2018
1 01.03.2018 31.03.2018
…
2 01.05.2018 31.05.2018
2 01.06.2018 30.06.2018
…
2 01.04.2019 30.04.2019
2 01.05.2019 31.05.2019
I have for every ID one row per month. The target would be now only get the information contained in the Code column for each row, so it would look like this:
ID startdate enddate Code
1 01.10.2017 31.10.2017 abc
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017
…
1 01.02.2018 28.02.2018 def
1 01.03.2018 31.03.2018
…
2 01.05.2018 31.05.2018 ghi
2 01.06.2018 30.06.2018
…
2 01.04.2019 30.04.2019 def
2 01.05.2019 31.05.2019
Therefore ID plays a role and date as well. The information contained in Code is only relevant once per month. Since in the have table I have 2 entries for ID=1 and month=Februray, I could not join them correctly to the final table, cause I would have two matching values for Code.
I too found the problem description a bit confusing. What I think you want is, for each ID/MONTH take the latest date. If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.
If that is correct, and if your data are already sorted by ID/DATE, then :
data have;
input ID Code :$3. date ddmmyy10.;
format date date9.;
datalines;
1 abc 26.10.2017
1 abc 17.02.2018
1 def 23.02.2018
2 ghi 04.05.2018
2 def 16.04.2019
3 abc 07.08.2019
4 abc 28.01.2016
4 ghi 03.11.2018
4 def 17.12.2019
4 def 17.12.2019
run;
data want;
set have end=end_of_have;
by id ;
if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
if last.id=1 or intck('month',date,nxt_date)>0);
run;
This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).
If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.
@mkeintz that is exactly what I wanted: "each ID/MONTH take the latest date." Thank you for providing the code.
Thanks a lot!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.