Solved: Re: Only take last month's entry

Jay_Aguilar · Posted 06-30-2020 10:14 AM

Hello everyone,

I have a two part question and hope you can help me with some SAS codes.

I have a table which contains the following (have):

ID	Code	date
1	abc	26.10.2017
1	abc	17.02.2018
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019
4	def	17.12.2019

I would only need the last entry for each ID and month. So if a have two codes within the same month I only need the entry with the last date. If an ID has two same codes at the same day, it doesnt matter which entry I take.

The data should not have different codes on the same day, but maybe you have an idea how I can check for that before reducing the table?

So my needed tabe would look like this (want):

ID	Code	date
1	abc	26.10.2017
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019

In this example I only have maximum two entries for the same month, but theoretically it could also be more than two, relevant for me would only be the very last per each month.

I very much appreciate your help.

mkeintz · Posted 06-30-2020 03:32 PM

I too found the problem description a bit confusing. What I think you want is, for each ID/MONTH take the latest date. If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.

If that is correct, and if your data are already sorted by ID/DATE, then :

data have;
  input ID	Code :$3.	date ddmmyy10.;
  format date date9.;
datalines;
1	abc	26.10.2017
1	abc	17.02.2018
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019
4	def	17.12.2019
run;

data want;
  set have  end=end_of_have;
  by id ;
  if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
  if last.id=1 or intck('month',date,nxt_date)>0);
run;

This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).

If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

ballardw · Posted 06-30-2020 10:41 AM

Your example output does not match your description. You say "So if a have two codes within the same month I only need the entry with the last date". Your data for ID=1 and Code=abc is from two different months, not within the same month.

If you meant that you want the last Date within each ID and code that would be

Proc sort data=have;
   by id code date;
run;

data want;
   set have;
   by id code;
   if last.code;
run;

Or provide a more consistent start data set and matching output.

Jay_Aguilar · Posted 06-30-2020 11:13 AM

Thank you for the input.
Maybe I did not explain it correctly, I try again:
What I meant was that I would need for each month the last dated entry and not for each ID only the last entry which would be the result of the code you provided.
Is it clear now?

ballardw · Posted 06-30-2020 11:36 AM

@Jay_Aguilar wrote:
Thank you for the input.
Maybe I did not explain it correctly, I try again:
What I meant was that I would need for each month the last dated entry and not for each ID only the last entry which would be the result of the code you provided.
Is it clear now?

Not understanding "each month" without a better example. WHY is the first row of

1	abc	26.10.2017
1	abc	17.02.2018

not in your output? It is a different month. So that should be there if Month is considered at all, in fact those aren't even in the same year. So All is see is the LAST DATE, not month. Provide an example where "last month" is easier to see and point it out.

And my code does not have the last ID but the last date for each Id and Code combination.

Jay_Aguilar · Posted 06-30-2020 11:47 AM

The output table does not contain

ID Code date
1 abc 17.02.2018

and just
ID Code date
1 abc 26.10.2017
1 def 23.02.2018

because the entry of the 17.02.2018 for ID=1 is not needed. I just need the latest entry from February since there are two entries for February. For October 2017 there is only 1 entry, so I am fine. Cause per month and ID I need maximum 1 entry.
If I apply your code, I end up with only one row per ID.

ballardw · Posted 06-30-2020 12:02 PM

You know your problem.

You are not describing it well to others.

So is February to be considered across ALL Id values? You did not say so very clearly

. Or All Code values? Or does ID and Code have no role at all?

Jay_Aguilar · Posted 06-30-2020 12:17 PM

hmm ok, I try it differently: The table I need at the end which should look like the want table above will be ued to join to a table which looks like this:
ID startdate enddate
1 01.10.2017 31.10.2017
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017
…
1 01.02.2018 28.02.2018
1 01.03.2018 31.03.2018
…
2 01.05.2018 31.05.2018
2 01.06.2018 30.06.2018
…
2 01.04.2019 30.04.2019
2 01.05.2019 31.05.2019

I have for every ID one row per month. The target would be now only get the information contained in the Code column for each row, so it would look like this:
ID startdate enddate Code
1 01.10.2017 31.10.2017 abc
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017
…
1 01.02.2018 28.02.2018 def
1 01.03.2018 31.03.2018
…
2 01.05.2018 31.05.2018 ghi
2 01.06.2018 30.06.2018
…
2 01.04.2019 30.04.2019 def
2 01.05.2019 31.05.2019

Therefore ID plays a role and date as well. The information contained in Code is only relevant once per month. Since in the have table I have 2 entries for ID=1 and month=Februray, I could not join them correctly to the final table, cause I would have two matching values for Code.

mkeintz · Posted 06-30-2020 03:32 PM

I too found the problem description a bit confusing. What I think you want is, for each ID/MONTH take the latest date. If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.

If that is correct, and if your data are already sorted by ID/DATE, then :

data have;
  input ID	Code :$3.	date ddmmyy10.;
  format date date9.;
datalines;
1	abc	26.10.2017
1	abc	17.02.2018
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019
4	def	17.12.2019
run;

data want;
  set have  end=end_of_have;
  by id ;
  if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
  if last.id=1 or intck('month',date,nxt_date)>0);
run;

This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).

If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Jay_Aguilar · Posted 07-01-2020 07:52 AM

@mkeintz that is exactly what I wanted: "each ID/MONTH take the latest date." Thank you for providing the code.

Thanks a lot!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away