BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jay_Aguilar
Calcite | Level 5

Hello everyone,

 

I have a two part question and hope you can help me with some SAS codes.

 

I have a table which contains the following (have):

IDCodedate
1abc26.10.2017
1abc17.02.2018
1def23.02.2018
2ghi04.05.2018
2def16.04.2019
3abc07.08.2019
4abc28.01.2016
4ghi03.11.2018
4def17.12.2019
4def17.12.2019

 

I would only need the last entry for each ID and month. So if a have two codes within the same month I only need the entry with the last date. If an ID has two same codes at the same day, it doesnt matter which entry I take.

The data should not have different codes on the same day, but maybe you have an idea how I can check for that before reducing the table?

 

So my needed tabe would look like this (want):

IDCodedate
1abc26.10.2017
1def23.02.2018
2ghi04.05.2018
2def16.04.2019
3abc07.08.2019
4abc28.01.2016
4ghi03.11.2018
4def17.12.2019

 In this example I only have maximum two entries for the same month, but theoretically it could also be more than two, relevant for me would only be the very last per each month.

 

I very much appreciate your help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I too found the problem description a bit confusing.  What I think you want is, for each ID/MONTH take the latest date.   If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.

 

If that is correct, and if your data are already sorted by ID/DATE, then :

 

data have;
  input ID	Code :$3.	date ddmmyy10.;
  format date date9.;
datalines;
1	abc	26.10.2017
1	abc	17.02.2018
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019
4	def	17.12.2019
run;

data want;
  set have  end=end_of_have;
  by id ;
  if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
  if last.id=1 or intck('month',date,nxt_date)>0);
run;

This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).

 

If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
ballardw
Super User

Your example output does not match your description. You say "So if a have two codes within the same month I only need the entry with the last date". Your data for ID=1 and Code=abc is from two different months, not within the same month.

 

If you meant that you want the last Date within each ID and code that would be

Proc sort data=have;
   by id code date;
run;

data want;
   set have;
   by id code;
   if last.code;
run;

Or provide a more consistent start data set and matching output.

Jay_Aguilar
Calcite | Level 5
Thank you for the input.
Maybe I did not explain it correctly, I try again:
What I meant was that I would need for each month the last dated entry and not for each ID only the last entry which would be the result of the code you provided.
Is it clear now?
ballardw
Super User

@Jay_Aguilar wrote:
Thank you for the input.
Maybe I did not explain it correctly, I try again:
What I meant was that I would need for each month the last dated entry and not for each ID only the last entry which would be the result of the code you provided.
Is it clear now?

Not understanding "each month" without a better example. WHY is the first row of

1 abc 26.10.2017
1 abc 17.02.2018
 

 

not in your output? It is a different month. So that should be there if Month is considered at all, in fact those aren't even in the same year. So All is see is the LAST DATE, not month. Provide an example where "last month" is easier to see and point it out.

 

And my code does not have the last ID but the last date for each Id and Code combination.

Jay_Aguilar
Calcite | Level 5
The output table does not contain

ID Code date
1 abc 17.02.2018

and just
ID Code date
1 abc 26.10.2017
1 def 23.02.2018

because the entry of the 17.02.2018 for ID=1 is not needed. I just need the latest entry from February since there are two entries for February. For October 2017 there is only 1 entry, so I am fine. Cause per month and ID I need maximum 1 entry.
If I apply your code, I end up with only one row per ID.
ballardw
Super User

You know your problem.

You are not describing it well to others.

So is February to be considered across ALL Id values? You did not say so very clearly

. Or All Code values? Or does ID and Code have no role at all?

Jay_Aguilar
Calcite | Level 5

hmm ok, I try it differently: The table I need at the end which should look like the want table above will be ued to join to a table which looks like this:
ID startdate enddate
1 01.10.2017 31.10.2017
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017

1 01.02.2018 28.02.2018
1 01.03.2018 31.03.2018

2 01.05.2018 31.05.2018
2 01.06.2018 30.06.2018

2 01.04.2019 30.04.2019
2 01.05.2019 31.05.2019

I have for every ID one row per month. The target would be now only get the information contained in the Code column for each row, so it would look like this:
ID startdate enddate Code
1 01.10.2017 31.10.2017 abc
1 01.11.2017 30.11.2017
1 01.12.2017 31.12.2017

1 01.02.2018 28.02.2018 def
1 01.03.2018 31.03.2018

2 01.05.2018 31.05.2018 ghi
2 01.06.2018 30.06.2018

2 01.04.2019 30.04.2019 def
2 01.05.2019 31.05.2019

Therefore ID plays a role and date as well. The information contained in Code is only relevant once per month. Since in the have table I have 2 entries for ID=1 and month=Februray, I could not join them correctly to the final table, cause I would have two matching values for Code.

mkeintz
PROC Star

I too found the problem description a bit confusing.  What I think you want is, for each ID/MONTH take the latest date.   If there is are tied records for the latest date, then you can take either reocrd because you expect them to have the same CODE value.

 

If that is correct, and if your data are already sorted by ID/DATE, then :

 

data have;
  input ID	Code :$3.	date ddmmyy10.;
  format date date9.;
datalines;
1	abc	26.10.2017
1	abc	17.02.2018
1	def	23.02.2018
2	ghi	04.05.2018
2	def	16.04.2019
3	abc	07.08.2019
4	abc	28.01.2016
4	ghi	03.11.2018
4	def	17.12.2019
4	def	17.12.2019
run;

data want;
  set have  end=end_of_have;
  by id ;
  if end_of_have=0 then set have (keep=date firstobs=2 rename=(date=nxt_date));
  if last.id=1 or intck('month',date,nxt_date)>0);
run;

This program allows each iteration of the data step to look ahead one record to examine the subsequent date (variable nxt_date). The look-ahead is done in the second (conditional) SET statement which starts at record 2, keeps only DATE and renames it to NXT_DATE).

 

If the record in hand is the last record for a given ID, or not in the same month as the subsequent record then keep it.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Jay_Aguilar
Calcite | Level 5

@mkeintz that is exactly what I wanted: "each ID/MONTH take the latest date." Thank you for providing the code.

Thanks a lot!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1446 views
  • 0 likes
  • 3 in conversation