BookmarkSubscribeRSS Feed
AliMN
Calcite | Level 5

Hello,

Can some one help me how to goup id numbers with specific date frame? see attached sample file. I would like to group into 3 groups:

Group1 = all ids that has 3 consecutive years (e.g: see id# 2: 2001, 2002, 2003), and ids that has 1 or more consecutive years (see id#1: 2002, 2003)

Group2 = all ids that has 2007 (see id#1 ) and ids that has 2005 (see id#2)

Group3 = everything else under this group like 2010 (see id#1) and 2008 (see id#2)

Please note I need to return the dates as well not the year only as you see the attached file contain mdy. The attached is just an example of two ids but my raw file contain thousands of rows

Thanks for your help in advance.

10 REPLIES 10
Patrick
Opal | Level 21

Which group would you select for a case where an id has dates for 2006, 2007 and 2008?

AliMN
Calcite | Level 5

if you look the file id# 2. One of the dates has is 2005, So to answer your question, 2005, 2006, 2007 are 3 consecutive years and will be part of Group2. But, date 2008 will be part of group3. Does that make sense? please let me know

Patrick
Opal | Level 21

What I'm trying to tell you: The rules for group selection you've posted are not mutually exclusive. I could now spend time analyzing the color coding in your data and see if I can work out the missing bits - but actually: I believe you should try and provide the full set of selection rules.

I would also suggest that you add a third column to your sample data for "Expected Group" so that it's easier for us to test our suggestions and make sure that it matches your expected result.

AliMN
Calcite | Level 5

Patrick - I just attached same file with the 3 expected grouping results.

Patrick
Opal | Level 21

I've asked how you would group for a sequence of years 2006, 2007 and 2008. So no, you haven't answered my question. The answer you've given indicates that there is some hierarchy of how to assign groups - eg. if there are consecutive years (that's normally group 1) and one of these years is 2005 or 2007 then assign group 2? This stuff is not in your selection criteria and that's why they are not exhaustive. To have the rules sorted is something one needs to do before starting to write any code.


But o.k., here some code. It returns the expected result but I'm not so sure if it assigns the groups really in the way you want it to. It just happens to work with your sample data. It still might give you and inspiration of how to tackle the problem with your real data using the correct rules.



data have;

  infile datalines truncover dlm=' ';

  input id dt:ddmmyy. exp_group;

  format dt date9.;

  datalines;

1 12/02/2002 1

1 06/03/2002 1

1 17/04/2002 1

1 02/05/2002 1

1 03/05/2003 1

1 05/09/2007 2

1 11/10/2007 2

1 12/02/2010 3

1 06/03/2010 3

1 17/04/2010 3

1 02/05/2010 3

2 18/12/2001 1

2 18/12/2001 1

2 19/12/2001 1

2 19/12/2001 1

2 19/12/2001 1

2 20/12/2001 1

2 20/12/2001 1

2 20/12/2001 1

2 11/01/2002 1

2 11/01/2002 1

2 11/01/2002 1

2 15/01/2002 1

2 20/12/2002 1

2 20/12/2002 1

2 20/12/2002 1

2 23/12/2002 1

2 24/12/2002 1

2 24/12/2002 1

2 13/01/2003 1

2 13/01/2003 1

2 13/01/2003 1

2 15/01/2003 1

2 17/08/2005 2

2 17/08/2005 2

2 17/08/2005 2

2 17/08/2005 2

2 18/08/2005 2

2 18/08/2005 2

2 18/08/2005 2

2 18/08/2005 2

2 24/08/2005 2

2 24/08/2005 2

2 24/08/2005 2

2 25/08/2005 2

2 26/08/2005 2

2 26/08/2005 2

2 26/08/2005 2

2 26/08/2005 2

2 06/09/2005 2

2 06/09/2005 2

2 08/09/2005 2

2 22/05/2008 3

2 25/05/2008 3

2 29/05/2008 3

2 29/05/2008 3

;

run;

proc sql;

  create view inter as

    select id, year(dt) as _year

    from have

    where calculated _year not in (2005,2007)

    ;

quit;

data want(drop=_:);

  set have;

  _year=year(dt);

  if _n_=1 then

    do;

      dcl hash h1(dataset:'inter',multidata:'no');

      _rc=h1.defineKey('id','_year');

      _rc=h1.defineDone();

    end;

  if _year in (2005,2007) then group=2;

  else if (h1.check(key:id, key:_year+1) =0 or h1.check(key:id, key:_year-1) =0 ) then group=1;

  else group=3;

  check = (exp_group=group);

run;

AliMN
Calcite | Level 5

Patrick and Steelers_In_DC: Thanks for taking my question. I think the best way is to post more list than what I posted last week. Please see the attached sample. Patrick - Your code below returns exaclty the expected rslt of the previous sample but i think with your current code if there's away to tweak to work with this attached sample that will work. One more thing to know, the consective years can be in either group. For example, id# 13. So, all consecutive groups can be found in any group. Please let me know and thanks a lot for your help in advance!

AliMN
Calcite | Level 5

Hello - Can you please help to answer my question? My question goes back from last week and Patrick's respond was the closest answer. However, I was not able to tweak his code to to return my expected result (see latest attached sample). Appreciate your time and help!

Patrick
Opal | Level 21

As you might get from and my answer: We don't understand your rules. You need to amend/reformulate them or add some explanation to it. I for my part won't spend any further time with your question until you've done this. You need to help us to help you.

AliMN
Calcite | Level 5

Appreciate your time! First disregard the expected result tab shown on my last sample file. After I spoke with my cooworker here's the rules we thought may help you understand. Any id (see the latest file sample ) that has 3 consective years should go to group1. For example, if an Id has their first date on 1/2005  then the group will span to 1/2008 and any date that comes during that period will be part of group1.  Another example, if an Id has their first date in 1/2005 and same id has another date 1/2009 that means this id will be splitted 2 groups because there're 4 years between 2005 and 2009.

This code worked for our group1 and group2, althougth it may not be the most efficient way. Please let me know your thoughts if there's away to tweek my code below. I

data tst;
set sample_08;
three_yr = date + (365.25*3);
format three_yr MMDDYY10.;
run;

proc sort data = tst; by id date; run;

data group1;

set tst;

by id date;

if first.id;

run;

****Detail of group1*******************;

proc sql;

create table group1_detail as

select b.*, a.three_yr

  from group1 as a inner join

       tst as b

       on a.id = b.id

where a.date <= b.date <= a.three_yr ;

quit;

proc sort data = group1_detail; by id date; run;

****Detail of group2*******************;

proc sql;
create table group2_detail as
select b.*, a.three_yr
  from group1 as a inner join
       tst as b
       on a.id = b.id
where b.date > a.three_yr;
quit;


proc sort data = group2_detail; by id date; run;

data group2;
set group2_detail;
by id date;
if first.id;
run;

Steelers_In_DC
Barite | Level 11


AliMN, I'm not sure what you are asking for group one but here is a solution you can look at:

data one two three;

set data;

year = year(date);

lyear = lag(year);

if (year = lyear) or (lyear = year -1) then output one;

if year(date) in (2007,2005) then output two;

else output three;

run;

Mark

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1774 views
  • 1 like
  • 3 in conversation