Hello,
Can some one help me how to goup id numbers with specific date frame? see attached sample file. I would like to group into 3 groups:
Group1 = all ids that has 3 consecutive years (e.g: see id# 2: 2001, 2002, 2003), and ids that has 1 or more consecutive years (see id#1: 2002, 2003)
Group2 = all ids that has 2007 (see id#1 ) and ids that has 2005 (see id#2)
Group3 = everything else under this group like 2010 (see id#1) and 2008 (see id#2)
Please note I need to return the dates as well not the year only as you see the attached file contain mdy. The attached is just an example of two ids but my raw file contain thousands of rows
Thanks for your help in advance.
Which group would you select for a case where an id has dates for 2006, 2007 and 2008?
if you look the file id# 2. One of the dates has is 2005, So to answer your question, 2005, 2006, 2007 are 3 consecutive years and will be part of Group2. But, date 2008 will be part of group3. Does that make sense? please let me know
What I'm trying to tell you: The rules for group selection you've posted are not mutually exclusive. I could now spend time analyzing the color coding in your data and see if I can work out the missing bits - but actually: I believe you should try and provide the full set of selection rules.
I would also suggest that you add a third column to your sample data for "Expected Group" so that it's easier for us to test our suggestions and make sure that it matches your expected result.
Patrick - I just attached same file with the 3 expected grouping results.
I've asked how you would group for a sequence of years 2006, 2007 and 2008. So no, you haven't answered my question. The answer you've given indicates that there is some hierarchy of how to assign groups - eg. if there are consecutive years (that's normally group 1) and one of these years is 2005 or 2007 then assign group 2? This stuff is not in your selection criteria and that's why they are not exhaustive. To have the rules sorted is something one needs to do before starting to write any code.
But o.k., here some code. It returns the expected result but I'm not so sure if it assigns the groups really in the way you want it to. It just happens to work with your sample data. It still might give you and inspiration of how to tackle the problem with your real data using the correct rules.
data have;
infile datalines truncover dlm=' ';
input id dt:ddmmyy. exp_group;
format dt date9.;
datalines;
1 12/02/2002 1
1 06/03/2002 1
1 17/04/2002 1
1 02/05/2002 1
1 03/05/2003 1
1 05/09/2007 2
1 11/10/2007 2
1 12/02/2010 3
1 06/03/2010 3
1 17/04/2010 3
1 02/05/2010 3
2 18/12/2001 1
2 18/12/2001 1
2 19/12/2001 1
2 19/12/2001 1
2 19/12/2001 1
2 20/12/2001 1
2 20/12/2001 1
2 20/12/2001 1
2 11/01/2002 1
2 11/01/2002 1
2 11/01/2002 1
2 15/01/2002 1
2 20/12/2002 1
2 20/12/2002 1
2 20/12/2002 1
2 23/12/2002 1
2 24/12/2002 1
2 24/12/2002 1
2 13/01/2003 1
2 13/01/2003 1
2 13/01/2003 1
2 15/01/2003 1
2 17/08/2005 2
2 17/08/2005 2
2 17/08/2005 2
2 17/08/2005 2
2 18/08/2005 2
2 18/08/2005 2
2 18/08/2005 2
2 18/08/2005 2
2 24/08/2005 2
2 24/08/2005 2
2 24/08/2005 2
2 25/08/2005 2
2 26/08/2005 2
2 26/08/2005 2
2 26/08/2005 2
2 26/08/2005 2
2 06/09/2005 2
2 06/09/2005 2
2 08/09/2005 2
2 22/05/2008 3
2 25/05/2008 3
2 29/05/2008 3
2 29/05/2008 3
;
run;
proc sql;
create view inter as
select id, year(dt) as _year
from have
where calculated _year not in (2005,2007)
;
quit;
data want(drop=_:);
set have;
_year=year(dt);
if _n_=1 then
do;
dcl hash h1(dataset:'inter',multidata:'no');
_rc=h1.defineKey('id','_year');
_rc=h1.defineDone();
end;
if _year in (2005,2007) then group=2;
else if (h1.check(key:id, key:_year+1) =0 or h1.check(key:id, key:_year-1) =0 ) then group=1;
else group=3;
check = (exp_group=group);
run;
Patrick and Steelers_In_DC: Thanks for taking my question. I think the best way is to post more list than what I posted last week. Please see the attached sample. Patrick - Your code below returns exaclty the expected rslt of the previous sample but i think with your current code if there's away to tweak to work with this attached sample that will work. One more thing to know, the consective years can be in either group. For example, id# 13. So, all consecutive groups can be found in any group. Please let me know and thanks a lot for your help in advance!
Hello - Can you please help to answer my question? My question goes back from last week and Patrick's respond was the closest answer. However, I was not able to tweak his code to to return my expected result (see latest attached sample). Appreciate your time and help!
Appreciate your time! First disregard the expected result tab shown on my last sample file. After I spoke with my cooworker here's the rules we thought may help you understand. Any id (see the latest file sample ) that has 3 consective years should go to group1. For example, if an Id has their first date on 1/2005 then the group will span to 1/2008 and any date that comes during that period will be part of group1. Another example, if an Id has their first date in 1/2005 and same id has another date 1/2009 that means this id will be splitted 2 groups because there're 4 years between 2005 and 2009.
This code worked for our group1 and group2, althougth it may not be the most efficient way. Please let me know your thoughts if there's away to tweek my code below. I
data tst;
set sample_08;
three_yr = date + (365.25*3);
format three_yr MMDDYY10.;
run;
proc sort data = tst; by id date; run;
data group1;
set tst;
by id date;
if first.id;
run;
****Detail of group1*******************;
proc sql;
create table group1_detail as
select b.*, a.three_yr
from group1 as a inner join
tst as b
on a.id = b.id
where a.date <= b.date <= a.three_yr ;
quit;
proc sort data = group1_detail; by id date; run;
****Detail of group2*******************;
proc sql;
create table group2_detail as
select b.*, a.three_yr
from group1 as a inner join
tst as b
on a.id = b.id
where b.date > a.three_yr;
quit;
proc sort data = group2_detail; by id date; run;
data group2;
set group2_detail;
by id date;
if first.id;
run;
AliMN, I'm not sure what you are asking for group one but here is a solution you can look at:
data one two three;
set data;
year = year(date);
lyear = lag(year);
if (year = lyear) or (lyear = year -1) then output one;
if year(date) in (2007,2005) then output two;
else output three;
run;
Mark
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.