Hi Team,
I want to identify id's having a flag=1 for two or more of the following variables in two consecutive months. The thing here is there should at least be two or more variables with flag=1, in any combination (could be var 1 and 2 or var 1 and 6 or var 1, 5, 7...), and should be present in two consecutive months. Say in the following dataset id 1 have flag=1 for var5, var6 and var7 in the 7th month, and var6 and var7 in 8th month ..hence id 1 should be present in the output. Just one episode of such scenario is enough to be in the output dataset, disregard of the value in the remaining months.
dataset;
id year month var1 var2 var3 var 4 var5 var6 var7 var..........multiple
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 1 1 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
Output;
id
1
2
3
ok. Makes sense then.
Do something like this
data have;
input id$ year month var1-var7;
datalines;
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 0 0 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
;
proc sort data=have;
by id year month;
run;
data want;
set have;
by id;
array vars{*} var1-var7;
monthsbetween = intck('month', lag(mdy(month, 1, year)), mdy(month, 1, year));
if first.id then do;
monthsbetween=.;
outflag=0;
end;
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0;
if monthsbetween=1 and twoormore=1 and lag(twoormore)=1 then outflag=1;
retain outflag;
keep id;
if last.id and outflag=1;
run;
I don't see how id=3 meets these criteria from your sample data? it has flag=1 for more than one variable in year 2012 month 7 only?
Thanks,
That is true, I missed the flag, will correct it now
ok. Makes sense then.
Do something like this
data have;
input id$ year month var1-var7;
datalines;
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 0 0 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
;
proc sort data=have;
by id year month;
run;
data want;
set have;
by id;
array vars{*} var1-var7;
monthsbetween = intck('month', lag(mdy(month, 1, year)), mdy(month, 1, year));
if first.id then do;
monthsbetween=.;
outflag=0;
end;
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0;
if monthsbetween=1 and twoormore=1 and lag(twoormore)=1 then outflag=1;
retain outflag;
keep id;
if last.id and outflag=1;
run;
That is perfect. I really appreciate your help.
What if I want to include the vars which contributed for the output dataset.
Thanks draycut,
Now, I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.
Here,
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0; this counts the sum of any flag=1 month by month, but i want the sum to be of the same variables in both the first and second month.
When selecting those having twoormore=1 for two or more consecutive months, i want only the same variables in the first month to contribute to the sum not a different variable in the second month.
Based on the above data, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.
if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.
The output desired in the @PeterClemmensen dataset have is only id '1'
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.