Hi Team,
I want to identify id's having a flag=1 for two or more of the following variables in two consecutive months. The thing here is there should at least be two or more variables with flag=1, in any combination (could be var 1 and 2 or var 1 and 6 or var 1, 5, 7...), and should be present in two consecutive months. Say in the following dataset id 1 have flag=1 for var5, var6 and var7 in the 7th month, and var6 and var7 in 8th month ..hence id 1 should be present in the output. Just one episode of such scenario is enough to be in the output dataset, disregard of the value in the remaining months.
dataset;
id year month var1 var2 var3 var 4 var5 var6 var7 var..........multiple
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 1 1 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
Output;
id
1
2
3
ok. Makes sense then.
Do something like this
data have;
input id$ year month var1-var7;
datalines;
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 0 0 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
;
proc sort data=have;
by id year month;
run;
data want;
set have;
by id;
array vars{*} var1-var7;
monthsbetween = intck('month', lag(mdy(month, 1, year)), mdy(month, 1, year));
if first.id then do;
monthsbetween=.;
outflag=0;
end;
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0;
if monthsbetween=1 and twoormore=1 and lag(twoormore)=1 then outflag=1;
retain outflag;
keep id;
if last.id and outflag=1;
run;
I don't see how id=3 meets these criteria from your sample data? it has flag=1 for more than one variable in year 2012 month 7 only?
Thanks,
That is true, I missed the flag, will correct it now
ok. Makes sense then.
Do something like this
data have;
input id$ year month var1-var7;
datalines;
1 2012 7 0 0 0 0 1 1 1
1 2012 8 0 0 0 0 0 1 1
1 2012 9 1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1 0 0 1 1 1 0 0
2 2012 7 0 0 0 0 0 0 0
2 2012 8 0 0 0 0 0 0 0
2 2012 9 0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1 0 0 1 1 1 0 0
2 2013 3 1 1 0 0 0 0 0
3 2012 7 0 0 0 0 1 1 1
3 2012 8 0 0 0 0 0 0 0
3 2012 9 0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1 1 0 0 0 0 1 0
4 2013 4 0 0 1 1 0 0 0
;
proc sort data=have;
by id year month;
run;
data want;
set have;
by id;
array vars{*} var1-var7;
monthsbetween = intck('month', lag(mdy(month, 1, year)), mdy(month, 1, year));
if first.id then do;
monthsbetween=.;
outflag=0;
end;
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0;
if monthsbetween=1 and twoormore=1 and lag(twoormore)=1 then outflag=1;
retain outflag;
keep id;
if last.id and outflag=1;
run;
That is perfect. I really appreciate your help.
What if I want to include the vars which contributed for the output dataset.
Thanks draycut,
Now, I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.
Here,
if sum(of vars[*])>1 then twoormore=1;
else twoormore=0; this counts the sum of any flag=1 month by month, but i want the sum to be of the same variables in both the first and second month.
When selecting those having twoormore=1 for two or more consecutive months, i want only the same variables in the first month to contribute to the sum not a different variable in the second month.
Based on the above data, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.
if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.
The output desired in the @PeterClemmensen dataset have is only id '1'
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.