I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.
I want only the same variables in the first month to contribute to the case not a different variable in the second month.
Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.
if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.
The output desired in the dataset have below is only id '1'.
data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;
You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.
This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.
The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:
data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;
data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  array consec_ones{30} _temporary_;
  array var{*} var: ;
  if first.id then call missing(F,S,of consec_ones{*});
  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;
  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;
Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.
 
Notes:
The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .
ID1, included because of these records? Even though Var5 is not the same?
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
ID2, not included even though VAR4 is the same?
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules.
I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month.
My translation of your criteria:
@Abimal_Zippi wrote:
I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.
I want only the same variables in the first month to contribute to the case not a different variable in the second month.
Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.
if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.
The output desired in the dataset have below is only id '1'.
data have; input id $ year month var1-var7; datalines; 1 2012 7 0 0 0 0 1 1 1 1 2012 8 0 0 0 0 0 1 1 1 2012 9 1 0 0 0 0 0 0 1 2012 10 0 0 1 0 0 0 0 1 2012 11 0 1 0 1 0 1 0 1 2012 12 0 0 0 0 0 0 1 1 2013 1 0 0 1 1 1 0 0 2 2012 7 0 0 0 0 0 0 0 2 2012 8 0 0 0 0 0 0 0 2 2012 9 0 0 0 0 0 0 0 2 2012 10 0 0 0 0 0 0 0 2 2012 11 0 0 0 0 0 0 0 2 2012 12 0 0 0 1 0 1 0 2 2013 1 0 0 1 1 1 0 0 2 2013 3 1 1 0 0 0 0 0 3 2012 7 0 0 0 0 1 1 1 3 2012 8 0 0 0 0 0 0 0 3 2012 9 0 0 0 0 0 0 0 3 2012 10 0 0 1 0 0 0 0 4 2012 11 0 1 0 0 0 1 0 4 2012 12 0 0 0 0 0 0 0 4 2013 1 1 0 0 0 0 1 0 4 2013 4 0 0 1 1 0 0 0 ;
Thank you,
The difference here, in id 1 there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.
Hi,
Please try this:
proc sort data = have out=sorted;
by id year month;
run;
data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
 if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;
proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;
proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;
You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.
This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.
The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:
data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;
data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;
  array consec_ones{30} _temporary_;
  array var{*} var: ;
  if first.id then call missing(F,S,of consec_ones{*});
  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;
  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;
Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.
 
Notes:
The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
