Solved: Re: Multiple variables with same value

Abimal_Zippi · Posted 11-30-2017 05:45 PM

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

mkeintz · Posted 12-01-2017 12:38 AM

You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

Notes:

Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID. I used 30, meaning I expected no more that 29 records for any ID.
F=1,2,3,... for record sequence within the first pass through an ID. S=1,2,3 for the same during the second pass.
The DIF(x) function is equivalent to X-lag(X). So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Reeza · Posted 11-30-2017 08:04 PM

ID1, included because of these records? Even though Var5 is not the same?

1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1

ID2, not included even though VAR4 is the same?

2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0

This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules.

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month.

My translation of your criteria:

Flag=1
On two or more consecutive months
On same variable

@Abimal_Zippi wrote:

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.
data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

Abimal_Zippi · Posted 11-30-2017 11:27 PM

Thank you,

The difference here, in id 1 there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.

stat_sas · Posted 11-30-2017 10:15 PM

Hi,

Please try this:

proc sort data = have out=sorted;
by id year month;
run;

data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;

proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;

proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;

mkeintz · Posted 12-01-2017 12:38 AM

You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

Notes:

Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID. I used 30, meaning I expected no more that 29 records for any ID.
F=1,2,3,... for record sequence within the first pass through an ID. S=1,2,3 for the same during the second pass.
The DIF(x) function is equivalent to X-lag(X). So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away