SAS Programming

Abimal_Zippi · Posted 11-30-2017 05:45 PM

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

mkeintz · Posted 12-01-2017 12:38 AM

You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

Notes:

Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID. I used 30, meaning I expected no more that 29 records for any ID.
F=1,2,3,... for record sequence within the first pass through an ID. S=1,2,3 for the same during the second pass.
The DIF(x) function is equivalent to X-lag(X). So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Reeza · Posted 11-30-2017 08:04 PM

ID1, included because of these records? Even though Var5 is not the same?

1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1

ID2, not included even though VAR4 is the same?

2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0

This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules.

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month.

My translation of your criteria:

Flag=1
On two or more consecutive months
On same variable

@Abimal_Zippi wrote:

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and 8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.
data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

Abimal_Zippi · Posted 11-30-2017 11:27 PM

Thank you,

The difference here, in id 1 there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.

stat_sas · Posted 11-30-2017 10:15 PM

Hi,

Please try this:

proc sort data = have out=sorted;
by id year month;
run;

data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;

proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;

proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;

mkeintz · Posted 12-01-2017 12:38 AM

You've now clarified your object. For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

This is a single step solution that, for each ID, reads the series of observations twice. In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

Edit: I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

Notes:

Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID. I used 30, meaning I expected no more that 29 records for any ID.
F=1,2,3,... for record sequence within the first pass through an ID. S=1,2,3 for the same during the second pass.
The DIF(x) function is equivalent to X-lag(X). So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

The major "trick" here is to interleave dataset HAVE with itself - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID; .

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Re: Multiple variables with same value

Conditional replacement of a variable values

Macro variable with multiple values

Replacing multiple variable values for multiple variables

Change Multiple variables with different values

Filter data by multiple variables

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...