DATA Step, Macro, Functions and more

Multiple variables with same value

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Multiple variables with same value

[ Edited ]

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 


Accepted Solutions
Solution
‎12-02-2017 09:45 PM
Trusted Advisor
Posts: 1,284

Re: Multiple variables with same value

[ Edited ]
Posted in reply to eshty_tes

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

View solution in original post


All Replies
Super User
Posts: 22,845

Re: Multiple variables with same value

[ Edited ]
Posted in reply to eshty_tes

ID1, included because of these records? Even though Var5 is not the same?

1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1

 

ID2, not included even though VAR4 is the same? 

 

2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0

 

This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules. 

 

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. 

 

My translation of your criteria:

 

  • Flag=1
  • On two or more consecutive months
  • On same variable

 

 


eshty_tes wrote:

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 


 

Occasional Contributor
Posts: 16

Re: Multiple variables with same value

[ Edited ]

Thank you,

 

The difference here, in id 1  there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.

Trusted Advisor
Posts: 1,270

Re: Multiple variables with same value

Posted in reply to eshty_tes

Hi,

 

Please try this:

 

proc sort data = have out=sorted;
by id year month;
run;

 

data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;

 

proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;


proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;

Solution
‎12-02-2017 09:45 PM
Trusted Advisor
Posts: 1,284

Re: Multiple variables with same value

[ Edited ]
Posted in reply to eshty_tes

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 538 views
  • 0 likes
  • 4 in conversation