SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Abimal_Zippi
Fluorite | Level 6

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Reeza
Super User

ID1, included because of these records? Even though Var5 is not the same?

1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1

 

ID2, not included even though VAR4 is the same? 

 

2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0

 

This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules. 

 

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. 

 

My translation of your criteria:

 

  • Flag=1
  • On two or more consecutive months
  • On same variable

 

 


@Abimal_Zippi wrote:

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 


 

Abimal_Zippi
Fluorite | Level 6

Thank you,

 

The difference here, in id 1  there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.

stat_sas
Ammonite | Level 13

Hi,

 

Please try this:

 

proc sort data = have out=sorted;
by id year month;
run;

 

data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;

 

proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;


proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;

mkeintz
PROC Star

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 8018 views
  • 0 likes
  • 4 in conversation