BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Abimal_Zippi
Fluorite | Level 6

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Reeza
Super User

ID1, included because of these records? Even though Var5 is not the same?

1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1

 

ID2, not included even though VAR4 is the same? 

 

2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0

 

This is inconsistent with the rule you've specified, so you either need to clarify your data or your rules. 

 

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. 

 

My translation of your criteria:

 

  • Flag=1
  • On two or more consecutive months
  • On same variable

 

 


@Abimal_Zippi wrote:

I want to keep only those id's having a flag=1 for two or more consecutive months for the same variables for the first as well as the second month. You see some id have a flag=1 for different variables in the first and second month.

I want only the same variables in the first month to contribute to the case not a different variable in the second month.

Based on the data below, for instance id 1 have flag=1 for var 6 and 7 at month 7 and  8.... this perfectly applies to my case.

if you see id 2, there is a flag=1 for var 4 and 6 at month 12 2012, but in the next month, i.e. month 1 of 2013, the value for var 6 is 0 not 1 that leads id 2 to be excluded in the output.

The output desired in the dataset have below is only id '1'.

 

 

data have;
input id $ year month var1-var7;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
;

 


 

Abimal_Zippi
Fluorite | Level 6

Thank you,

 

The difference here, in id 1  there is a flag=1 for two or more variables where the names of the variables is same for the two consecutive months , but in id 2 there is a flag=1 for two or more variables but the name of the variables are different, so I will not count id2.

stat_sas
Ammonite | Level 13

Hi,

 

Please try this:

 

proc sort data = have out=sorted;
by id year month;
run;

 

data want(drop=i);
set sorted;
cnt=0;
array v(*) var1--var7;
do i = 1 to dim(v)-1;
if (v(i)=1 and v(i+1)=1) then cnt=cnt+1;
end;
if cnt>=1;
run;

 

proc transpose data=want out=final(where=(col1 ne 0));
by id year month;
var var1--var7;
run;


proc sql;
select id, year, _name_,sum(col1) from final
group by id, year, _name_
having mean(month)=median(month) and sum(col1)>1;
quit;

mkeintz
PROC Star

You've now clarified your object.  For any pair of consecutive observations within an ID, you want to assign flag=1 for both of those observations if they have at least 2 variables with consecutive values of 1.

 

This is a single step solution that, for each ID, reads the series of observations twice.  In the first pass, each record F=1,2,3, ... is compared to the prior record and a temporary variable consec_ones{F} counts the number of vars with a value of 1 in the current and prior record.

 

The second pass, rereads the series, and for each record (S=1,2,3) and if the current record has consec_ones>1 or the next record has consec_ones > 1 then set the flag:

 

data have;
input id $ year month var1-var7 ;
datalines;
1 2012 7  0 0 0 0 1 1 1
1 2012 8  0 0 0 0 0 1 1
1 2012 9  1 0 0 0 0 0 0
1 2012 10 0 0 1 0 0 0 0
1 2012 11 0 1 0 1 0 1 0
1 2012 12 0 0 0 0 0 0 1
1 2013 1  0 0 1 1 1 0 0
2 2012 6  0 0 1 1 1 0 0
2 2012 7  0 0 0 0 0 0 0
2 2012 8  0 0 0 0 0 0 0
2 2012 9  0 0 0 0 0 0 0
2 2012 10 0 0 0 0 0 0 0
2 2012 11 0 0 0 0 0 0 0
2 2012 12 0 0 0 1 0 1 0
2 2013 1  0 0 1 1 1 0 0
2 2013 3  1 1 0 0 0 0 0
3 2012 7  0 0 0 0 1 1 1
3 2012 8  0 0 0 0 0 0 0
3 2012 9  0 0 0 0 0 0 0
3 2012 10 0 0 1 0 0 0 0
4 2012 11 0 1 0 0 0 1 0
4 2012 12 0 0 0 0 0 0 0
4 2013 1  1 0 0 0 0 1 0
4 2013 4  0 0 1 1 0 0 0
run;


data want (drop=v f s);
  set have (in=firstpass)
      have (in=secondpass);
  by id;

  array consec_ones{30} _temporary_;
  array var{*} var: ;

  if first.id then call missing(F,S,of consec_ones{*});

  if firstpass then do;
    F+1;
    consec_ones{F}=0;
    do v=1 to dim(var);
      if dif(var{v})=0 and var{v}=1 then consec_ones{F}+1;
    end;
  end;
  if first.id then consec_ones{1}=0;

  if secondpass;
  S+1;
  flag= max(consec_ones{S},consec_ones{S+1})>1;
run;

 

 

Edit:  I added the if first.id then consec_ones{1}=0; statement to eliminate artificial creation of consec_one{1} from comparing the beginning of one ID to its predecessor at the end of the previous ID.

 

 

 

Notes:

  1. Set the size of the array consec_ones to at least 1 greater than the largest number of observations expected for any ID.  I used 30, meaning I expected no more that 29 records for any ID.
  2. F=1,2,3,... for record sequence within the first pass through an ID.   S=1,2,3 for the same during the second pass.
  3. The DIF(x) function is equivalent to X-lag(X).  So if DIF(x)=0 and the current value of X=1, then you've revealed consecutive 1's.

 

The major "trick" here is to interleave dataset HAVE with itself  - the pair of statements SET HAVE (in=firstpass) HAVE (in=secondpass); BY ID;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 7070 views
  • 0 likes
  • 4 in conversation