## categorical data management

Solved
Occasional Contributor
Posts: 7

# categorical data management

Hi,

I got a question about below problem:

I have some records for every person's job per month. If he/she complete, that month will be 1, otherwise 0. The form like below:

 Name Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec Amy 1 0 0 0 0 1 0 0 1 0 0 1 David 1 1 1 1 1 0 0 1 0 1 0 0 John 0 0 1 0 0 0 1 1 1 1 1 1

The question is: How to find person that doesn't complete job more than 3 continuous months (contains 3) in one record? For example, Amy has 0 from Feb to May, and John has 0 from Apr to June, Amy and John should be extract.

Anyone has some ideas?

Many thanks,

Chen

Accepted Solutions
Solution
‎01-15-2016 10:00 AM
Super User
Posts: 13,332

## Re: categorical data management

By extract do you mean that the result of this operation is a data set with those rows of data? If so this may get you started:

``````/* example data to work with*/
data have;
input Name \$	Jan	Feb	Mar	Apr	May	June	July	Aug	Sep	Oct	Nov	Dec ;
datalines;
Amy	1	0	0	0	0	1	0	0	1	0	0	1
David	1	1	1	1	1	0	0	1	0	1	0	0
John	0	0	1	0	0	0	1	1	1	1	1	1
;
run;

data want;
set have;
array m Jan  Feb Mar Apr May June July Aug Sep Oct Nov Dec;
do i= 1 to (dim(m) -2);
if sum(m[i],m[i+1],m[i+2])=0 then leave;
else if i=10 then delete;
end;
drop i;
run;``````

BUT you will need a completely different approach if you ever want to look across the year boundary such as Dec 2014, Jan 2015, Feb 2015.

All Replies
Super User
Posts: 23,307

## Re: categorical data management

Concatenate all the months into one string variable and search for the string 111. If you find at least that, then you know they've done 3 consecutive months.

Look at CATS/CATT functions for the concatenation and index or find for searching for the string 111

Occasional Contributor
Posts: 7

## Re: categorical data management

It works!! Thanks so much Reeza.

Solution
‎01-15-2016 10:00 AM
Super User
Posts: 13,332

## Re: categorical data management

By extract do you mean that the result of this operation is a data set with those rows of data? If so this may get you started:

``````/* example data to work with*/
data have;
input Name \$	Jan	Feb	Mar	Apr	May	June	July	Aug	Sep	Oct	Nov	Dec ;
datalines;
Amy	1	0	0	0	0	1	0	0	1	0	0	1
David	1	1	1	1	1	0	0	1	0	1	0	0
John	0	0	1	0	0	0	1	1	1	1	1	1
;
run;

data want;
set have;
array m Jan  Feb Mar Apr May June July Aug Sep Oct Nov Dec;
do i= 1 to (dim(m) -2);
if sum(m[i],m[i+1],m[i+2])=0 then leave;
else if i=10 then delete;
end;
drop i;
run;``````

BUT you will need a completely different approach if you ever want to look across the year boundary such as Dec 2014, Jan 2015, Feb 2015.

Occasional Contributor
Posts: 7

## Re: categorical data management

Yes, it works!!! Thanks so much!

Posts: 1,159

## Re: categorical data management

I support @Reeza's suggestion (with the obvious modification to search for '000'), although @ballardw's is fine, too. The selection (based on a HAVE dataset like that created by ballardw) could be as short as this:

``````data want;
set have;
if find(cats(of Jan--Dec),'000');
run;``````

🔒 This topic is solved and locked.