SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

categorical data management

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

categorical data management

Hi,

I got a question about below problem:

I have some records for every person's job per month. If he/she complete, that month will be 1, otherwise 0. The form like below:

Name

Jan

Feb

Mar

Apr

May

June

July

Aug

Sep

Oct

Nov

Dec

Amy

1

0

0

0

0

1

0

0

1

0

0

1

David

1

1

1

1

1

0

0

1

0

1

0

0

John

0

0

1

0

0

0

1

1

1

1

1

1

The question is: How to find person that doesn't complete job more than 3 continuous months (contains 3) in one record? For example, Amy has 0 from Feb to May, and John has 0 from Apr to June, Amy and John should be extract.

 

Anyone has some ideas?

Many thanks,

Chen 


Accepted Solutions
Solution
‎01-15-2016 10:00 AM
Super User
Posts: 10,516

Re: categorical data management

By extract do you mean that the result of this operation is a data set with those rows of data? If so this may get you started:

 

/* example data to work with*/
data have;
   input Name $	Jan	Feb	Mar	Apr	May	June	July	Aug	Sep	Oct	Nov	Dec ;
datalines;
Amy	1	0	0	0	0	1	0	0	1	0	0	1
David	1	1	1	1	1	0	0	1	0	1	0	0
John	0	0	1	0	0	0	1	1	1	1	1	1
;
run;



data want;
   set have;
   array m Jan  Feb Mar Apr May June July Aug Sep Oct Nov Dec;
   do i= 1 to (dim(m) -2);
      if sum(m[i],m[i+1],m[i+2])=0 then leave;
      else if i=10 then delete;
   end;
   drop i;
run;

BUT you will need a completely different approach if you ever want to look across the year boundary such as Dec 2014, Jan 2015, Feb 2015.

 

View solution in original post


All Replies
Super User
Posts: 17,868

Re: categorical data management

Concatenate all the months into one string variable and search for the string 111. If you find at least that, then you know they've done 3 consecutive months.

Look at CATS/CATT functions for the concatenation and index or find for searching for the string 111

Occasional Contributor
Posts: 7

Re: categorical data management

It works!! Thanks so much Reeza. 

Solution
‎01-15-2016 10:00 AM
Super User
Posts: 10,516

Re: categorical data management

By extract do you mean that the result of this operation is a data set with those rows of data? If so this may get you started:

 

/* example data to work with*/
data have;
   input Name $	Jan	Feb	Mar	Apr	May	June	July	Aug	Sep	Oct	Nov	Dec ;
datalines;
Amy	1	0	0	0	0	1	0	0	1	0	0	1
David	1	1	1	1	1	0	0	1	0	1	0	0
John	0	0	1	0	0	0	1	1	1	1	1	1
;
run;



data want;
   set have;
   array m Jan  Feb Mar Apr May June July Aug Sep Oct Nov Dec;
   do i= 1 to (dim(m) -2);
      if sum(m[i],m[i+1],m[i+2])=0 then leave;
      else if i=10 then delete;
   end;
   drop i;
run;

BUT you will need a completely different approach if you ever want to look across the year boundary such as Dec 2014, Jan 2015, Feb 2015.

 

Occasional Contributor
Posts: 7

Re: categorical data management

Yes, it works!!! Thanks so much!

Trusted Advisor
Posts: 1,115

Re: categorical data management

I support @Reeza's suggestion (with the obvious modification to search for '000'), although @ballardw's is fine, too. The selection (based on a HAVE dataset like that created by ballardw) could be as short as this:

data want;
set have;
if find(cats(of Jan--Dec),'000');
run;

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 371 views
  • 4 likes
  • 4 in conversation