Calcite | Level 5

## Output of statistics

I'm struggling to write my question clearly, so please bear with me. I have a dataset with a column of identification numbers called “person”; each number identifies one individual. A second column, called “tasks”, lists the tasks that person completed. The tasks are text values such as A, B, and C, and each row contains only one task, so if person 5 did three tasks, there are three rows for person 5. It looks like this:

| person | tasks |
|---|---|
| 5 | A |
| 5 | B |
| 5 | C |

I am trying to get SAS to produce summary statistics that tell me how many persons completed only task A, how many completed both tasks A and B, how many completed both tasks A and C, and so on for every other combination of tasks. I don't care about the tasks of specific individuals; I only want the overall number of persons who completed each combination. My attempts with PROC MEANS were unsuccessful, and I would greatly appreciate any help with this project.

Rhodochrosite | Level 12

## Re: Output of statistics

hi, here's one way (I'm sure there are many) ...

this assumes only three tasks, A, B, and C (if there are more, increase the length in the LENGTH statement in the 2nd data step)

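data x;

input person task $ @@;

datalines;

1 A 1 C

2 C

3 B 3 C

4 B 4 C

5 A 5 B 5 C

6 C 6 A

7 B

;

run;

/* sort by person and task so each combination is built in the same order, e.g. person 6 (C then A) becomes AC */

proc sort data=x;

by person task;

run;

/* DOW loop: each pass of the DATA step reads all rows for one person and outputs one observation */

data x;

length tasks $3;

do until (last.person);

set x;

by person;

tasks = cats(tasks, task);

end;

keep tasks;

run;

proc freq data=x;

tables tasks;

run;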

| tasks | Frequency | Percent |
|---|---|---|
| ABC | 1 | 14.29 |
| AC | 2 | 28.57 |
| B | 1 | 14.29 |
| BC | 2 | 28.57 |
| C | 1 | 14.29 |

Onyx | Level 15

## Re: Output of statistics

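data have;

input person task $;

cards;

1 A

2 B

3 A

3 B

3 C

4 B

4 C

5 A

5 B

5 C

6 A

;

run;

/* BY-group processing needs the data sorted; sorting by task too keeps each combination in a consistent order */

proc sort data=have;

by person task;

run;

data want;

length person_tasks $10;

/* keep the accumulated string from one observation to the next */

retain person_tasks;

set have;

by person;

if first.person then person_tasks = "";

person_tasks = cats(person_tasks, task);

if last.person then output;

keep person_tasks;

run;

proc freq data=want;

tables person_tasks;

run;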

Tom

Calcite | Level 5

## Re: Output of statistics

Thank you to both of you for the excellent solutions to my problem.

TomKari, can you explain to me what these three lines of code are doing?  Thanks again for your help.

if first.person then person_tasks = "";

person_tasks = cats(person_tasks, task);

if last.person then output;

Rhodochrosite | Level 12

## Re: Output of statistics

hi ... Tom is doing the same thing I am doing, but without a loop

since I use a loop to read the observations one person at a time ...

#1  no need for a RETAIN statement

#2  no need to set cumulative tasks (Tom's variable person_tasks) to missing for each new person

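#3  no need to check for the last observation of each person before writing an observation (the implicit OUTPUT at the end of the DATA step writes one observation per person)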

in Tom's code there's no loop for person-by-person reading of the data, so ...

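#1  if first.person then person_tasks = ""; ---> set the variable person_tasks to missing each time a new person is encountered

#2  person_tasks = cats(person_tasks, task); ---> add the current task to the end of the accumulated string (the RETAIN statement keeps that string from one observation to the next)

#3  if last.person then output; ---> when the last observation for a person is encountered, write an observation to the new data set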
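To see the values these flags actually take, a minimal sketch like the following (the data set names trace_demo and trace_out are hypothetical) writes FIRST.PERSON, LAST.PERSON and the accumulated string to the log for every observation. For person 1, who has a single row, both flags are 1; for person 2 the first row has FIRST.PERSON=1, the middle row has both flags 0, and the last row has LAST.PERSON=1, which is the only row for person 2 that gets output, with person_tasks equal to ABC.

```sas
/* hypothetical sample data: person 1 has one task, person 2 has three */
data trace_demo;
input person task $;
datalines;
1 A
2 A
2 B
2 C
;
run;

/* BY-group processing requires the data sorted by the BY variable */
proc sort data=trace_demo;
by person task;
run;

data trace_out;
length person_tasks $10;
retain person_tasks;
set trace_demo;
by person;
if first.person then person_tasks = "";
person_tasks = cats(person_tasks, task);
/* FIRST./LAST. are temporary 1/0 flags; PUT shows them in the log but they are not saved in TRACE_OUT */
put person= task= first.person= last.person= person_tasks=;
if last.person then output;
run;
```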

by the way, the loop I used is referred to as a "DOW Loop" and there are a lot of good papers on the topic, for example ...

HOW to DOW

http://support.sas.com/resources/papers/proceedings12/156-2012.pdf

just use a Google search for more ... SAS DOW
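The DOW loop can be seen on its own in a minimal sketch (the data set names dow_demo and dow_counts and the variable n_tasks are hypothetical) that counts each person's rows. n_tasks is created by an assignment statement, not read by SET and not retained, so SAS resets it to missing at the top of every DATA step iteration, which here means once per person; SUM ignores the missing value on the first row, and the implicit OUTPUT at the end of the step writes one observation per person.

```sas
/* hypothetical sample data */
data dow_demo;
input person task $;
datalines;
1 A
1 C
2 C
3 A
3 B
3 C
;
run;

proc sort data=dow_demo;
by person;
run;

data dow_counts;
/* one DATA step iteration per person: the DO UNTIL loop reads all of that person's rows */
do until (last.person);
set dow_demo;
by person;
/* SUM (not the sum statement) so n_tasks is not retained across persons */
n_tasks = sum(n_tasks, 1);
end;
/* implicit OUTPUT at the end of the step: one row per person */
keep person n_tasks;
run;
```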

also ... another way (once again with a DOW loop) ... uses SUBSTR instead of a CAT function ...

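data x;

input person task $ @@;

datalines;

1 A 1 C

2 C

3 B 3 C

4 B 4 C

5 A 5 B 5 C

6 C 6 A

7 B

;

run;

proc sort data=x;

by person task;

run;

/* _n_ counts the rows read for the current person; each one-character task goes into that position of TASKS */

data x;

length tasks $3;

do _n_ = 1 by 1 until (last.person);

set x;

by person;

substr(tasks, _n_, 1) = task;

end;

keep tasks;

run;

proc freq data=x;

tables tasks;

run;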

Onyx | Level 15

## Re: Output of statistics

Yes, Mike has explained it perfectly. It's mostly a matter of style preference; the two pieces of code should perform about the same.

Although it's not required in this instance, I strongly recommend you review the SAS documentation on how the following structure works:

SET X;

BY VARY;

IF FIRST.VARY ...;

IF LAST.VARY ...;

It is a very useful construct in solving a number of problems, and (I think) fairly unique to SAS.
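Applied to a simple counting problem, the SET/BY/FIRST./LAST. structure might look like the following minimal sketch (bygroup_demo, bygroup_counts and n_tasks are hypothetical names). The sum statement n_tasks + 1 implicitly retains n_tasks, so the counter has to be reset explicitly at FIRST.PERSON, and the explicit OUTPUT at LAST.PERSON writes one observation per person.

```sas
/* hypothetical sample data */
data bygroup_demo;
input person task $;
datalines;
1 A
1 C
2 C
3 A
3 B
3 C
;
run;

proc sort data=bygroup_demo;
by person;
run;

data bygroup_counts;
set bygroup_demo;
by person;
/* start of a BY group: restart the counter */
if first.person then n_tasks = 0;
/* sum statement: n_tasks is implicitly retained across observations */
n_tasks + 1;
/* end of a BY group: write one observation per person */
if last.person then output;
keep person n_tasks;
run;
```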

Best,

Tom

Discussion stats
• 5 replies
• 861 views
• 0 likes
• 3 in conversation