
Output of statistics

Occasional Contributor
Posts: 9

I am struggling to write my question clearly, so please hang in there.  I have a dataset with a column of identification numbers, one per person.  This column is called “person”.  Then I have a column called “tasks” that lists the individual tasks for each person.  The tasks are provided as text, such as tasks A, B, and C.  There is only one task listed on each row, so if person 5 did three tasks then there are three rows for person 5.  It looks like this:

Person   tasks
5        A
5        B
5        C

I am trying to get SAS to produce summary statistics that tell me how many persons completed only task A, how many completed both tasks A and B, how many completed both tasks A and C, and so on for all the other combinations of tasks.  I don't care which tasks a specific individual did.  I just want the overall number of persons for each combination of tasks.  My attempts with proc means were unsuccessful, and I would greatly appreciate any help with this project.

Valued Guide
Posts: 765

Re: Output of statistics

hi, one way (I'm sure there are many) ...

this assumes only three tasks, A, B and C (if there are more, increase the length in the LENGTH statement in the 2nd data step)

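/* sample data: one person/task pair per observation */
data x;
input person task :$1. @@;
datalines;
1 A 1 C
2 C
3 B 3 C
4 B 4 C
5 A 5 B 5 C
6 C 6 A
7 B
;

/* sort so the tasks for each person are together and in alphabetical order */
proc sort data=x;
by person task;
run;

/* DOW loop: read all observations for one person, building the combination string */
data x;
length tasks $3;
do until (last.person);
   set x;
   by person;
   tasks = catt(tasks, task);
end;
run;

/* count how many persons have each combination */
proc freq data=x;
table tasks / nocum;
run;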


tasks    Frequency     Percent
ABC             1       14.29
AC              2       28.57
B               1       14.29
BC              2       28.57
C               1       14.29
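A possible variation, not taken from the posts above: if task names can be longer than one letter, plain concatenation can become ambiguous (for example, "AB" followed by "C" looks the same as "A" followed by "BC"). A minimal sketch that assumes such names and separates them with a delimiter using CATX is shown here. The data set names y and y_combos, the task names, the comma delimiter and the $50 length are all illustrative assumptions.

```sas
/* hypothetical data with multi-character task names */
data y;
input person task :$10. @@;
datalines;
1 prep 1 test
2 test
3 prep 3 test
;

proc sort data=y;
by person task;
run;

data y_combos;
length tasks $50;   /* assumed wide enough for the longest combination */
do until (last.person);
   set y;
   by person;
   /* CATX ignores blank arguments, so the first task gets no leading comma */
   tasks = catx(',', tasks, task);
end;
drop task;
run;

proc freq data=y_combos;
table tasks / nocum;
run;
```

The only changes from the single-letter version are the wider LENGTH and the switch from CATT to CATX.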

PROC Star
Posts: 1,089

Re: Output of statistics

How about:

data have;
input person task $;
cards;
1 A
2 B
3 A
3 B
3 C
4 B
4 C
5 A
5 B
5 C
6 A
;
run;

/* sort so BY-group processing works and tasks are in a consistent order */
proc sort data=have;
by person task;
run;

data want;
length person_tasks $ 255;
retain person_tasks;
set have;
by person;
if first.person then person_tasks = "";
person_tasks = cats(person_tasks, task);
if last.person then output;
run;

proc freq data=want;
tables person_tasks;
run;

Tom

Occasional Contributor
Posts: 9

Re: Output of statistics

Thank you to both of you for the excellent solutions to my problem.

TomKari, can you explain to me what these three lines of code are doing?  Thanks again for your help.

if first.person then person_tasks = "";

person_tasks = cats(person_tasks, task);

if last.person then output;

Valued Guide
Posts: 765

Re: Output of statistics

hi ... Tom is doing the same thing I am doing, but without a loop

since I use a loop to read the observations one person at a time ...

#1  no need for a RETAIN statement

#2  no need to set cumulative tasks (Tom's variable person_tasks) to missing for each new person

#3  no need to check for last person to output an observation

in Tom's code there's no loop for person-by-person reading of the data, so ...

#1  if first.person then person_tasks = ""; ---> set the variable person_tasks to missing each time a new person is encountered

#2  person_tasks = cats(person_tasks, task); ---> append the current value of the variable TASK to person_tasks, which the RETAIN statement carries over from the previous observation

#3  if last.person then output; ---> when the last observation for a person is encountered, write an observation to the new data set
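To see these three statements at work, a small sketch (not from the original posts) can write the automatic FIRST. and LAST. flags and the running string to the log for every observation. The data set name trace and its three rows are made up for illustration, and the rows are already in order by person.

```sas
/* hypothetical input, already sorted by person */
data trace;
input person task $;
datalines;
1 A
1 C
2 B
;

data _null_;
length person_tasks $ 255;
retain person_tasks;
set trace;
by person;
if first.person then person_tasks = "";
person_tasks = cats(person_tasks, task);
/* show the BY-group flags and the accumulated string for each row */
put person= task= first.person= last.person= person_tasks=;
run;
```

In the log, person_tasks should grow within a person and start over on the row where first.person is 1.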

by the way, the loop I used is referred to as a "DOW Loop" and there are a lot of good papers on the topic, for example ...

HOW to DOW

http://support.sas.com/resources/papers/proceedings12/156-2012.pdf

just use a Google search for more ... SAS DOW

also ... another way (once again with a DOW loop) ... uses SUBSTR instead of a CAT function ...

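data x;
input person task :$1. @@;
datalines;
1 A 1 C
2 C
3 B 3 C
4 B 4 C
5 A 5 B 5 C
6 C 6 A
7 B
;

proc sort data=x;
by person task;
run;

/* DOW loop with _N_ as a counter: 1 for the first task of a person, 2 for the second, ... */
data x;
length tasks $3;
do _n_ = 1 by 1 until (last.person);
   set x;
   by person;
   /* SUBSTR on the left of the equals sign puts TASK into position _N_ of TASKS */
   substr(tasks,_n_,1) = task;
end;
run;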
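For anyone who has not seen SUBSTR on the left side of an assignment, here is a tiny stand-alone sketch, separate from the data above. It only assumes a three-character variable named tasks and writes the result to the log.

```sas
data _null_;
length tasks $3;              /* starts out blank */
substr(tasks, 1, 1) = 'A';    /* overwrite position 1 */
substr(tasks, 2, 1) = 'C';    /* overwrite position 2 */
put tasks=;                   /* should show tasks=AC in the log */
run;
```

Because the variable already has its full length, each assignment replaces one character in place rather than appending to the string.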

PROC Star
Posts: 1,089

Re: Output of statistics

Yes, Mike has explained it perfectly. Mostly, I think it's a matter of style preference; the two pieces of code will be pretty much identical in terms of performance.

Although it's not required in this instance, I strongly recommend you review the SAS documentation on how the following structure works:

SET X;

  BY VARY;

IF FIRST.VARY ...;

IF LAST.VARY ...;

It is a very useful construct in solving a number of problems, and (I think) fairly unique to SAS.

Best,

  Tom
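As one illustration of the SET / BY / FIRST. / LAST. structure described above (a sketch, not part of the original replies), the same pattern can count how many tasks each person did. The data set names have and task_counts, the variable n_tasks and the sample rows are assumptions made for the example.

```sas
/* hypothetical sample data */
data have;
input person task $;
datalines;
1 A
1 C
2 B
;

proc sort data=have;
by person task;
run;

data task_counts;
set have;
by person;
if first.person then n_tasks = 0;   /* restart the counter for each new person */
n_tasks + 1;                        /* sum statement: n_tasks is retained automatically */
if last.person then output;         /* one row per person */
keep person n_tasks;
run;
```

The sum statement takes the place of an explicit RETAIN here, but the reset on FIRST. and the output on LAST. follow the same structure as the concatenation code.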
