BookmarkSubscribeRSS Feed
megalops
Calcite | Level 5

I am struggling to clearly write my question so please hang in there.  I have a dataset that has a column of unique identification numbers.  This column is called “person”.  Then I have a column called “tasks” that lists individual tasks for the unique person. The tasks are provides as text such as tasks A,B, and C.  There in only one task listed for each row.  So if person 5 did three tasks then there are three rows for person 5.  It looks like this:

Person             tasks

5                      A

5                      B

5                      C  

I am trying to get SAS to provide a summary of statistics that will tell me how many persons only completed task A, how many persons completed both tasks A and B, how many persons completed both tasks A and C, and so on to cover all the other combinations of tasks.  I don't care about the task of specific individuals.  I just want to know the overall statistics on the number of individuals of how many persons did the tasks.  My attempts with proc means were unsuccessful and I would greatly appreciate any help with this project. 

5 REPLIES 5
MikeZdeb
Rhodochrosite | Level 12

hi, one way (sure there are many) ...

only three tasks ABC (if more change the LENGTH statement in the 2nd data step)

data x;

input person task :$1. @@;

datalines;

1 A 1 C

2 C

3 B 3 C

4 B 4 C

5 A 5 B 5 C

6 C 6 A

7 B

;

proc sort data=x;

by person task;

run;


data x;

length tasks $3;

do until (last.person);

   set x;

   by person;

   tasks = catt(tasks, task);

end;

run;


proc freq data=x;

table tasks / nocum;

run;


tasks    Frequency     Percent

ABC             1       14.29

AC              2       28.57

B               1       14.29

BC              2       28.57

C               1       14.29

TomKari
Onyx | Level 15

How about:

data have;

input person task $;

cards;

1 A

2 B

3 A

3 B

3 C

4 B

4 C

5 A

5 B

5 C

6 A

run;

proc sort;

by person task;

data want;

length person_tasks $ 255;

retain person_tasks;

set have;

by person;

if first.person then person_tasks = "";

person_tasks = cats(person_tasks, task);

if last.person then output;

run;

proc freq data=want;

tables person_tasks;

run;

Tom

megalops
Calcite | Level 5

Thank you to both of you for the excellent solutions to my problem.

TomKari, can you explain to me what these three lines of code are doing?  Thanks again for your help.

if first.person then person_tasks = "";

person_tasks = cats(person_tasks, task);

if last.person then output;

MikeZdeb
Rhodochrosite | Level 12

hi ... Tom is doing the same thing I am doing, but without a loop

since I use a loop to read the observations one person at a time ...

#1  no need for a RETAIN statement

#2  no need to set cumulative tasks (Tom's variable person_tasks) to missing for each new person

#3  no need to check for last person to output an observation

in Tom's code there's no loop for person-by-person reading of the data, so ...

#1  if first.person then person_tasks = ""; ---> set variable person_task variable to missing each time a new person is enountered

#2  person_tasks = cats(person_tasks, task); ---> make a new variable by concatenating the values of the variable TASK into one variable

#3  if last.person then output; ---> when the last observation for a person is encountered, write an observation to the new data set

by the way, the loop I used is referred to as a "DOW Loop" and there are a lot of good papers on the topics, for example ...

HOW to DOW

http://support.sas.com/resources/papers/proceedings12/156-2012.pdf

just use a Google search for more ... SAS DOW

also ... another way (once again with a DOW loop) ... uses SUBSTR instead of a CAT function ...

data x;

input person task :$1. @@;

datalines;

1 A 1 C

2 C

3 B 3 C

4 B 4 C

5 A 5 B 5 C

6 C 6 A

7 B

;

proc sort data=x;

by person task;

run;

data x;

length tasks $3;

do _n_ = 1 by 1 until (last.person);

   set x;

   by person;

   substr(tasks,_n_,1) = task;

end;

run;

TomKari
Onyx | Level 15

Yes, Mike has explained it perfectly. Mostly, I think it's a matter of style preferences, the two pieces of code will be pretty much identical in terms of performance.

Although it's not required in this instance, I strongly recommend you review the SAS documentation on how the following structure works:

SET X;

  BY VARY;

IF FIRST.VARY ...;

IF LAST.VARY ...;

It is a very useful construct in solving a number of problems, and (I think) fairly unique to SAS.

Best,

  Tom

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1065 views
  • 0 likes
  • 3 in conversation