Hi all,
I have a data set which contains information on participants that were a part of projects for multiple rounds. The variables include:
ParticipantID - Participant ID, because participants are in multiple rounds, there are multiple rows for each participant.
Round_Project - Indicates the round the project was in (i.e. Round_1A_Project_ID indicates Round 1 Project 1, Round_1B_Project_ID indicates Round 1 Project 2,
Round_2A_Project_ID indicates Round 2 Project 1) There are a total of 4 rounds, with up to three different projects per round.
Project_ID - gives the specific project ID
I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round. For example I would want to know the best way to construct the 'Team' variable such as this:
ParticipantID | Round_Project | Project_ID | Team |
1387 | Round_1A_Project_ID | 2020-01-2102 | 1 |
1406 | Round_1A_Project_ID | 2020-01-2102 | 1 |
1387 | Round_1B_Project_ID | 2020-01-2133 | 1 |
1406 | Round_1B_Project_ID | 2020-01-2133 | 1 |
1387 | Round_1C_Project_ID | 2020-01-2136 | 1 |
1406 | Round_1C_Project_ID | 2020-01-2136 | 1 |
1387 | Round_3A_Project_ID | 2020-01-2310 | 1 |
1406 | Round_3A_Project_ID | 2020-01-2310 | 1 |
1371 | Round_1A_Project_ID | 2020-01-2103 | 2 |
1389 | Round_1A_Project_ID | 2020-01-2103 | 2 |
1371 | Round_3A_Project_ID | 2020-01-2311 | 2 |
1389 | Round_3A_Project_ID | 2020-01-2311 | 2 |
1371 | Round_3B_Project_ID | 2020-01-2304 | 2 |
1389 | Round_3B_Project_ID | 2020-01-2304 | 2 |
1385 | Round_1A_Project_ID | 2020-01-2104 | 3 |
1414 | Round_1A_Project_ID | 2020-01-2104 | 3 |
1429 | Round_1A_Project_ID | 2020-01-2104 | 3 |
1385 | Round_2A_Project_ID | 2020-01-2217 | 3 |
1414 | Round_2A_Project_ID | 2020-01-2217 | 3 |
1429 | Round_2A_Project_ID | 2020-01-2217 | 3 |
Code for the original data.
data SASHELP; input ParticipantID:32. Round_Project:$20. Project_ID:$20.; datalines; 1387 Round_1A_Project_ID 2020-01-2102 1406 Round_1A_Project_ID 2020-01-2102 1387 Round_1B_Project_ID 2020-01-2133 1406 Round_1B_Project_ID 2020-01-2133 1387 Round_1C_Project_ID 2020-01-2136 1406 Round_1C_Project_ID 2020-01-2136 1387 Round_3A_Project_ID 2020-01-2310 1406 Round_3A_Project_ID 2020-01-2310 1371 Round_1A_Project_ID 2020-01-2103 1389 Round_1A_Project_ID 2020-01-2103 1371 Round_3A_Project_ID 2020-01-2311 1389 Round_3A_Project_ID 2020-01-2311 1371 Round_3B_Project_ID 2020-01-2304 1389 Round_3B_Project_ID 2020-01-2304 1385 Round_1A_Project_ID 2020-01-2104 1414 Round_1A_Project_ID 2020-01-2104 1429 Round_1A_Project_ID 2020-01-2104 1385 Round_2A_Project_ID 2020-01-2217 1414 Round_2A_Project_ID 2020-01-2217 1429 Round_2A_Project_ID 2020-01-2217 ;;;;
I was advised that it may be possible to do this with loops, but any solution is welcome.
Thank you!
I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.
Your values and variable names are too long for easy typing so let's shorten them to make it easier.
data have;
input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;
So sort by the PID so you can then transpose the data.
proc sort data=have out=step1;
by pid round;
run;
proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
by pid;
id round;
var project ;
run;
Then sort bye the new ROUND... variables and create the teams. The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)
proc sort data=step2; by round: ; run;
data want;
set step2;
by round: round_1a ;
team+first.round_1a ;
run;
Results
I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round.
This is not something that requires loops. However, I do not understand the value of TEAM that you show in your table; it would seem to me that the first two rows are Team 1, and then next two rows (because the ROUND has changed) are Team 2, and so on. Is that a correct statement, or have I misunderstood something?
Here's how I would do it.
data have;
input ParticipantID:32. Round_Project:$20. Project_ID:$20.;
datalines;
1387 Round_1A_Project_ID 2020-01-2102
1406 Round_1A_Project_ID 2020-01-2102
1387 Round_1B_Project_ID 2020-01-2133
1406 Round_1B_Project_ID 2020-01-2133
1387 Round_1C_Project_ID 2020-01-2136
1406 Round_1C_Project_ID 2020-01-2136
1387 Round_3A_Project_ID 2020-01-2310
1406 Round_3A_Project_ID 2020-01-2310
1371 Round_1A_Project_ID 2020-01-2103
1389 Round_1A_Project_ID 2020-01-2103
1371 Round_3A_Project_ID 2020-01-2311
1389 Round_3A_Project_ID 2020-01-2311
1371 Round_3B_Project_ID 2020-01-2304
1389 Round_3B_Project_ID 2020-01-2304
1385 Round_1A_Project_ID 2020-01-2104
1414 Round_1A_Project_ID 2020-01-2104
1429 Round_1A_Project_ID 2020-01-2104
1385 Round_2A_Project_ID 2020-01-2217
1414 Round_2A_Project_ID 2020-01-2217
1429 Round_2A_Project_ID 2020-01-2217
;;;;
proc sort data=have;
by project_id round_project;
run;
data want;
set have;
by project_id round_project;
if first.round_project then team+1;
run;
Hi, Thanks for your response. Just to clarify the team variable is not based only on Round_project variable. The 'team' variable is meant to group people that had the same project ID for that specific round. In the original message of what I would like the team variable to look like, Participant 1387 and Participant 1406 were in the same project (2020-01-2102) for Round 1A and they were in the same project (2020-01-2133) for Round 1B. Thus they are a team. Two other participants (#1371 and 1389) were both in a different project (2020-01-2103) in Round 1A and then their next project was in Round 3A, thus they are on a different team then participants #1387 and #1406. I apologize if this is confusing.
I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.
Your values and variable names are too long for easy typing so let's shorten them to make it easier.
data have;
input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;
So sort by the PID so you can then transpose the data.
proc sort data=have out=step1;
by pid round;
run;
proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
by pid;
id round;
var project ;
run;
Then sort bye the new ROUND... variables and create the teams. The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)
proc sort data=step2; by round: ; run;
data want;
set step2;
by round: round_1a ;
team+first.round_1a ;
run;
Results
Thank you Tom, this was very helpful.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.