BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
stjsmith
Fluorite | Level 6

Hi all,

I have a data set which contains information on participants that were a part of projects for multiple rounds. The variables include:

ParticipantID - Participant ID, because participants are in multiple rounds, there are multiple rows for each participant.

Round_Project - Indicates the round the project was in (i.e. Round_1A_Project_ID indicates Round 1 Project 1, Round_1B_Project_ID  indicates Round 1 Project 2,

Round_2A_Project_ID  indicates Round 2 Project 1) There are a total of 4 rounds, with up to three different projects per round.

Project_ID - gives the specific project ID

 

I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round. For example I would want to know the best way to construct the 'Team' variable such as this:

ParticipantIDRound_ProjectProject_IDTeam
1387Round_1A_Project_ID2020-01-21021
1406Round_1A_Project_ID2020-01-21021
1387Round_1B_Project_ID2020-01-21331
1406Round_1B_Project_ID2020-01-21331
1387Round_1C_Project_ID2020-01-21361
1406Round_1C_Project_ID2020-01-21361
1387Round_3A_Project_ID2020-01-23101
1406Round_3A_Project_ID2020-01-23101
1371Round_1A_Project_ID2020-01-21032
1389Round_1A_Project_ID2020-01-21032
1371Round_3A_Project_ID2020-01-23112
1389Round_3A_Project_ID2020-01-23112
1371Round_3B_Project_ID2020-01-23042
1389Round_3B_Project_ID2020-01-23042
1385Round_1A_Project_ID2020-01-21043
1414Round_1A_Project_ID2020-01-21043
1429Round_1A_Project_ID2020-01-21043
1385Round_2A_Project_ID2020-01-22173
1414Round_2A_Project_ID2020-01-22173
1429Round_2A_Project_ID2020-01-22173

 

Code for the original data.

data SASHELP;
  input ParticipantID:32. Round_Project:$20. Project_ID:$20.;
datalines;
1387 Round_1A_Project_ID 2020-01-2102
1406 Round_1A_Project_ID 2020-01-2102
1387 Round_1B_Project_ID 2020-01-2133
1406 Round_1B_Project_ID 2020-01-2133
1387 Round_1C_Project_ID 2020-01-2136
1406 Round_1C_Project_ID 2020-01-2136
1387 Round_3A_Project_ID 2020-01-2310
1406 Round_3A_Project_ID 2020-01-2310
1371 Round_1A_Project_ID 2020-01-2103
1389 Round_1A_Project_ID 2020-01-2103
1371 Round_3A_Project_ID 2020-01-2311
1389 Round_3A_Project_ID 2020-01-2311
1371 Round_3B_Project_ID 2020-01-2304
1389 Round_3B_Project_ID 2020-01-2304
1385 Round_1A_Project_ID 2020-01-2104
1414 Round_1A_Project_ID 2020-01-2104
1429 Round_1A_Project_ID 2020-01-2104
1385 Round_2A_Project_ID 2020-01-2217
1414 Round_2A_Project_ID 2020-01-2217
1429 Round_2A_Project_ID 2020-01-2217
;;;;

 

I was advised that it may be possible to do this with loops, but any solution is welcome.

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.

Your values and variable names are too long for easy typing so let's shorten them to make it easier.

data have;
  input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;

So sort by the PID so you can then transpose the data.

proc sort data=have out=step1;
  by pid round;
run;

proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
  by pid;
  id round;
  var project ;
run;

Then sort bye the new ROUND... variables and create the teams.  The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)

 

proc sort data=step2; by round: ; run;

data want;
  set step2;
  by round: round_1a ;
  team+first.round_1a ;
run;

Results

Tom_0-1686629187962.png

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round.

 

This is not something that requires loops. However, I do not understand the value of TEAM that you show in your table; it would seem to me that the first two rows are Team 1, and then next two rows (because the ROUND has changed) are Team 2, and so on. Is that a correct statement, or have I misunderstood something?

 

Here's how I would do it.

 

data have;
  input ParticipantID:32. Round_Project:$20. Project_ID:$20.;
datalines;
1387 Round_1A_Project_ID 2020-01-2102
1406 Round_1A_Project_ID 2020-01-2102
1387 Round_1B_Project_ID 2020-01-2133
1406 Round_1B_Project_ID 2020-01-2133
1387 Round_1C_Project_ID 2020-01-2136
1406 Round_1C_Project_ID 2020-01-2136
1387 Round_3A_Project_ID 2020-01-2310
1406 Round_3A_Project_ID 2020-01-2310
1371 Round_1A_Project_ID 2020-01-2103
1389 Round_1A_Project_ID 2020-01-2103
1371 Round_3A_Project_ID 2020-01-2311
1389 Round_3A_Project_ID 2020-01-2311
1371 Round_3B_Project_ID 2020-01-2304
1389 Round_3B_Project_ID 2020-01-2304
1385 Round_1A_Project_ID 2020-01-2104
1414 Round_1A_Project_ID 2020-01-2104
1429 Round_1A_Project_ID 2020-01-2104
1385 Round_2A_Project_ID 2020-01-2217
1414 Round_2A_Project_ID 2020-01-2217
1429 Round_2A_Project_ID 2020-01-2217
;;;;

proc sort data=have;
    by project_id round_project;
run;

data want;
    set have;
    by project_id round_project;
    if first.round_project then team+1;
run;

 

 

 

--
Paige Miller
stjsmith
Fluorite | Level 6

Hi, Thanks for your response.  Just to clarify the team variable is not based only on Round_project variable. The 'team' variable is meant to group people that had the same project ID for that specific round. In the original message of what I would like the team variable to look like, Participant 1387 and Participant 1406 were in the same project (2020-01-2102) for Round 1A and they were in the same project (2020-01-2133) for Round 1B. Thus they are a team. Two other participants (#1371 and 1389) were both in a different project (2020-01-2103) in Round 1A and then their next project was in Round 3A, thus they are on a different team then participants #1387 and #1406. I apologize if this is confusing. 

Tom
Super User Tom
Super User

I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.

Your values and variable names are too long for easy typing so let's shorten them to make it easier.

data have;
  input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;

So sort by the PID so you can then transpose the data.

proc sort data=have out=step1;
  by pid round;
run;

proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
  by pid;
  id round;
  var project ;
run;

Then sort bye the new ROUND... variables and create the teams.  The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)

 

proc sort data=step2; by round: ; run;

data want;
  set step2;
  by round: round_1a ;
  team+first.round_1a ;
run;

Results

Tom_0-1686629187962.png

 

stjsmith
Fluorite | Level 6

Thank you Tom, this was very helpful.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1138 views
  • 1 like
  • 3 in conversation