Solved: How to use loops/array to create a indicator variable by group?

stjsmith · Posted 06-12-2023 07:10 PM

Hi all,

I have a data set which contains information on participants that were a part of projects for multiple rounds. The variables include:

ParticipantID - Participant ID, because participants are in multiple rounds, there are multiple rows for each participant.

Round_Project - Indicates the round the project was in (i.e. Round_1A_Project_ID indicates Round 1 Project 1, Round_1B_Project_ID indicates Round 1 Project 2,

Round_2A_Project_ID indicates Round 2 Project 1) There are a total of 4 rounds, with up to three different projects per round.

Project_ID - gives the specific project ID

I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round. For example I would want to know the best way to construct the 'Team' variable such as this:

ParticipantID	Round_Project	Project_ID	Team
1387	Round_1A_Project_ID	2020-01-2102	1
1406	Round_1A_Project_ID	2020-01-2102	1
1387	Round_1B_Project_ID	2020-01-2133	1
1406	Round_1B_Project_ID	2020-01-2133	1
1387	Round_1C_Project_ID	2020-01-2136	1
1406	Round_1C_Project_ID	2020-01-2136	1
1387	Round_3A_Project_ID	2020-01-2310	1
1406	Round_3A_Project_ID	2020-01-2310	1
1371	Round_1A_Project_ID	2020-01-2103	2
1389	Round_1A_Project_ID	2020-01-2103	2
1371	Round_3A_Project_ID	2020-01-2311	2
1389	Round_3A_Project_ID	2020-01-2311	2
1371	Round_3B_Project_ID	2020-01-2304	2
1389	Round_3B_Project_ID	2020-01-2304	2
1385	Round_1A_Project_ID	2020-01-2104	3
1414	Round_1A_Project_ID	2020-01-2104	3
1429	Round_1A_Project_ID	2020-01-2104	3
1385	Round_2A_Project_ID	2020-01-2217	3
1414	Round_2A_Project_ID	2020-01-2217	3
1429	Round_2A_Project_ID	2020-01-2217	3

Code for the original data.

data SASHELP;
  input ParticipantID:32. Round_Project:$20. Project_ID:$20.;
datalines;
1387 Round_1A_Project_ID 2020-01-2102
1406 Round_1A_Project_ID 2020-01-2102
1387 Round_1B_Project_ID 2020-01-2133
1406 Round_1B_Project_ID 2020-01-2133
1387 Round_1C_Project_ID 2020-01-2136
1406 Round_1C_Project_ID 2020-01-2136
1387 Round_3A_Project_ID 2020-01-2310
1406 Round_3A_Project_ID 2020-01-2310
1371 Round_1A_Project_ID 2020-01-2103
1389 Round_1A_Project_ID 2020-01-2103
1371 Round_3A_Project_ID 2020-01-2311
1389 Round_3A_Project_ID 2020-01-2311
1371 Round_3B_Project_ID 2020-01-2304
1389 Round_3B_Project_ID 2020-01-2304
1385 Round_1A_Project_ID 2020-01-2104
1414 Round_1A_Project_ID 2020-01-2104
1429 Round_1A_Project_ID 2020-01-2104
1385 Round_2A_Project_ID 2020-01-2217
1414 Round_2A_Project_ID 2020-01-2217
1429 Round_2A_Project_ID 2020-01-2217
;;;;

I was advised that it may be possible to do this with loops, but any solution is welcome.

Thank you!

Tom · Posted 06-13-2023 12:09 AM

I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.

Your values and variable names are too long for easy typing so let's shorten them to make it easier.

data have;
  input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;

So sort by the PID so you can then transpose the data.

proc sort data=have out=step1;
  by pid round;
run;

proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
  by pid;
  id round;
  var project ;
run;

Then sort bye the new ROUND... variables and create the teams. The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)

proc sort data=step2; by round: ; run;

data want;
  set step2;
  by round: round_1a ;
  team+first.round_1a ;
run;

Results

View solution in original post

PaigeMiller · Posted 06-12-2023 07:36 PM

I would like to construct a new variable (I would name it 'Team') that groups the participants based on the same ProjectID and Round.

This is not something that requires loops. However, I do not understand the value of TEAM that you show in your table; it would seem to me that the first two rows are Team 1, and then next two rows (because the ROUND has changed) are Team 2, and so on. Is that a correct statement, or have I misunderstood something?

Here's how I would do it.

data have;
  input ParticipantID:32. Round_Project:$20. Project_ID:$20.;
datalines;
1387 Round_1A_Project_ID 2020-01-2102
1406 Round_1A_Project_ID 2020-01-2102
1387 Round_1B_Project_ID 2020-01-2133
1406 Round_1B_Project_ID 2020-01-2133
1387 Round_1C_Project_ID 2020-01-2136
1406 Round_1C_Project_ID 2020-01-2136
1387 Round_3A_Project_ID 2020-01-2310
1406 Round_3A_Project_ID 2020-01-2310
1371 Round_1A_Project_ID 2020-01-2103
1389 Round_1A_Project_ID 2020-01-2103
1371 Round_3A_Project_ID 2020-01-2311
1389 Round_3A_Project_ID 2020-01-2311
1371 Round_3B_Project_ID 2020-01-2304
1389 Round_3B_Project_ID 2020-01-2304
1385 Round_1A_Project_ID 2020-01-2104
1414 Round_1A_Project_ID 2020-01-2104
1429 Round_1A_Project_ID 2020-01-2104
1385 Round_2A_Project_ID 2020-01-2217
1414 Round_2A_Project_ID 2020-01-2217
1429 Round_2A_Project_ID 2020-01-2217
;;;;

proc sort data=have;
    by project_id round_project;
run;

data want;
    set have;
    by project_id round_project;
    if first.round_project then team+1;
run;

--
Paige Miller

stjsmith · Posted 06-12-2023 07:49 PM

Hi, Thanks for your response. Just to clarify the team variable is not based only on Round_project variable. The 'team' variable is meant to group people that had the same project ID for that specific round. In the original message of what I would like the team variable to look like, Participant 1387 and Participant 1406 were in the same project (2020-01-2102) for Round 1A and they were in the same project (2020-01-2133) for Round 1B. Thus they are a team. Two other participants (#1371 and 1389) were both in a different project (2020-01-2103) in Round 1A and then their next project was in Round 3A, thus they are on a different team then participants #1387 and #1406. I apologize if this is confusing.

Tom · Posted 06-13-2023 12:09 AM

I suspect you need to transpose then it is easy to see how is sharing the same projects in the same round.

Your values and variable names are too long for easy typing so let's shorten them to make it easier.

data have;
  input PID Round $ Project :$20.;
datalines;
1387 1A 2020-01-2102
1406 1A 2020-01-2102
1387 1B 2020-01-2133
1406 1B 2020-01-2133
1387 1C 2020-01-2136
1406 1C 2020-01-2136
1387 3A 2020-01-2310
1406 3A 2020-01-2310
1371 1A 2020-01-2103
1389 1A 2020-01-2103
1371 3A 2020-01-2311
1389 3A 2020-01-2311
1371 3B 2020-01-2304
1389 3B 2020-01-2304
1385 1A 2020-01-2104
1414 1A 2020-01-2104
1429 1A 2020-01-2104
1385 2A 2020-01-2217
1414 2A 2020-01-2217
1429 2A 2020-01-2217
;;;;

So sort by the PID so you can then transpose the data.

proc sort data=have out=step1;
  by pid round;
run;

proc transpose data=step1 out=step2(drop=_name_) prefix=round_;
  by pid;
  id round;
  var project ;
run;

Then sort bye the new ROUND... variables and create the teams. The extra copy of the ROUND_1A variable is just to make it easier to find the final by variable so we can test the FIRST. flag for that variable. (If the same variable is listed twice the FIRST./LAST. values reflect the result of second location in the BY variable list.)

proc sort data=step2; by round: ; run;

data want;
  set step2;
  by round: round_1a ;
  team+first.round_1a ;
run;

Results

stjsmith · Posted 06-13-2023 06:07 PM

Thank you Tom, this was very helpful.

How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Catch up on SAS Innovate 2026

How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Re: How to use loops/array to create a indicator variable by group?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away