## assigning group identifiers

Solved
Occasional Contributor
Posts: 19

# assigning group identifiers

[ Edited ]

Hi. I would like to assign two different kinds of group identifiers to each obsevation, using the example data shown below, one for 'company by date' and another for 'worker' within 'company by date' subgroups, along with assigning the maximum number for 'worker' for a different column:

data temp;
input company \$1-2 date worker \$6;
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;

And the final output should be look like one shown below:

SAS Output

company date  worker  company_loop  worker_loop  worker_num
 AB 3 K 1 1 4 AB 3 K 1 1 4 AB 3 R 1 2 4 AB 3 R 1 2 4 AB 3 S 1 3 4 AB 3 T 1 4 4 AB 4 T 2 1 3 AB 4 T 2 1 3 AB 4 U 2 2 3 AB 4 U 2 2 3 AB 4 V 2 3 3 CF 4 A 3 1 2 CF 4 A 3 1 2 CF 4 B 3 2 2 CF 4 B 3 2 2 CF 5 T 4 1 2 CF 5 T 4 1 2 CF 5 E 4 2 2

These kinds of operations are relatively straightforward using Stata, but when it comes to SAS, I cannot come up with some possible intuitive solutions for those operations. Is there any simple solution to those operations? Thank you!

Accepted Solutions
Solution
‎10-19-2016 08:12 AM
Super User
Posts: 9,441

## Re: assigning group identifiers

You need to add a reset of the worker_loop at first.company_date:

```data temp;
input company \$1-2 date worker \$6;
company_date=cats(company,"_",date);
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;
run;

proc sort data=temp;
by company_date worker;
run;

proc sql;
create table TMP as
select  COMPANY_DATE,
count(distinct WORKER) as WORKER_NUM  from    TEMP
group by COMPANY_DATE;

create table TEMP2 as
select  A.*,
B.WORKER_NUM
from    TEMP A
left join TMP B
on      A.COMPANY_DATE=B.COMPANY_DATE;
quit;

data temp3;  set temp2;
retain company_loop worker_loop;
by company_date worker;
if first.company_date then do;
company_loop=sum(company_loop,1);
worker_loop=0;
end;
if first.worker then worker_loop=sum(worker_loop,1);
run;
```

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

All Replies
Super User
Posts: 9,441

## Re: assigning group identifiers

Yes, quite simple.  In a datastep, do a by with all the group levels, and have each of your output variables as retain:

```data want;
set have;
retain company_loop worker_loop worker_num;
by company worker;
if first.company then company_loop=sum(company_loop,1);
if first.worker then worker_loop=sum(worker_loop,1);
run;```

What I do not understand is what worker_num is, and how it is defined?

Occasional Contributor
Posts: 19

## Re: assigning group identifiers

[ Edited ]

Thank you for your response, but there are three emaining issues for that problem:

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

3) As for the 'worker_num', it is the number of unique workers within a certain 'company-date' pair subgroup. For example, within the subgroup shown below ('company: AB - date: 4' combination subgroup), there are three unique workers ('T', 'U', 'V') within this subgroup. Thank you!

AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V

Super User
Posts: 9,441

## Re: assigning group identifiers

Well, these two points:

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

Are both covered by one answer - create a variable which contains your unique by group, then replace company in my datastep, with this new variable which contains the distinct group.

The second part would be a separate step, you need to find the number of workers per group and join this back to the main dataset, then use that in the datastep, maybe something like:

```proc sql;
create table TMP as
select COMPANY_DATE,
count(distinct WORKER) as NUM_WORKERS
from   HAVE
group by COMPANY_DATE;

create table HAVE1 as
select A.*,
B.NUM_WORKERS
from   HAVE A
left join TMP B
on     A.COMPANY_YEAR=B.COMPANY_YEAR;
quit;

data want;
set have;
retain company_loop worker_loop worker_num;
by company_date worker;
if first.company_date then company_loop=sum(company_loop,1);
if first.worker then do;
worker_loop=sum(worker_loop,1);
num_workers=num_workers-1;
end;
run;```
Occasional Contributor
Posts: 19

## Re: assigning group identifiers

(I have no idea why my reply keeps being deleted after posting it. I upload it again.)

Thank you for your answer, but I have one final question. I would like to assign unique 'worker' identifiers ('worker_loop') within a certain 'company-date' combination subgroup ('company_loop'). For example,

company  date  company_loop  worker  worker_loop

AB            3       1                       K           1

AB            3       1                       K           1

AB            3       1                       R           2

AB            3       1                       R           2

AB            3       1                       S           3

AB            3       1                       T           4

AB            4       2                       T           1

AB            4       2                       T           1

AB            4       2                       U           2

AB            4       2                       U           2

AB            4       2                       V           3

CF            4       3                       A           1

CF            4       3                       A           1

CF            4       3                       B           2

...

As you can see above, the 'worker_loop' should restart to 1 when the subgroup ('company_loop') is changed while taking into account unique 'worker' cases within the subgroup. Is there any way that I can assign identfiers within the subgroup? Thank you!

Super User
Posts: 9,441

## Re: assigning group identifiers

Ok, so what have you tried?  The code here:

```data want;
set have;
retain company_loop worker_loop worker_num;
by company worker;
if first.company then company_loop=sum(company_loop,1);
if first.worker then worker_loop=sum(worker_loop,1);
run;```

Shows you everythnig you need to know about grouping the data, retain counts, and you just need to manipulate your data to get this cmpany date variable.

Occasional Contributor
Posts: 19

## Re: assigning group identifiers

Thank you but the code you suggested doesn't seem to work properly. Here are the codes that I have tried based on your previous recommendations and the corresponding results. The code blocks #(1) and (2) work properly, but #(3) doesn't. Please pay attention to the code block #(3).

``````data temp;
input company \$1-2 date worker \$6;
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;

proc sort data=temp;
by company date worker;

/* (1) For assining 'company by date' combination identifiers */
data temp2;
set temp;
retain company_loop;
by company date worker;
if first.date then company_loop=sum(company_loop,1);
run;

/* (2) For assigning the number of unique workers withint each 'company by date' subgroup' */
proc sql;
create table tmp
as select company_loop, count(distinct worker) as worker_num
from temp2
group by company_loop;

create table temp3
as select a.*, b.worker_num
from temp2 a left join tmp b
on a.company_loop=b.company_loop;
quit;

proc sort data=temp3;
by company_loop worker;
run;

/* (3) For assigning ID for each unique 'worker' within each 'company by date' subgroup - doesn't work */
data temp4;
set temp3;
retain worker_loop;
by company_loop worker;
if first.worker then worker_loop=sum(worker_loop,1);
run;

proc print data=temp4;
run;``````

Here is the result. The columns 'company_loop' and 'worker_num' are the ones that I want to derive, but for the column 'worker_loop', it's not.

SAS Output

 The SAS System

company  date  worker  company_loop  worker_num  worker_loop
 AB 3 K 1 4 1 AB 3 K 1 4 1 AB 3 R 1 4 2 AB 3 R 1 4 2 AB 3 S 1 4 3 AB 3 T 1 4 4 AB 4 T 2 3 5 AB 4 T 2 3 5 AB 4 U 2 3 6 AB 4 U 2 3 6 AB 4 V 2 3 7 CF 4 A 3 2 8 CF 4 A 3 2 8 CF 4 B 3 2 9 CF 4 B 3 2 9 CF 5 E 4 2 10 CF 5 T 4 2 11 CF 5 T 4 2 11

Solution
‎10-19-2016 08:12 AM
Super User
Posts: 9,441

## Re: assigning group identifiers

You need to add a reset of the worker_loop at first.company_date:

```data temp;
input company \$1-2 date worker \$6;
company_date=cats(company,"_",date);
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;
run;

proc sort data=temp;
by company_date worker;
run;

proc sql;
create table TMP as
select  COMPANY_DATE,
count(distinct WORKER) as WORKER_NUM  from    TEMP
group by COMPANY_DATE;

create table TEMP2 as
select  A.*,
B.WORKER_NUM
from    TEMP A
left join TMP B
on      A.COMPANY_DATE=B.COMPANY_DATE;
quit;

data temp3;  set temp2;
retain company_loop worker_loop;
by company_date worker;
if first.company_date then do;
company_loop=sum(company_loop,1);
worker_loop=0;
end;
if first.worker then worker_loop=sum(worker_loop,1);
run;
```

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

Occasional Contributor
Posts: 19

## Re: assigning group identifiers

Thank you so much for all the responses and tips! The results are now what I want to derive.

☑ This topic is solved.