SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

assigning group identifiers

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

assigning group identifiers

[ Edited ]

Hi. I would like to assign two different kinds of group identifiers to each obsevation, using the example data shown below, one for 'company by date' and another for 'worker' within 'company by date' subgroups, along with assigning the maximum number for 'worker' for a different column: 

 

data temp;
input company $1-2 date worker $6;
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;

 

And the final output should be look like one shown below:

 

SAS Output

company date  worker  company_loop  worker_loop  worker_num
AB3K114
AB3K114
AB3R124
AB3R124
AB3S134
AB3T144
AB4T213
AB4T213
AB4U223
AB4U223
AB4V233
CF4A312
CF4A312
CF4B322
CF4B322
CF5T412
CF5T412
CF5E422
 

 

 

These kinds of operations are relatively straightforward using Stata, but when it comes to SAS, I cannot come up with some possible intuitive solutions for those operations. Is there any simple solution to those operations? Thank you!

 
 
 

Accepted Solutions
Solution
‎10-19-2016 08:12 AM
Super User
Super User
Posts: 7,997

Re: assigning group identifiers

Posted in reply to sas_user_k

You need to add a reset of the worker_loop at first.company_date:

data temp; 
  input company $1-2 date worker $6;
  company_date=cats(company,"_",date);
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
; 
run;

proc sort data=temp; 
  by company_date worker;
run;

proc sql;
  create table TMP as 
  select  COMPANY_DATE, 
          count(distinct WORKER) as WORKER_NUM
from TEMP group by COMPANY_DATE; create table TEMP2 as select A.*, B.WORKER_NUM from TEMP A left join TMP B on A.COMPANY_DATE=B.COMPANY_DATE; quit; data temp3;
set temp2; retain company_loop worker_loop; by company_date worker; if first.company_date then do; company_loop=sum(company_loop,1); worker_loop=0; end; if first.worker then worker_loop=sum(worker_loop,1); run;

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

View solution in original post


All Replies
Super User
Super User
Posts: 7,997

Re: assigning group identifiers

Posted in reply to sas_user_k

Yes, quite simple.  In a datastep, do a by with all the group levels, and have each of your output variables as retain:

data want;
  set have;
  retain company_loop worker_loop worker_num;
  by company worker;
  if first.company then company_loop=sum(company_loop,1);
  if first.worker then worker_loop=sum(worker_loop,1);
run;

What I do not understand is what worker_num is, and how it is defined?

Occasional Contributor
Posts: 19

Re: assigning group identifiers

[ Edited ]

Thank you for your response, but there are three emaining issues for that problem:

 

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

 

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

 

3) As for the 'worker_num', it is the number of unique workers within a certain 'company-date' pair subgroup. For example, within the subgroup shown below ('company: AB - date: 4' combination subgroup), there are three unique workers ('T', 'U', 'V') within this subgroup. Thank you! 

 

AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V

Super User
Super User
Posts: 7,997

Re: assigning group identifiers

Posted in reply to sas_user_k

Well, these two points:

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

 

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

 

Are both covered by one answer - create a variable which contains your unique by group, then replace company in my datastep, with this new variable which contains the distinct group.

 

The second part would be a separate step, you need to find the number of workers per group and join this back to the main dataset, then use that in the datastep, maybe something like:

proc sql;
  create table TMP as
  select COMPANY_DATE,
         count(distinct WORKER) as NUM_WORKERS
  from   HAVE
  group by COMPANY_DATE;

  create table HAVE1 as
  select A.*,
         B.NUM_WORKERS
  from   HAVE A
  left join TMP B
  on     A.COMPANY_YEAR=B.COMPANY_YEAR;
quit;

data want;
  set have;
  retain company_loop worker_loop worker_num;
  by company_date worker;
  if first.company_date then company_loop=sum(company_loop,1);
  if first.worker then do;
    worker_loop=sum(worker_loop,1);
    num_workers=num_workers-1;
  end;
run;
Occasional Contributor
Posts: 19

Re: assigning group identifiers

(I have no idea why my reply keeps being deleted after posting it. I upload it again.)

 

Thank you for your answer, but I have one final question. I would like to assign unique 'worker' identifiers ('worker_loop') within a certain 'company-date' combination subgroup ('company_loop'). For example,

 

 

company  date  company_loop  worker  worker_loop

AB            3       1                       K           1

AB            3       1                       K           1

AB            3       1                       R           2

AB            3       1                       R           2

AB            3       1                       S           3

AB            3       1                       T           4

AB            4       2                       T           1

AB            4       2                       T           1

AB            4       2                       U           2

AB            4       2                       U           2

AB            4       2                       V           3

CF            4       3                       A           1

CF            4       3                       A           1

CF            4       3                       B           2

...

 

As you can see above, the 'worker_loop' should restart to 1 when the subgroup ('company_loop') is changed while taking into account unique 'worker' cases within the subgroup. Is there any way that I can assign identfiers within the subgroup? Thank you!

Super User
Super User
Posts: 7,997

Re: assigning group identifiers

Posted in reply to sas_user_k

Ok, so what have you tried?  The code here:

data want;
  set have;
  retain company_loop worker_loop worker_num;
  by company worker;
  if first.company then company_loop=sum(company_loop,1);
  if first.worker then worker_loop=sum(worker_loop,1);
run;

 

Shows you everythnig you need to know about grouping the data, retain counts, and you just need to manipulate your data to get this cmpany date variable.

 

 

Occasional Contributor
Posts: 19

Re: assigning group identifiers

Thank you but the code you suggested doesn't seem to work properly. Here are the codes that I have tried based on your previous recommendations and the corresponding results. The code blocks #(1) and (2) work properly, but #(3) doesn't. Please pay attention to the code block #(3).

 

 

data temp; 
input company $1-2 date worker $6;
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
; 

proc sort data=temp; 
by company date worker; 

/* (1) For assining 'company by date' combination identifiers */
data temp2;
  set temp;
  retain company_loop;
  by company date worker;
  if first.date then company_loop=sum(company_loop,1);
run;

/* (2) For assigning the number of unique workers withint each 'company by date' subgroup' */
proc sql;
	create table tmp
	as select company_loop, count(distinct worker) as worker_num
	from temp2
	group by company_loop;

	create table temp3
	as select a.*, b.worker_num
	from temp2 a left join tmp b
	on a.company_loop=b.company_loop;
quit;

proc sort data=temp3;
	by company_loop worker;
run;

/* (3) For assigning ID for each unique 'worker' within each 'company by date' subgroup - doesn't work */
data temp4;
	set temp3;
	retain worker_loop;
	by company_loop worker;
	if first.worker then worker_loop=sum(worker_loop,1);
run;

proc print data=temp4;
run;

 

Here is the result. The columns 'company_loop' and 'worker_num' are the ones that I want to derive, but for the column 'worker_loop', it's not.

 

SAS Output

The SAS System

company  date  worker  company_loop  worker_num  worker_loop
AB3K141
AB3K141
AB3R142
AB3R142
AB3S143
AB3T144
AB4T235
AB4T235
AB4U236
AB4U236
AB4V237
CF4A328
CF4A328
CF4B329
CF4B329
CF5E4210
CF5T4211
CF5T4211

 

Solution
‎10-19-2016 08:12 AM
Super User
Super User
Posts: 7,997

Re: assigning group identifiers

Posted in reply to sas_user_k

You need to add a reset of the worker_loop at first.company_date:

data temp; 
  input company $1-2 date worker $6;
  company_date=cats(company,"_",date);
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
; 
run;

proc sort data=temp; 
  by company_date worker;
run;

proc sql;
  create table TMP as 
  select  COMPANY_DATE, 
          count(distinct WORKER) as WORKER_NUM
from TEMP group by COMPANY_DATE; create table TEMP2 as select A.*, B.WORKER_NUM from TEMP A left join TMP B on A.COMPANY_DATE=B.COMPANY_DATE; quit; data temp3;
set temp2; retain company_loop worker_loop; by company_date worker; if first.company_date then do; company_loop=sum(company_loop,1); worker_loop=0; end; if first.worker then worker_loop=sum(worker_loop,1); run;

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

Occasional Contributor
Posts: 19

Re: assigning group identifiers

Thank you so much for all the responses and tips! The results are now what I want to derive.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 648 views
  • 4 likes
  • 2 in conversation