turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Data Management
- /
- Forum
- /
- assigning group identifiers

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2016 09:47 AM - edited 10-18-2016 09:49 AM

Hi. I would like to assign two different kinds of group identifiers to each obsevation, using the example data shown below, one for 'company by date' and another for 'worker' within 'company by date' subgroups, along with assigning the maximum number for 'worker' for a different column:

data temp;

input company $1-2 date worker $6;

datalines;

AB 3 K

AB 3 K

AB 3 R

AB 3 R

AB 3 S

AB 3 T

AB 4 T

AB 4 T

AB 4 U

AB 4 U

AB 4 V

CF 4 A

CF 4 A

CF 4 B

CF 4 B

CF 5 T

CF 5 T

CF 5 E

;

And the final output should be look like one shown below:

SAS Output

company date worker company_loop worker_loop worker_num

AB | 3 | K | 1 | 1 | 4 |

AB | 3 | K | 1 | 1 | 4 |

AB | 3 | R | 1 | 2 | 4 |

AB | 3 | R | 1 | 2 | 4 |

AB | 3 | S | 1 | 3 | 4 |

AB | 3 | T | 1 | 4 | 4 |

AB | 4 | T | 2 | 1 | 3 |

AB | 4 | T | 2 | 1 | 3 |

AB | 4 | U | 2 | 2 | 3 |

AB | 4 | U | 2 | 2 | 3 |

AB | 4 | V | 2 | 3 | 3 |

CF | 4 | A | 3 | 1 | 2 |

CF | 4 | A | 3 | 1 | 2 |

CF | 4 | B | 3 | 2 | 2 |

CF | 4 | B | 3 | 2 | 2 |

CF | 5 | T | 4 | 1 | 2 |

CF | 5 | T | 4 | 1 | 2 |

CF | 5 | E | 4 | 2 | 2 |

These kinds of operations are relatively straightforward using Stata, but when it comes to SAS, I cannot come up with some possible intuitive solutions for those operations. Is there any simple solution to those operations? Thank you!

Accepted Solutions

Solution

10-19-2016
08:12 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2016 08:07 AM

You need to add a reset of the worker_loop at first.company_date:

data temp; input company $1-2 date worker $6; company_date=cats(company,"_",date); datalines; AB 3 K AB 3 K AB 3 R AB 3 R AB 3 S AB 3 T AB 4 T AB 4 T AB 4 U AB 4 U AB 4 V CF 4 A CF 4 A CF 4 B CF 4 B CF 5 T CF 5 T CF 5 E ; run; proc sort data=temp; by company_date worker; run; proc sql; create table TMP as select COMPANY_DATE, count(distinct WORKER) as WORKER_NUM

from TEMP group by COMPANY_DATE; create table TEMP2 as select A.*, B.WORKER_NUM from TEMP A left join TMP B on A.COMPANY_DATE=B.COMPANY_DATE; quit; data temp3;

set temp2; retain company_loop worker_loop; by company_date worker; if first.company_date then do; company_loop=sum(company_loop,1); worker_loop=0; end; if first.worker then worker_loop=sum(worker_loop,1); run;

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2016 09:57 AM

Yes, quite simple. In a datastep, do a by with all the group levels, and have each of your output variables as retain:

data want; set have; retain company_loop worker_loop worker_num; by company worker; if first.company then company_loop=sum(company_loop,1); if first.worker then worker_loop=sum(worker_loop,1); run;

What I do not understand is what worker_num is, and how it is defined?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2016 11:57 AM - edited 10-18-2016 12:10 PM

Thank you for your response, but there are three emaining issues for that problem:

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

3) As for the 'worker_num', it is the number of unique workers within a certain 'company-date' pair subgroup. For example, within the subgroup shown below ('company: AB - date: 4' combination subgroup), there are three unique workers ('T', 'U', 'V') within this subgroup. Thank you!

AB 4 T

AB 4 T

AB 4 U

AB 4 U

AB 4 V

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2016 12:25 PM

Well, these two points:

1) First, I would like to use unique combinations of 'company by date' for assigning the first group identifiers to each observation, not just 'company', so 'company: AB - date: 3' combination should be different from 'company: AB - date: 4' combination.

2) Within each subgroup defined by unique 'company-date' combination, I would like to assign unique identifer for each worker.

Are both covered by one answer - create a variable which contains your unique by group, then replace company in my datastep, with this new variable which contains the distinct group.

The second part would be a separate step, you need to find the number of workers per group and join this back to the main dataset, then use that in the datastep, maybe something like:

proc sql; create table TMP as select COMPANY_DATE, count(distinct WORKER) as NUM_WORKERS from HAVE group by COMPANY_DATE; create table HAVE1 as select A.*, B.NUM_WORKERS from HAVE A left join TMP B on A.COMPANY_YEAR=B.COMPANY_YEAR; quit; data want; set have; retain company_loop worker_loop worker_num; by company_date worker; if first.company_date then company_loop=sum(company_loop,1); if first.worker then do; worker_loop=sum(worker_loop,1); num_workers=num_workers-1; end; run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-18-2016 02:18 PM

(I have no idea why my reply keeps being deleted after posting it. I upload it again.)

Thank you for your answer, but I have one final question. I would like to assign unique 'worker' identifiers ('worker_loop') __within a certain 'company-date' combination subgroup ('company_loop').__ For example,

company date company_loop worker worker_loop

AB 3 1 K __1__

AB 3 1 K 1

AB 3 1 R 2

AB 3 1 R 2

AB 3 1 S 3

AB 3 1 T 4

AB 4 2 T **1**

AB 4 2 T 1

AB 4 2 U 2

AB 4 2 U 2

AB 4 2 V 3

CF 4 3 A __1__

CF 4 3 A 1

CF 4 3 B 2

...

As you can see above, the 'worker_loop' should restart to 1 when the subgroup ('company_loop') is changed while taking into account unique 'worker' cases within the subgroup. Is there any way that I can assign identfiers within the subgroup? Thank you!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2016 04:26 AM

Ok, so what have you tried? The code here:

data want; set have; retain company_loop worker_loop worker_num; by company worker; if first.company then company_loop=sum(company_loop,1); if first.worker then worker_loop=sum(worker_loop,1); run;

Shows you everythnig you need to know about grouping the data, retain counts, and you just need to manipulate your data to get this cmpany date variable.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2016 07:38 AM

Thank you but the code you suggested doesn't seem to work properly. Here are the codes that I have tried based on your previous recommendations and the corresponding results. The code blocks #(1) and (2) work properly, but #(3) doesn't. Please pay attention to the code block #(3).

```
data temp;
input company $1-2 date worker $6;
datalines;
AB 3 K
AB 3 K
AB 3 R
AB 3 R
AB 3 S
AB 3 T
AB 4 T
AB 4 T
AB 4 U
AB 4 U
AB 4 V
CF 4 A
CF 4 A
CF 4 B
CF 4 B
CF 5 T
CF 5 T
CF 5 E
;
proc sort data=temp;
by company date worker;
/* (1) For assining 'company by date' combination identifiers */
data temp2;
set temp;
retain company_loop;
by company date worker;
if first.date then company_loop=sum(company_loop,1);
run;
/* (2) For assigning the number of unique workers withint each 'company by date' subgroup' */
proc sql;
create table tmp
as select company_loop, count(distinct worker) as worker_num
from temp2
group by company_loop;
create table temp3
as select a.*, b.worker_num
from temp2 a left join tmp b
on a.company_loop=b.company_loop;
quit;
proc sort data=temp3;
by company_loop worker;
run;
/* (3) For assigning ID for each unique 'worker' within each 'company by date' subgroup - doesn't work */
data temp4;
set temp3;
retain worker_loop;
by company_loop worker;
if first.worker then worker_loop=sum(worker_loop,1);
run;
proc print data=temp4;
run;
```

Here is the result. The columns 'company_loop' and 'worker_num' are the ones that I want to derive, but for the column 'worker_loop', it's not.

SAS Output

The SAS System |

company date worker company_loop worker_num worker_loop

AB | 3 | K | 1 | 4 | 1 |

AB | 3 | K | 1 | 4 | 1 |

AB | 3 | R | 1 | 4 | 2 |

AB | 3 | R | 1 | 4 | 2 |

AB | 3 | S | 1 | 4 | 3 |

AB | 3 | T | 1 | 4 | 4 |

AB | 4 | T | 2 | 3 | 5 |

AB | 4 | T | 2 | 3 | 5 |

AB | 4 | U | 2 | 3 | 6 |

AB | 4 | U | 2 | 3 | 6 |

AB | 4 | V | 2 | 3 | 7 |

CF | 4 | A | 3 | 2 | 8 |

CF | 4 | A | 3 | 2 | 8 |

CF | 4 | B | 3 | 2 | 9 |

CF | 4 | B | 3 | 2 | 9 |

CF | 5 | E | 4 | 2 | 10 |

CF | 5 | T | 4 | 2 | 11 |

CF | 5 | T | 4 | 2 | 11 |

Solution

10-19-2016
08:12 AM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2016 08:07 AM

You need to add a reset of the worker_loop at first.company_date:

data temp; input company $1-2 date worker $6; company_date=cats(company,"_",date); datalines; AB 3 K AB 3 K AB 3 R AB 3 R AB 3 S AB 3 T AB 4 T AB 4 T AB 4 U AB 4 U AB 4 V CF 4 A CF 4 A CF 4 B CF 4 B CF 5 T CF 5 T CF 5 E ; run; proc sort data=temp; by company_date worker; run; proc sql; create table TMP as select COMPANY_DATE, count(distinct WORKER) as WORKER_NUM

from TEMP group by COMPANY_DATE; create table TEMP2 as select A.*, B.WORKER_NUM from TEMP A left join TMP B on A.COMPANY_DATE=B.COMPANY_DATE; quit; data temp3;

set temp2; retain company_loop worker_loop; by company_date worker; if first.company_date then do; company_loop=sum(company_loop,1); worker_loop=0; end; if first.worker then worker_loop=sum(worker_loop,1); run;

I would advise to not use Tabs in your code as this renders differently over different viewers - you can change this so that when you press tab it puts in X amount of spaces (in Base SAS it is under Tools->Options->Enhanced Editor->General Tab, insert spaces for tabs.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-19-2016 08:11 AM

Thank you so much for all the responses and tips! The results are now what I want to derive.