I am trying to count of tasks done by workers' id (id variable in the data). The tasks can have sub-tasks. If one works on main and sub tasks, it should only count as 1 task done. If one only works on sub task (without working on main task), it also should count as 1 task done. In the example, worker 1 works on tasks: a, a_s, b. So, 2 tasks done; worker 2 works on: a_s, b. So, also 2 tasks done (even not working on main task: a). My sql code only can capture worker 1's count, but not worker 2 since i cannot find a way to identify who works on sub tasks only.
See code below, please help, thanks!
data test;
input id $ task_id $ sub_task;
datalines;
1 a 0
1 a_s 1
1 b 0
2 a_s 1
2 b 0
;
run;
proc sql;
create table out as
select *,
count(distinct case when not sub_task then task_id end) as done
from test
group by id
;
quit;
Hello @sasecn Should your pattern of subtask follow a pattern such that it takes the value of main_task with a suffix _s , extracting the first char would make it easy. In any event, you can apply the same logic accordingly
So
data test;
input id $ task_id $ sub_task;
datalines;
1 a 0
1 a_s 1
1 b 0
2 a_s 1
2 b 0
;
run;
proc sql;
create table want as
select *,count( distinct first(task_id)) as done
from test
group by id;
quit;
Hello:
Sorry for the mistake. I tried to make the code simple. But the real task id has more chars. e.g. the main task id is 2019-ABC-MAIN-1234; the sub task is 2019-ABC-MAIN-1234-S001 (or can be 2019-ABC-MAIN-1234-S001-S002). It is true that the sub task always like "main task id"-S001 ( or ....-S001-S002-... ).
I don't think the first() works. Sorry for making it more complex.
I expected something like that. Can you post a comprehensive sample so that the very logic can be adjusted?
The logic is still the same i.e to get the grouped pattern and count the distinct. Now you need to let us know the values that's a good representative of your sample.
Okay, i tried to manage to get the last 4 digits from my task_id, and now the ids will be 4 digits followed by sub numbers, see example code. There is no special pattern in the 4 digits. Hope this make sense. So, worker 1 should have 7 tasks done.
data test;
informat id task_id $25. sub_task;
input id task_id sub_task;
datalines;
1 1234 0
1 1234S001 1
1 1234S001S002 1
1 3456 0
1 9685 0
1 3648 0
1 8888 0
1 2222 0
1 3366 0
2 1234S001 1
2 5678 0
;
RUN;
HI @sasecnThank you for the response
"now the ids will be 4 digits followed by sub numbers"
Does this mean a Task or a subtask will conform to a given 4 digit number? Example 1234 series? If this is true, this in itself is a pattern.
"There is no special pattern in the 4 digits"
I don't think we are looking at any special pattern. The logic is basically to group Taskid and sub task into one group. That's all.
You are right that the task id is always 4 digits number + sub number. The sub number could be S001 or S001S002, .... but always ended as SXXX. The reason i used distinct key work is because there are some duplicates records e.g.
1 1234 0
1 1234 0
My problem is that i cannot find a way to handle workers who only works on sub tasks but not the main one. e.g. worker 2 works on task 1234S001, it should count as 1 independent task. But my original code miss-count this.
Does this make sense?
"You are right that the task id is always 4 digits number "
OK, so this goes back to my original idea, where in the same first 4 digits of all tasks will form one group. Sub numbers are not relevant to this logic
data test;
informat id task_id $25. sub_task;
input id task_id sub_task;
datalines;
1 1234 0
1 1234S001 1
1 1234S001S002 1
1 3456 0
1 9685 0
1 3648 0
1 8888 0
1 2222 0
1 3366 0
2 1234S001 1
2 5678 0
;
RUN;
proc sql;
create table want as
select *, count(distinct substr(task_id,1,4)) as done
from test
group by id;
quit;
Alright, i got your idea. Thx, that works in my situation. Now, i understand what you meant the pattern. I will run into other format of the task ids. Hope this does not bother you too much.
I will have to deal with another situation that have to handle different format of ids. For example, the main id will be either 4 digits or 8 digits, and this will be followed by another 4 digits that identify the status of required to redo the task. e.g.
for task 1234, required redo it then the full task id will be 1234R001, for task 1234S001 (8 digits main id) required redo, full id is 1234S001R001. Again, i want to count the tasks done, if worker 1 works on 1234 and 1234R001, this counts as 1 task done. if worker 1 works 1234S001 and 1234S001R001, this also counts as 1. if worker 2 only works on 1234R001, this counts as 1.
I will still have problems to count worker 2. What will be the idea to handle this? Sorry to bother you a lot.
data test;
informat id task_id $25. redo_task;
input id task_id redo_task;
datalines;
1 1234 0
1 1234R001 1
1 1234S001 0
1 1234S001R002 1
1 3456 0
1 9685 0
1 3648 0
2 1234S001R001 1
2 1234R001 1
2 5678 0
;
RUN;
proc sql;
create table out as
select *,
count(distinct case when not redo_task then task_id end) as done
from test
group by id
;
quit;
Hello @sasecn No bother at all.
Basically to give you a clear pic of how the logic works is
Taking your latest sample,
Is the following your expected result ?
Obs | id | task_id | t | grp |
---|---|---|---|---|
1 | 1 | 1234 | 1234 | 1 |
2 | 1 | 1234R001 | 1234 | 1 |
3 | 1 | 1234S001 | 1234 | 1 |
4 | 1 | 1234S001R002 | 1234 | 1 |
5 | 1 | 3456 | 3456 | 2 |
6 | 1 | 9685 | 9685 | 3 |
7 | 1 | 3648 | 3648 | 4 |
8 | 2 | 1234S001R001 | 1234 | 1 |
9 | 2 | 1234R001 | 1234 | 1 |
10 | 2 | 5678 | 5678 | 2 |
Sorry, rush into something else. Will get back to you later.
Sorry for the late reply. I want to count the numbers of tasks done. If worker works on main and re-do task, this only count as 1. e.g. worker 1 works on task: 1234 and 1234R001, this counts as 1 task done. e.g. worker 1 works on task: 1234S001 (note that main task id can be 4 or 8 or more digits) and 1234S001R001, this counts as 1 task done. e.g. worker 2 works only on task: 1234R001, this counts as 1 task done (consider as independent task even worker 2 didn't work on main task 1234).
So, the expected result will be worker 1 has 5 tasks done, worker 2 has 3 tasks done.
Hello @sasecn Looks like the problem here is for us to first identify what constitutes to Independent parent task, and it's sub task.
So taking this example
Obs | id | task_id | t | grp |
---|---|---|---|---|
1 | 1 | 1234 | 1234 | 1 |
2 | 1 | 1234R001 | 1234 | 1 |
3 | 1 | 1234S001 | 1234 | 1 |
4 | 1 | 1234S001R002 | 1234 | 1 |
When you say (note that main task id can be 4 or 8 or more digits) , do you have a classification like the letter S denotes subtask or otherwise? I need your help in determining the classification. That's the reason I presented Grouping GRP in the above. Also, when you say 4 or 8 digits. I see there digits(should mean numbers from 0-9) and characters. Once we have a clear understanding, we can proceed with the logic.
To make it simple, we can ignore the letter S, say main task id can be 4 or 8 digits, e.g. 1234 or 12345678. We only need to identify if the main id followed by R001 or R002 always 4 digits to indicate that the task has been re-done. e.g. 12345678R001 (same as previous 1234S001) meaning this is re-do task of 12345678. If a worker works on both main and re-do task, this only counts as 1 task done; if a worker only works on main or only works on re-do task (consider as independent task), this counts as 1 task done.
e.g.
if a worker works on 12345678 and 12345678R001, this counts as 1 task done; if a worker works only on 12345678, this counts as 1 task done; if a worker works only on 12345678R001 (without working on 12345678), this also counts as 1 task done.
Hope this is clear to you.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.