- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am trying to count of tasks done by workers' id (id variable in the data). The tasks can have sub-tasks. If one works on main and sub tasks, it should only count as 1 task done. If one only works on sub task (without working on main task), it also should count as 1 task done. In the example, worker 1 works on tasks: a, a_s, b. So, 2 tasks done; worker 2 works on: a_s, b. So, also 2 tasks done (even not working on main task: a). My sql code only can capture worker 1's count, but not worker 2 since i cannot find a way to identify who works on sub tasks only.
See code below, please help, thanks!
data test;
input id $ task_id $ sub_task;
datalines;
1 a 0
1 a_s 1
1 b 0
2 a_s 1
2 b 0
;
run;
proc sql;
create table out as
select *,
count(distinct case when not sub_task then task_id end) as done
from test
group by id
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasecn Should your pattern of subtask follow a pattern such that it takes the value of main_task with a suffix _s , extracting the first char would make it easy. In any event, you can apply the same logic accordingly
So
data test;
input id $ task_id $ sub_task;
datalines;
1 a 0
1 a_s 1
1 b 0
2 a_s 1
2 b 0
;
run;
proc sql;
create table want as
select *,count( distinct first(task_id)) as done
from test
group by id;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello:
Sorry for the mistake. I tried to make the code simple. But the real task id has more chars. e.g. the main task id is 2019-ABC-MAIN-1234; the sub task is 2019-ABC-MAIN-1234-S001 (or can be 2019-ABC-MAIN-1234-S001-S002). It is true that the sub task always like "main task id"-S001 ( or ....-S001-S002-... ).
I don't think the first() works. Sorry for making it more complex.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I expected something like that. Can you post a comprehensive sample so that the very logic can be adjusted?
The logic is still the same i.e to get the grouped pattern and count the distinct. Now you need to let us know the values that's a good representative of your sample.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Okay, i tried to manage to get the last 4 digits from my task_id, and now the ids will be 4 digits followed by sub numbers, see example code. There is no special pattern in the 4 digits. Hope this make sense. So, worker 1 should have 7 tasks done.
data test;
informat id task_id $25. sub_task;
input id task_id sub_task;
datalines;
1 1234 0
1 1234S001 1
1 1234S001S002 1
1 3456 0
1 9685 0
1 3648 0
1 8888 0
1 2222 0
1 3366 0
2 1234S001 1
2 5678 0
;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
HI @sasecnThank you for the response
"now the ids will be 4 digits followed by sub numbers"
Does this mean a Task or a subtask will conform to a given 4 digit number? Example 1234 series? If this is true, this in itself is a pattern.
"There is no special pattern in the 4 digits"
I don't think we are looking at any special pattern. The logic is basically to group Taskid and sub task into one group. That's all.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are right that the task id is always 4 digits number + sub number. The sub number could be S001 or S001S002, .... but always ended as SXXX. The reason i used distinct key work is because there are some duplicates records e.g.
1 1234 0
1 1234 0
My problem is that i cannot find a way to handle workers who only works on sub tasks but not the main one. e.g. worker 2 works on task 1234S001, it should count as 1 independent task. But my original code miss-count this.
Does this make sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"You are right that the task id is always 4 digits number "
OK, so this goes back to my original idea, where in the same first 4 digits of all tasks will form one group. Sub numbers are not relevant to this logic
data test;
informat id task_id $25. sub_task;
input id task_id sub_task;
datalines;
1 1234 0
1 1234S001 1
1 1234S001S002 1
1 3456 0
1 9685 0
1 3648 0
1 8888 0
1 2222 0
1 3366 0
2 1234S001 1
2 5678 0
;
RUN;
proc sql;
create table want as
select *, count(distinct substr(task_id,1,4)) as done
from test
group by id;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Alright, i got your idea. Thx, that works in my situation. Now, i understand what you meant the pattern. I will run into other format of the task ids. Hope this does not bother you too much.
I will have to deal with another situation that have to handle different format of ids. For example, the main id will be either 4 digits or 8 digits, and this will be followed by another 4 digits that identify the status of required to redo the task. e.g.
for task 1234, required redo it then the full task id will be 1234R001, for task 1234S001 (8 digits main id) required redo, full id is 1234S001R001. Again, i want to count the tasks done, if worker 1 works on 1234 and 1234R001, this counts as 1 task done. if worker 1 works 1234S001 and 1234S001R001, this also counts as 1. if worker 2 only works on 1234R001, this counts as 1.
I will still have problems to count worker 2. What will be the idea to handle this? Sorry to bother you a lot.
data test;
informat id task_id $25. redo_task;
input id task_id redo_task;
datalines;
1 1234 0
1 1234R001 1
1 1234S001 0
1 1234S001R002 1
1 3456 0
1 9685 0
1 3648 0
2 1234S001R001 1
2 1234R001 1
2 5678 0
;
RUN;
proc sql;
create table out as
select *,
count(distinct case when not redo_task then task_id end) as done
from test
group by id
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasecn No bother at all.
Basically to give you a clear pic of how the logic works is
Taking your latest sample,
Is the following your expected result ?
Obs | id | task_id | t | grp |
---|---|---|---|---|
1 | 1 | 1234 | 1234 | 1 |
2 | 1 | 1234R001 | 1234 | 1 |
3 | 1 | 1234S001 | 1234 | 1 |
4 | 1 | 1234S001R002 | 1234 | 1 |
5 | 1 | 3456 | 3456 | 2 |
6 | 1 | 9685 | 9685 | 3 |
7 | 1 | 3648 | 3648 | 4 |
8 | 2 | 1234S001R001 | 1234 | 1 |
9 | 2 | 1234R001 | 1234 | 1 |
10 | 2 | 5678 | 5678 | 2 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, rush into something else. Will get back to you later.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry for the late reply. I want to count the numbers of tasks done. If worker works on main and re-do task, this only count as 1. e.g. worker 1 works on task: 1234 and 1234R001, this counts as 1 task done. e.g. worker 1 works on task: 1234S001 (note that main task id can be 4 or 8 or more digits) and 1234S001R001, this counts as 1 task done. e.g. worker 2 works only on task: 1234R001, this counts as 1 task done (consider as independent task even worker 2 didn't work on main task 1234).
So, the expected result will be worker 1 has 5 tasks done, worker 2 has 3 tasks done.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @sasecn Looks like the problem here is for us to first identify what constitutes to Independent parent task, and it's sub task.
So taking this example
Obs | id | task_id | t | grp |
---|---|---|---|---|
1 | 1 | 1234 | 1234 | 1 |
2 | 1 | 1234R001 | 1234 | 1 |
3 | 1 | 1234S001 | 1234 | 1 |
4 | 1 | 1234S001R002 | 1234 | 1 |
When you say (note that main task id can be 4 or 8 or more digits) , do you have a classification like the letter S denotes subtask or otherwise? I need your help in determining the classification. That's the reason I presented Grouping GRP in the above. Also, when you say 4 or 8 digits. I see there digits(should mean numbers from 0-9) and characters. Once we have a clear understanding, we can proceed with the logic.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To make it simple, we can ignore the letter S, say main task id can be 4 or 8 digits, e.g. 1234 or 12345678. We only need to identify if the main id followed by R001 or R002 always 4 digits to indicate that the task has been re-done. e.g. 12345678R001 (same as previous 1234S001) meaning this is re-do task of 12345678. If a worker works on both main and re-do task, this only counts as 1 task done; if a worker only works on main or only works on re-do task (consider as independent task), this counts as 1 task done.
e.g.
if a worker works on 12345678 and 12345678R001, this counts as 1 task done; if a worker works only on 12345678, this counts as 1 task done; if a worker works only on 12345678R001 (without working on 12345678), this also counts as 1 task done.
Hope this is clear to you.