Re: shared subgroup values

Rahul_SAS · Posted 06-22-2018 05:29 AM

Hi Experts,

I have a data like given below:

ID	Attempts
1000000002	A2
1000000002	A2
1000000002	A1
1000000002	NA
1000000003	NA
1000000007	NA
1000000008	A3
1000000017	NA
1000000018	A1
1000000018	A2
1000000018	A3
1000000018	A2
1000000018	NA
1000000019	A1
1000000019	A3
1000000019	NA
1000000020	A1
1000000020	A3
1000000022	A2

want the output as

ID	Attempts	Attemptbyid	shared
1000000002	A2	1	0.33
1000000002	A2	0	0.33
1000000002	A1	0	0.33
1000000002	NA	0	0
1000000003	NA	0	0
1000000007	NA	0	0
1000000008	A3	1	1
1000000017	NA	0	0
1000000018	A1	1	0.25
1000000018	A2	0	0.25
1000000018	A3	0	0.25
1000000018	A2	0	0.25
1000000018	NA	0	0
1000000019	A1	1	0.5
1000000019	A3	0	0.5
1000000019	NA	0	0
1000000020	A1	1	0.5
1000000020	A3	0	0.5
1000000022	A2	1	0

need to created a varialble Shared which should have the total count of distince IDs where IDs with attemps NA has to be excluded.

created Attemptbyid variable as count for IDs exculuding NAs but have to distribute distinct count (Attemptbyid) 1 to all subgroup observations.

thanks

RW9 · Posted 06-22-2018 05:37 AM

As I am not going to type that test data in, this is untested.

proc sql;
  create table want as
  select a.*,
         (select count(attempts) from have where id=a.id and attempts=a.attempts) / 
           (select count(attempts) from have where id=a.id and attempts ne "NA") as shared
  from   have a;
quit;

Rahul_SAS · Posted 06-22-2018 06:13 AM

HI RW9,

its not giving the desired result.

Example:

1000000018	NA
1000000018	A2
1000000018	A2
1000000018	A1

for this ID, the distinct count is 1 and excluding NA total Attempts is 3 so i need 1/3=.33 to be populated to all rows.

1000000018	NA	0	0
1000000018	A2	1	0.33
1000000018	A2	0	0.33
1000000018	A1	0	0.33

similarly if there are 4 Attempts then 1/4=.25 to be populated in Attempts

RW9 · Posted 06-22-2018 06:39 AM

"As I am not going to type that test data in, this is untested." - and am still not typing test data in.

Why is distinct count 1? There is at least 3 distinct values in the example provided- A1, A2, NA.

Rahul_SAS · Posted 06-22-2018 06:46 AM

I have to get this as 1 to match the numbers.

if it gets populated for all rows then the total could will be so high and won't give correct result while aggregation.

Reeza · Posted 06-22-2018 10:58 AM

data temp;
set have;

group = substr(attempts, 1, 1);
run;

proc sql;
create table want as 
select *, 1/count(*) as mean_value
from temp
group by id, group;
quit;

@Rahul_SAS wrote:

Hi Experts,

I have a data like given below:

ID Attempts

1000000002 A2

1000000002 A2

1000000002 A1

1000000002 NA

1000000003 NA

1000000007 NA

1000000008 A3

1000000017 NA

1000000018 A1

1000000018 A2

1000000018 A3

1000000018 A2

1000000018 NA

1000000019 A1

1000000019 A3

1000000019 NA

1000000020 A1

1000000020 A3

1000000022 A2

want the output as

ID Attempts Attemptbyid shared

1000000002 A2 1 0.33

1000000002 A2 0 0.33

1000000002 A1 0 0.33

1000000002 NA 0 0

1000000003 NA 0 0

1000000007 NA 0 0

1000000008 A3 1 1

1000000017 NA 0 0

1000000018 A1 1 0.25

1000000018 A2 0 0.25

1000000018 A3 0 0.25

1000000018 A2 0 0.25

1000000018 NA 0 0

1000000019 A1 1 0.5

1000000019 A3 0 0.5

1000000019 NA 0 0

1000000020 A1 1 0.5

1000000020 A3 0 0.5

1000000022 A2 1 0

need to created a varialble Shared which should have the total count of distince IDs where IDs with attemps NA has to be excluded.

created Attemptbyid variable as count for IDs exculuding NAs but have to distribute distinct count (Attemptbyid) 1 to all subgroup observations.

thanks

shared subgroup values