Solved: Conditional ratio of values

NewUsrStat · Posted 08-28-2023 09:48 AM

Hi guys,

suppose to have the following table:

dataset 1

Region	Value	Class
1	34	0
2	23	0
3	12	0
1	65	1
2	32	1
3	43	1
1	98	2
2	76	2
3	44	2
1	1	3
2	2	3
3	3	3
1	45	4
2	38	4
3	43	4
1	0	5
2	0	5
3	0	5

Then you have another dataset as follows:

dataset 2

ID	Region
1	1
2	1
3	3
4	2
5	2
6	3
7	1
8	2
9	1
10	1
11	1
12	3
13	3
14	3
15	2
16	1
17	2
18	3

Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1. 23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please?

Thank you in advance

Patrick · Posted 08-28-2023 08:25 PM

Below a SQL option.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(distinct id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

View solution in original post

ballardw · Posted 08-28-2023 06:10 PM

First please provide data in the form of data step code. Like this and pasted into a text box so it is easy to copy and run.

data two;
 input ID 	Region;
datalines;
1 	1
2 	1
3 	3
4 	2
5 	2
6 	3
7 	1
8 	2
9 	1
10 	1
11 	1
12 	3
13 	3
14 	3
15 	2
16 	1
17 	2
18 	3
;

data one;
  input Region 	Value 	Class;
datalines;
1 	34 	0
2 	23 	0
3 	12 	0
1 	65 	1
2 	32 	1
3 	43 	1
1 	98 	2
2 	76 	2
3 	44 	2
1 	1 	3
2 	2 	3
3 	3 	3
1 	45 	4
2 	38 	4
3 	43 	4
1 	0 	5
2 	0 	5
3 	0 	5
;

One way:

/* get a count of how many times each "region" appears
in data two*/

proc freq data=two noprint;
   tables region/out=regioncount nopercent nocum;
run;

proc sort data=one;
  by region;
run;

data want;
  merge one 
        regioncount(drop=percent)
  ;
  by region;
  ratio = value/count;
run;

However there are some places that your description may be incomplete. You say "total number of corresponding "Region" class in the second dataset." and "divided by 7 because in dataset 2 there are 7 IDs from region = 1". This would be correct if and only if ID is never repeated in a Region. If there are, or could be, duplicates of ID then you need to clarify if the divisor is "number of Id per region" or "number of records with region" as they would not be the same thing.

@NewUsrStat wrote:

Hi guys,

suppose to have the following table:

dataset 1

Region Value Class

1 34 0

2 23 0

3 12 0

1 65 1

2 32 1

3 43 1

1 98 2

2 76 2

3 44 2

1 1 3

2 2 3

3 3 3

1 45 4

2 38 4

3 43 4

1 0 5

2 0 5

3 0 5

Then you have another dataset as follows:

dataset 2

ID Region

1 1

2 1

3 3

4 2

5 2

6 3

7 1

8 2

9 1

10 1

11 1

12 3

13 3

14 3

15 2

16 1

17 2

18 3

Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1. 23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please?

Thank you in advance

NewUsrStat · Posted 08-29-2023 02:47 AM

Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.

ballardw · Posted 08-29-2023 10:02 AM

@NewUsrStat wrote:

Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.

You will quite likely find than an insistence on a "single piece of code" can lead to a lot of time invested and often leading to fragile processes that minor changes to your data require drastic rewrites possibly requiring more time to fix than was invested in writing the first bit of code.

Patrick · Posted 08-28-2023 08:25 PM

Below a SQL option.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(distinct id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

Re: Conditional ratio of values

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away