Hi guys,
suppose to have the following table:
dataset 1
Region | Value | Class |
1 | 34 | 0 |
2 | 23 | 0 |
3 | 12 | 0 |
1 | 65 | 1 |
2 | 32 | 1 |
3 | 43 | 1 |
1 | 98 | 2 |
2 | 76 | 2 |
3 | 44 | 2 |
1 | 1 | 3 |
2 | 2 | 3 |
3 | 3 | 3 |
1 | 45 | 4 |
2 | 38 | 4 |
3 | 43 | 4 |
1 | 0 | 5 |
2 | 0 | 5 |
3 | 0 | 5 |
Then you have another dataset as follows:
dataset 2
ID | Region |
1 | 1 |
2 | 1 |
3 | 3 |
4 | 2 |
5 | 2 |
6 | 3 |
7 | 1 |
8 | 2 |
9 | 1 |
10 | 1 |
11 | 1 |
12 | 3 |
13 | 3 |
14 | 3 |
15 | 2 |
16 | 1 |
17 | 2 |
18 | 3 |
Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1. 23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please?
Thank you in advance
Below a SQL option.
proc sql;
select
l.region
,l.value
,l.class
,r.cnt
,l.value/r.cnt as calc_value format=f16.2
from one l left join (select region, count(id) as cnt from two group by region) r
on l.region=r.region
;
quit;
And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work.
proc sql;
select
l.region
,l.value
,l.class
,r.cnt
,l.value/r.cnt as calc_value format=f16.2
from one l left join (select region, count(distinct id) as cnt from two group by region) r
on l.region=r.region
;
quit;
First please provide data in the form of data step code. Like this and pasted into a text box so it is easy to copy and run.
data two; input ID Region; datalines; 1 1 2 1 3 3 4 2 5 2 6 3 7 1 8 2 9 1 10 1 11 1 12 3 13 3 14 3 15 2 16 1 17 2 18 3 ; data one; input Region Value Class; datalines; 1 34 0 2 23 0 3 12 0 1 65 1 2 32 1 3 43 1 1 98 2 2 76 2 3 44 2 1 1 3 2 2 3 3 3 3 1 45 4 2 38 4 3 43 4 1 0 5 2 0 5 3 0 5 ;
One way:
/* get a count of how many times each "region" appears in data two*/ proc freq data=two noprint; tables region/out=regioncount nopercent nocum; run; proc sort data=one; by region; run; data want; merge one regioncount(drop=percent) ; by region; ratio = value/count; run;
However there are some places that your description may be incomplete. You say "total number of corresponding "Region" class in the second dataset." and "divided by 7 because in dataset 2 there are 7 IDs from region = 1". This would be correct if and only if ID is never repeated in a Region. If there are, or could be, duplicates of ID then you need to clarify if the divisor is "number of Id per region" or "number of records with region" as they would not be the same thing.
@NewUsrStat wrote:
Hi guys,
suppose to have the following table:
dataset 1
Region Value Class 1 34 0 2 23 0 3 12 0 1 65 1 2 32 1 3 43 1 1 98 2 2 76 2 3 44 2 1 1 3 2 2 3 3 3 3 1 45 4 2 38 4 3 43 4 1 0 5 2 0 5 3 0 5
Then you have another dataset as follows:
dataset 2
ID Region 1 1 2 1 3 3 4 2 5 2 6 3 7 1 8 2 9 1 10 1 11 1 12 3 13 3 14 3 15 2 16 1 17 2 18 3
Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1. 23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please?
Thank you in advance
Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.
@NewUsrStat wrote:
Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.
You will quite likely find than an insistence on a "single piece of code" can lead to a lot of time invested and often leading to fragile processes that minor changes to your data require drastic rewrites possibly requiring more time to fix than was invested in writing the first bit of code.
Below a SQL option.
proc sql;
select
l.region
,l.value
,l.class
,r.cnt
,l.value/r.cnt as calc_value format=f16.2
from one l left join (select region, count(id) as cnt from two group by region) r
on l.region=r.region
;
quit;
And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work.
proc sql;
select
l.region
,l.value
,l.class
,r.cnt
,l.value/r.cnt as calc_value format=f16.2
from one l left join (select region, count(distinct id) as cnt from two group by region) r
on l.region=r.region
;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.