BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Pyrite | Level 9

Hi guys, 

suppose to have the following table: 

dataset 1

 

Region Value Class
1 34 0
2 23 0
3 12 0
1 65 1
2 32 1
3 43 1
1 98 2
2 76 2
3 44 2
1 1 3
2 2 3
3 3 3
1 45 4
2 38 4
3 43 4
1 0 5
2 0 5
3 0 5

 

 

Then you have another dataset as follows: 

dataset 2

ID Region
1 1
2 1
3 3
4 2
5 2
6 3
7 1
8 2
9 1
10 1
11 1
12 3
13 3
14 3
15 2
16 1
17 2
18 3

 

Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1.  23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please? 

 

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below a SQL option.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work. 

Patrick_0-1693268687460.png

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(distinct id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

View solution in original post

4 REPLIES 4
ballardw
Super User

First please provide data in the form of data step code. Like this and pasted into a text box so it is easy to copy and run.

data two;
 input ID 	Region;
datalines;
1 	1
2 	1
3 	3
4 	2
5 	2
6 	3
7 	1
8 	2
9 	1
10 	1
11 	1
12 	3
13 	3
14 	3
15 	2
16 	1
17 	2
18 	3
;

data one;
  input Region 	Value 	Class;
datalines;
1 	34 	0
2 	23 	0
3 	12 	0
1 	65 	1
2 	32 	1
3 	43 	1
1 	98 	2
2 	76 	2
3 	44 	2
1 	1 	3
2 	2 	3
3 	3 	3
1 	45 	4
2 	38 	4
3 	43 	4
1 	0 	5
2 	0 	5
3 	0 	5
;

One way:

/* get a count of how many times each "region" appears
in data two*/

proc freq data=two noprint;
   tables region/out=regioncount nopercent nocum;
run;

proc sort data=one;
  by region;
run;

data want;
  merge one 
        regioncount(drop=percent)
  ;
  by region;
  ratio = value/count;
run;

However there are some places that your description may be incomplete. You say "total number of corresponding "Region" class in the second dataset."  and "divided by 7 because in dataset 2 there are 7 IDs from region = 1". This would be correct if and only if ID is never repeated in a Region. If there are, or could be, duplicates of ID then you need to clarify if the divisor is  "number of Id per region" or "number of records with region" as they would not be the same thing.

 

 


@NewUsrStat wrote:

Hi guys, 

suppose to have the following table: 

dataset 1

 

Region Value Class
1 34 0
2 23 0
3 12 0
1 65 1
2 32 1
3 43 1
1 98 2
2 76 2
3 44 2
1 1 3
2 2 3
3 3 3
1 45 4
2 38 4
3 43 4
1 0 5
2 0 5
3 0 5

 

 

Then you have another dataset as follows: 

dataset 2

ID Region
1 1
2 1
3 3
4 2
5 2
6 3
7 1
8 2
9 1
10 1
11 1
12 3
13 3
14 3
15 2
16 1
17 2
18 3

 

Then you want to divide each value of the column "Value" of the first dataset by the total number of corresponding "Region" class in the second dataset. Lets say: 34 will be divided by 7 because in dataset 2 there are 7 IDs from region = 1.  23 in dataset 1 will be divided by 5 because there are 5 IDs in dataset 2 with "Region" = 2. This, will be done in dataset 1 regardless of the "Class" variable. It will be there and will not take any part into the calculation. Can anyone help me please? 

 

Thank you in advance


 

NewUsrStat
Pyrite | Level 9

Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.

ballardw
Super User

@NewUsrStat wrote:

Thank you very much for help @ballardw. IDs in dataset2 are all unique. My point was how to compress in a single piece of code both operations: the calculations of totals (by regions 1-3) in dataset2 and the ratio calculation on records of dataset1.


You will quite likely find than an insistence on a "single piece of code" can lead to a lot of time invested and often leading to fragile processes that minor changes to your data require drastic rewrites possibly requiring more time to fix than was invested in writing the first bit of code.

 

Patrick
Opal | Level 21

Below a SQL option.

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

And in case the id's in table two can be repeated but you only want to count distinct id's then below variant should work. 

Patrick_0-1693268687460.png

proc sql;
  select 
    l.region
    ,l.value
    ,l.class
    ,r.cnt
    ,l.value/r.cnt as calc_value format=f16.2
  from one l left join (select region, count(distinct id) as cnt from two group by region) r
  on l.region=r.region
  ;
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 754 views
  • 3 likes
  • 3 in conversation