## merge proportionally (split into equal weights)

Super Contributor
Posts: 409

# merge proportionally (split into equal weights)

This is definately not an easy task.. so total kudos if anyone that can answer this.

My task is to merge 2 tables into a main 1 (so 3 all together). One of the 2 has data that is aggregated by city (so population by city) and the 2nd is aggregated by regions not in a city in table 1 (so out of city population).... The 3rd main one has data not aggregated, it's at the Postal Code level, but attached to each row is a City Name and an Out-of-City name.  So far so good.  When I merge all 3, the postal Codes that belong in the City (eg. Toronto) will get Toronto's total population, and that would be double/triple/etc. counting, considering Toronto has many many postal codes.  So my task is to equally divide Toronto's population into each of the many many Postal Codes residing under Toronto in List 3.  All this by different dates as well..

Confused ?, let me try visually..   THANK YOU..

Table 1

CITY             Date                   Pop

Toronto         2011.01           5,795,389

Toronto         2011.02           5,803,064

Toronto         2011.03           5,810,739

Table2 (Note, Ontario is non-Toronto)

CITY            Date                   Pop

Ontario         2011.01           2,678,462

Ontario         2011.02           2,678,754

Table3

POSTALCODE        City                   Date

M1P 3F7                 Toronto             2011.01

M1P 3F7                 Toronto             2011.02

M1J 2G8                  Toronto            2011.01

M1J 2G8                  Toronto            2011.02

M1J 2G8                  Toronto            2011.03

L0A 2F6                   Ontario            2011.01

L0A 2F6                   Ontario            2011.02

L2G 7D3                  Ontario             2011.01

L2G 7D3                  Ontario             2011.02

WANT:

POSTALCODE        City                   Date          Population

M1P 3F7                 Toronto             2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

M1P 3F7                 Toronto             2011.02         2,901,532   <- 5,803,064 divided by the 2 different Postal Codes in Toronto within the same date

M1J 2G8                  Toronto            2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

M1J 2G8                  Toronto            2011.02         2,901,532   <- 5,803,064 divided by the 2 different Postal Codes in Toronto within the same date

M1J 4K9                  Toronto            2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

L0A 2F6                   Ontario            2011.01         1,339,231   <- 2,678,462 divided by the 2 different Postal Codes in Ontario within the same date

L0A 2F6                   Ontario            2011.02         1,339,377   <- 2,678,754 divided by the 2 different Postal Codes in Ontario within the same date

L2G 7D3                  Ontario             2011.01        1,339,231   <- 2,678,462 divided by the 2 different Postal Codes in Ontario within the same date

L2G 7D3                  Ontario             2011.02        1,339,377   <- 2,678,754 divided by the 2 different Postal Codes in Ontario within the same date

Posts: 3,167

## Re: merge proportionally (split into equal weights)

Can't figure out the order of your output, but the contents seem to match:

data h1;

input (CITY Date) (:\$10.) Pop comma20.;

cards;

Toronto 2011.01 5,795,389

Toronto 2011.02 5,803,064

Toronto 2011.03 5,810,739

;

data h2;

input (CITY Date) (:\$10.) Pop comma20.;

cards;

Ontario 2011.01 2,678,462

Ontario 2011.02 2,678,754

;

data h3;

input (POSTALCODE City Date ) (:\$&10.) ;

cards;

M1P 3F7 Toronto 2011.01

M1P 3F7 Toronto 2011.02

M1J 2G8 Toronto 2011.01

M1J 2G8 Toronto 2011.02

M1J 2G8 Toronto 2011.03

M1J 4K9 Toronto 2011.01

L0A 2F6 Ontario 2011.01

L0A 2F6 Ontario 2011.02

L2G 7D3 Ontario 2011.01

L2G 7D3 Ontario 2011.02

;

proc sql;

create table want as

select a.*, coalesce(b.pop,c.pop)/count(*) as pop format=10.0  from h3 a

left join

h1 b

on cats(a.city,a.date)=cats(b.city,b.date)

left join

h2 c

on cats(a.city,a.date)=cats(c.city,c.date)

group by a.city,a.date;

quit;

proc print;run;

Haikuo

Posts: 3,167

## Re: merge proportionally (split into equal weights)

As a possible data step alternative, here is a Hash approach:

data want (drop=_;

if _n_=1 then do;

if 0 then do; set h1;set h2; set h3; end;

dcl hash h1(dataset:'h1'); h1.definekey('city','date'); h1.definedata('pop'); h1.definedone();

dcl hash h2(dataset: 'h2');h2.definekey('city','date'); h2.definedata('pop'); h2.definedone();

dcl hash h3(dataset: 'h3', multidata:'y');h3.definekey('city','date'); h3.definedata(all:'y'); h3.definedone();

dcl hiter h3i('h3');

end;

_rc=h3i.first();

do while (_rc=0);

_ct=0;

do _rc=h3.find() by 0 while (_rc=0); _ct+1; _rc=h3.find_next(); end;

_rc=h1.find();

if _rc ne 0 then _rc=h2.find();

pop=pop/_ct;

output;

_rc=h3i.next();

end;

run;

Haikuo

Super Contributor
Posts: 409

## Re: merge proportionally (split into equal weights)

Thank you.. I'll give it a try..

Super User
Posts: 10,784

## Re: merge proportionally (split into equal weights)

```data h1;
input (CITY _Date) (:\$10.) Pop comma20.;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
Toronto 2011.01 5,795,389
Toronto 2011.02 5,803,064
Toronto 2011.03 5,810,739
;
run;

data h2;
input (CITY _Date) (:\$10.) Pop comma20.;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
Ontario 2011.01 2,678,462
Ontario 2011.02 2,678,754
;
run;
data h3;
input (POSTALCODE City _Date ) (:\$&10.) ;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
M1P 3F7    Toronto   2011.01
M1P 3F7    Toronto   2011.02
M1J 2G8    Toronto   2011.01
M1J 2G8    Toronto   2011.02
M1J 2G8    Toronto   2011.03
M1J 4K9    Toronto   2011.01
L0A 2F6    Ontario   2011.01
L0A 2F6    Ontario   2011.02
L2G 7D3    Ontario   2011.01
L2G 7D3    Ontario   2011.02
;
run;
proc sql ;
create table want as
select d.*,Population format=comma20.
from  h3 as d,
(select a.city,a.date,a.pop/level as Population
from (select * from h1 union all select * from h2) as a ,
(select city,date,count(distinct postalcode) as level from h3  group by city,date) as b
where a.city=b.city and a.date=b.date )  as c
where d.city=c.city and d.date=c.date
order by d.POSTALCODE,d.City,d.Date ;
quit;

```

Ksharp

Super Contributor
Posts: 409

## Re: merge proportionally (split into equal weights)

Thanks KSharp..this is perfect..

Discussion stats
• 5 replies
• 650 views
• 3 likes
• 3 in conversation