BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

This is definately not an easy task.. so total kudos if anyone that can answer this.

My task is to merge 2 tables into a main 1 (so 3 all together). One of the 2 has data that is aggregated by city (so population by city) and the 2nd is aggregated by regions not in a city in table 1 (so out of city population).... The 3rd main one has data not aggregated, it's at the Postal Code level, but attached to each row is a City Name and an Out-of-City name.  So far so good.  When I merge all 3, the postal Codes that belong in the City (eg. Toronto) will get Toronto's total population, and that would be double/triple/etc. counting, considering Toronto has many many postal codes.  So my task is to equally divide Toronto's population into each of the many many Postal Codes residing under Toronto in List 3.  All this by different dates as well..

Confused ?, let me try visually..   THANK YOU..

Table 1

CITY             Date                   Pop

Toronto         2011.01           5,795,389

Toronto         2011.02           5,803,064

Toronto         2011.03           5,810,739

Table2 (Note, Ontario is non-Toronto)

CITY            Date                   Pop

Ontario         2011.01           2,678,462

Ontario         2011.02           2,678,754

Table3

POSTALCODE        City                   Date  

M1P 3F7                 Toronto             2011.01

M1P 3F7                 Toronto             2011.02      

M1J 2G8                  Toronto            2011.01

M1J 2G8                  Toronto            2011.02

M1J 2G8                  Toronto            2011.03         

L0A 2F6                   Ontario            2011.01

L0A 2F6                   Ontario            2011.02

L2G 7D3                  Ontario             2011.01

L2G 7D3                  Ontario             2011.02

WANT:

POSTALCODE        City                   Date          Population

M1P 3F7                 Toronto             2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

M1P 3F7                 Toronto             2011.02         2,901,532   <- 5,803,064 divided by the 2 different Postal Codes in Toronto within the same date      

M1J 2G8                  Toronto            2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

M1J 2G8                  Toronto            2011.02         2,901,532   <- 5,803,064 divided by the 2 different Postal Codes in Toronto within the same date

M1J 4K9                  Toronto            2011.01         1,931,796   <- 5,795,389 divided by the 3 different Postal Codes in Toronto within the same date

L0A 2F6                   Ontario            2011.01         1,339,231   <- 2,678,462 divided by the 2 different Postal Codes in Ontario within the same date

L0A 2F6                   Ontario            2011.02         1,339,377   <- 2,678,754 divided by the 2 different Postal Codes in Ontario within the same date

L2G 7D3                  Ontario             2011.01        1,339,231   <- 2,678,462 divided by the 2 different Postal Codes in Ontario within the same date

L2G 7D3                  Ontario             2011.02        1,339,377   <- 2,678,754 divided by the 2 different Postal Codes in Ontario within the same date

5 REPLIES 5
Haikuo
Onyx | Level 15

Can't figure out the order of your output, but the contents seem to match:

data h1;

input (CITY Date) (:$10.) Pop comma20.;

cards;

Toronto 2011.01 5,795,389

Toronto 2011.02 5,803,064

Toronto 2011.03 5,810,739

;

data h2;

input (CITY Date) (:$10.) Pop comma20.;

cards;

Ontario 2011.01 2,678,462

Ontario 2011.02 2,678,754

;

data h3;

input (POSTALCODE City Date ) (:$&10.) ;

cards;

M1P 3F7 Toronto 2011.01

M1P 3F7 Toronto 2011.02

M1J 2G8 Toronto 2011.01

M1J 2G8 Toronto 2011.02

M1J 2G8 Toronto 2011.03

M1J 4K9 Toronto 2011.01

L0A 2F6 Ontario 2011.01

L0A 2F6 Ontario 2011.02

L2G 7D3 Ontario 2011.01

L2G 7D3 Ontario 2011.02

;

proc sql;

create table want as

select a.*, coalesce(b.pop,c.pop)/count(*) as pop format=10.0  from h3 a

left join

h1 b

on cats(a.city,a.date)=cats(b.city,b.date)

left join

h2 c

on cats(a.city,a.date)=cats(c.city,c.date)

group by a.city,a.date;

quit;

proc print;run;

Haikuo

Haikuo
Onyx | Level 15

As a possible data step alternative, here is a Hash approach:

data want (drop=_:);

  if _n_=1 then do;

  if 0 then do; set h1;set h2; set h3; end;

dcl hash h1(dataset:'h1'); h1.definekey('city','date'); h1.definedata('pop'); h1.definedone();

dcl hash h2(dataset: 'h2');h2.definekey('city','date'); h2.definedata('pop'); h2.definedone();

dcl hash h3(dataset: 'h3', multidata:'y');h3.definekey('city','date'); h3.definedata(all:'y'); h3.definedone();

  dcl hiter h3i('h3');

end;

_rc=h3i.first();

do while (_rc=0);

_ct=0;

do _rc=h3.find() by 0 while (_rc=0); _ct+1; _rc=h3.find_next(); end;

_rc=h1.find();

if _rc ne 0 then _rc=h2.find();

pop=pop/_ct;

output;

_rc=h3i.next();

end;

run;

Haikuo

podarum
Quartz | Level 8

Thank you.. I'll give it a try.. 

Ksharp
Super User

How about:

data h1;
input (CITY _Date) (:$10.) Pop comma20.;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
Toronto 2011.01 5,795,389
Toronto 2011.02 5,803,064
Toronto 2011.03 5,810,739
;
run;

data h2;
input (CITY _Date) (:$10.) Pop comma20.;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
Ontario 2011.01 2,678,462
Ontario 2011.02 2,678,754
;
run;
data h3;
input (POSTALCODE City _Date ) (:$&10.) ;
date=input(cats(_date,'.01'),yymmdd10.);
drop _date;
format date yymmd7.;
cards;
M1P 3F7    Toronto   2011.01
M1P 3F7    Toronto   2011.02
M1J 2G8    Toronto   2011.01
M1J 2G8    Toronto   2011.02
M1J 2G8    Toronto   2011.03
M1J 4K9    Toronto   2011.01
L0A 2F6    Ontario   2011.01
L0A 2F6    Ontario   2011.02
L2G 7D3    Ontario   2011.01
L2G 7D3    Ontario   2011.02
;
run;
proc sql ;
create table want as
select d.*,Population format=comma20.
from  h3 as d,
    (select a.city,a.date,a.pop/level as Population
       from (select * from h1 union all select * from h2) as a ,
            (select city,date,count(distinct postalcode) as level from h3  group by city,date) as b
        where a.city=b.city and a.date=b.date )  as c
 where d.city=c.city and d.date=c.date 
  order by d.POSTALCODE,d.City,d.Date ; 
quit;
 


Ksharp

podarum
Quartz | Level 8

Thanks KSharp..this is perfect..

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1229 views
  • 3 likes
  • 3 in conversation