BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasecn
Quartz | Level 8

I have two large source data sets. Below code can create example of my data sets (source data sets contain more obs and more years of information).

 data old_1;
input id PC2000 $ PC2001 $ PC2002 $ PC2003 $;
datalines;
1 AAAAA AAAAA BBBBB AAAAA
2 AAAAA BBBBB CCCCC BBBBB
3 BBBBB BBBBB BBBBB BBBBB
4 CCCCC CCCCC AAAAA BBBBB
5 DDDDD AAAAA AAAAA BBBBB
6 AAAAA BBBBB CCCCC DDDDD
7 EEEEE EEEEE DDDDD BBBBB
;

data old_2;
input PC $ region $;
datalines;
AAAAA 1111
BBBBB 1111
CCCCC 2222
DDDDD 2222
EEEEE 3333
;

My goal is to construct a new data set which counts amount of movement flows (including number of people stay and move) between each pair of region for each year of 2001, 2002, 2003. My idea is to merge the two source data first to get something like:

 

id PC2000 Region2000 PC2001 Region2001 PC2002 Region20002 PC2003 Region2003

1  .............................

2 AAAAA        1111        BBBBB      1111         CCCCC       22222         BBBBB     11111

3 ......

.

.

Then use the information to get people movement flows between each pair of region for each year of 2001, 2002, 2003. For example, since no one moved from Region 1111 in 2000 to Region 2222 in 2001 (Origin is 1111, Destination is 2222), the number of movers from Region 1111 to 2222 in 2001 is 0;   since there are 2 people (person id 2 and 6) moved from Region 1111 in 20001 to Region 2222 in 2002, the number of movers from Region 1111 to 2222 in 2001 is 2.  .....   My new data set likes:

 

 data new;
input Origin_Region $ Destination_Region $ year flow;
datalines;
1111 1111 2001 4 
1111 2222 2001 0
1111 3333 2001 0
1111 1111 2002 3
1111 2222 2002 2
1111 3333 2002 0
1111 1111 2003 4
1111 2222 2003 0
1111 3333 2003 0
2222 2222 2001 1
2222 1111 2001 1
2222 3333 2001 0
2222 2222 2002 0
2222 1111 2002 1
2222 3333 2002 0
2222 2222 2003 1
2222 1111 2003 1
2222 3333 2003 0
3333 3333 2001 1
3333 1111 2001 0
3333 2222 2001 0
3333 3333 2002 0
3333 1111 2002 0
3333 2222 2002 1
3333 3333 2003 0
3333 1111 2003 0
3333 2222 2003 0
;

Sorry for the long question! Any help will be great!

Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I'm sure there is a more direct approach but, since no one has suggested any yet, here is one way to achieve your desired result:

data old_1;
  input id PC2000 $ PC2001 $ PC2002 $ PC2003 $;
  datalines;
1 AAAAA AAAAA BBBBB AAAAA
2 AAAAA BBBBB CCCCC BBBBB
3 BBBBB BBBBB BBBBB BBBBB
4 CCCCC CCCCC AAAAA BBBBB
5 DDDDD AAAAA AAAAA BBBBB
6 AAAAA BBBBB CCCCC DDDDD
7 EEEEE EEEEE DDDDD BBBBB
;

data old_2;
  input PC $ region $;
  datalines;
AAAAA 1111
BBBBB 1111
CCCCC 2222
DDDDD 2222
EEEEE 3333
;

data fmtdata;
  set old_2 (rename=(PC=Start region=label));
  retain fmtname 'rcodes' type 'C';
run;

proc format cntlin = fmtdata;
run;

data need1 (keep=id region year destination_region count);
  set old_1;
  array PCs(2000:2003) $ PC2000-PC2003;
  do _n_=2001 to 2003;
    region=put(PCs(_n_-1),$rcodes.);
    year=_n_;
    destination_region=put(PCs(_n_),$rcodes.);
    count=1;
    output;
  end;
run;

proc sql;
  create table region as
    select distinct region
      from old_2
  ;
quit;
proc sql;
  create table need2 as
   select a.region,
          b.region as destination_region
     from region a, region b
  ;
quit;

proc sql noprint;
  select id
    into :ids separated by ','
      from old_1
  ;
quit;
    
data need2;
  set need2;
  do id=&ids.;
    do year=2001 to 2003;
      output;
    end;
  end;
run;

proc sort data=need1;
  by id year region destination_region;
run;

proc sort data=need2;
  by id year region destination_region;
run;

data need3;
  update need2 need1;
  by id year  region destination_region;
  if missing(count) then count=0;
run;

proc sql;
  create table want as
    select year, region, destination_region, sum(count) as flow
      from need3
        group by region, year, destination_region
  ;
quit;

Art, CEO, AnalystFinder.com

 

View solution in original post

10 REPLIES 10
pau13rown
Lapis Lazuli | Level 10

very difficult to understand what you're saying, but you might want proc transpose

sasecn
Quartz | Level 8

PC is the postal code, Region has id as 1111, 2222 , 3333. So, postal code AAAAA and BBBBB are in Region 1111. I am trying to count the number of movers between Regions for each year.

pau13rown
Lapis Lazuli | Level 10

firstly i think you want old1 on id-pc level, not id level ie pc as rows, not as cols, then you want to merge region from old2 onto old1, by pc (first sort old1 by pc). Then you have id and region on the same dataset, and you could use eg a simple proc freq or proc sort with nodupkey and dupout=...., to see how many ids have multiple regions

art297
Opal | Level 21

I'm sure there is a more direct approach but, since no one has suggested any yet, here is one way to achieve your desired result:

data old_1;
  input id PC2000 $ PC2001 $ PC2002 $ PC2003 $;
  datalines;
1 AAAAA AAAAA BBBBB AAAAA
2 AAAAA BBBBB CCCCC BBBBB
3 BBBBB BBBBB BBBBB BBBBB
4 CCCCC CCCCC AAAAA BBBBB
5 DDDDD AAAAA AAAAA BBBBB
6 AAAAA BBBBB CCCCC DDDDD
7 EEEEE EEEEE DDDDD BBBBB
;

data old_2;
  input PC $ region $;
  datalines;
AAAAA 1111
BBBBB 1111
CCCCC 2222
DDDDD 2222
EEEEE 3333
;

data fmtdata;
  set old_2 (rename=(PC=Start region=label));
  retain fmtname 'rcodes' type 'C';
run;

proc format cntlin = fmtdata;
run;

data need1 (keep=id region year destination_region count);
  set old_1;
  array PCs(2000:2003) $ PC2000-PC2003;
  do _n_=2001 to 2003;
    region=put(PCs(_n_-1),$rcodes.);
    year=_n_;
    destination_region=put(PCs(_n_),$rcodes.);
    count=1;
    output;
  end;
run;

proc sql;
  create table region as
    select distinct region
      from old_2
  ;
quit;
proc sql;
  create table need2 as
   select a.region,
          b.region as destination_region
     from region a, region b
  ;
quit;

proc sql noprint;
  select id
    into :ids separated by ','
      from old_1
  ;
quit;
    
data need2;
  set need2;
  do id=&ids.;
    do year=2001 to 2003;
      output;
    end;
  end;
run;

proc sort data=need1;
  by id year region destination_region;
run;

proc sort data=need2;
  by id year region destination_region;
run;

data need3;
  update need2 need1;
  by id year  region destination_region;
  if missing(count) then count=0;
run;

proc sql;
  create table want as
    select year, region, destination_region, sum(count) as flow
      from need3
        group by region, year, destination_region
  ;
quit;

Art, CEO, AnalystFinder.com

 

sasecn
Quartz | Level 8

Thanks for the code. It works!

saseles
Calcite | Level 5

Here is what i tried. Is that right?

 data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data _null_;
if _n_=1 then do;
if 0 then set have;
length o d $50;
  dcl hash H1 (multidata:'y',ordered:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('year',"o","d") ;
   h1.definedone () ;
   call missing(o,d);
end;
set have end=last;
array t(*) pc:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
o=t(_n_-1);
d=t(_n_);
rc=h1.add();
end;
if last then h1.output(dataset:'temp2');
run;

data want;
if _n_=1 then do;
if 0 then set temp3;
  dcl hash H1 (dataset:'temp3',multidata:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('flows') ;
   h1.definedone () ;
end;
if 0 then set old_1;
array t(*) pc:;
set sort_2(rename=(region=o));
do _n_= 2 to dim(t);
year=vname(t(_n_));
 do n=1 to nobs;
 set sort_2(keep=region rename=(region=d)) nobs=nobs point=n;
flows=0;
rc= h1.find();
 output;
end;
end;
keep year o d flows;
run;
MarkWik
Quartz | Level 8

Can you explain the logic for one obs in want

1111 1111 2001 4

 

like how you derive this?

sasecn
Quartz | Level 8

In the year of 2001, from obs 1, 2, 3, and 6, there are movement but all within region 1111. 

novinosrin
Tourmaline | Level 20
data old_1;
  input id PC2000 $ PC2001 $ PC2002 $ PC2003 $;
  datalines;
1 AAAAA AAAAA BBBBB AAAAA
2 AAAAA BBBBB CCCCC BBBBB
3 BBBBB BBBBB BBBBB BBBBB
4 CCCCC CCCCC AAAAA BBBBB
5 DDDDD AAAAA AAAAA BBBBB
6 AAAAA BBBBB CCCCC DDDDD
7 EEEEE EEEEE DDDDD BBBBB
;

data old_2;
  input PC $ region $;
  datalines;
AAAAA 1111
BBBBB 1111
CCCCC 2222
DDDDD 2222
EEEEE 3333
;

data temp;
if _n_=1 then do;
if 0 then set old_2;
   dcl hash H (dataset:'old_2') ;
   h.definekey  ("pc") ;
   h.definedata ("region") ;
   h.definedone () ;
end;
set old_1;
array t(*) pc:;
do _n_=1 to dim(t);
if h.find(key:t(_n_))=0 then t(_n_)=region;
end;
drop pc region;
run;

data _null_;
if _n_=1 then do;
if 0 then set temp;
length o d $50;
  dcl hash H1 (multidata:'y',ordered:'y') ;
   h1.definekey  ('id','year','o','d') ;
   h1.definedata ('id','year',"o","d") ;
   h1.definedone () ;
   call missing(o,d);
end;
set temp end=last;
array t(*) pc:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
o=t(_n_-1);
d=t(_n_);
rc=h1.add();
end;
if last then h1.output(dataset:'temp2');
run;


proc sort data=old_2 out=sort_2(keep=region) nodupkey;
by region;
run;

proc sql;
  create table temp3 as
    select year, o, d, count(*) as flows
      from temp2
        group by year, o, d;
		  
quit;

data want;
if _n_=1 then do;
if 0 then set temp3;
  dcl hash H1 (dataset:'temp3',multidata:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('flows') ;
   h1.definedone () ;
end;
if 0 then set old_1;
array t(*) pc:;
set sort_2(rename=(region=o));
do _n_= 2 to dim(t);
year=vname(t(_n_));
 do n=1 to nobs;
 set sort_2(keep=region rename=(region=d)) nobs=nobs point=n;
flows=0;
rc= h1.find();
 output;
end;
end;
keep year o d flows;
run;
novinosrin
Tourmaline | Level 20

@sasecn I can certainly concise the above response by further shortening the steps by combining (temp and temp2),( temp3 and want) into one step but for testing purpose it seemed much easier for my eyes

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1375 views
  • 1 like
  • 6 in conversation