Hello:
I have a data set as following: (real data have more years and locations)
data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;
I would like to have a data which tracks the movement between all possible pairs of cities by year. So, the result will be alike below. Note that it doesn't include the city pair of itself, but if i want to include pairs like a - a, b - c, c - c. Is that easy to modify code?
data want;
input from $ to $ year number;
datalines;
a b 2001 0
a c 2001 0
b a 2001 1
b c 2001 0
c a 2001 1
c b 2001 0
a b 2002 1
a c 2002 0
b a 2002 0
b c 2002 0
c a 2002 0
c b 2002 0
a b 2003 0
a c 2003 0
b a 2003 0
b c 2003 1
c a 2003 0
c b 2003 0
;
Thanks for any help!
Hi @saseles Your requirement seems very similar to this thread
Can you please review and see if you are able to modify to your needs?
Hello novinosrin:
I can tell the similarity. But i am don't understand the data old_2. Seems that i only have old_1. I don't know how to modify the code since there are some commend are really new to me.
Any suggestions?
Thanks!
@saseles Oh no worries, I will work on the code for you shortly. But did you look at the other accepted solution that doesn't use hashes, which I believe is easier to follow step by step. If not, try to read through it.
And mine does require some cleaning anyways. lol
I tried that one as well. It seems that my problem is easier than that one since I don't need to merge those two data sets. I just got problem to modify the code
Here is what i tried. But not sure it is right or not. Can you please verify it?
Thanks a lot!
data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;
data _null_;
if _n_=1 then do;
if 0 then set have;
length o d $50;
dcl hash H1 (multidata:'y',ordered:'y') ;
h1.definekey ('year','o','d') ;
h1.definedata ('year',"o","d") ;
h1.definedone () ;
call missing(o,d);
end;
set have end=last;
array t(*) pc:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
o=t(_n_-1);
d=t(_n_);
rc=h1.add();
end;
if last then h1.output(dataset:'temp2');
run;
data want;
if _n_=1 then do;
if 0 then set temp3;
dcl hash H1 (dataset:'temp3',multidata:'y') ;
h1.definekey ('year','o','d') ;
h1.definedata ('flows') ;
h1.definedone () ;
end;
if 0 then set old_1;
array t(*) pc:;
set sort_2(rename=(region=o));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs;
set sort_2(keep=region rename=(region=d)) nobs=nobs point=n;
flows=0;
rc= h1.find();
output;
end;
end;
keep year o d flows;
run;
@saseles Here you go. Have fun
Edit: Adding one more step to get the order right
data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;
data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;
proc sort data=t nodupkey;
by col1;
run;
data want;
if _n_=1 then do;
dcl hash H1 (dataset:'temp',multidata:'y') ;
h1.definekey ('year','from','to') ;
h1.definedone () ;
end;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then do;
if h1.check()=0 then number=1;
else number=0;
output;
end;
end;
end;
keep from year to number;
run;
proc sort data=want out=final_want;
by year;
run;
Thanks a lot! i am almost there!
The problem is that I have large data that cannot use the SAS work library. So, my data is save in my own defined library, say "mydata.have". I modified your code to direct all temp data to my library. But i don't know how to handle the third line in "data want" statement. i.e. "dcl hash H1 (dataset:'temp',multidata:'y') ;" . How to modify the one " ...dataset:'temp' ..." ?
I hope this is the final step.
Thanks again!
Do you mean to reference
"dcl hash H1 (dataset:'temp',multidata:'y') ;
as
"dcl hash H1 (dataset:'mydata.temp',multidata:'y') ;
If everything points to perm library just use the two level ref name like you did for others
I tried using ".... 'mydata.temp' ....", but I got error that "Hash object added XXXXXX items when memory failure occurred" "Insufficient memory to execute data step ...."
Is something that related to my computer, not the code? I notice that the temp data has really large size. It is much larger then my original data.
Oh gosh, memory failure occurred" "Insufficient memory to execute data step ...."
hash doesn't do well if you have Insufficient memory mate.
I may have to change the code for you. Can you confirm this before I modify again?
I tried again. Same thing
Ok no worries. Brb in 10 mins
Try this:
data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;
data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;
proc sort data=t nodupkey;
by col1;
run;
data want;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then output;
end;
end;
keep from year to ;
run;
proc sql;
create table final_want as
select *, 1 as Number from
(select year,from,to from want
intersect
select year,from,to from temp)
union
select *,0 as Number
from (select year,from,to from want
except
select year,from,to from temp)
order by year;
quit;
where is the data "t" ? Do i need to run some other codes?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.