BookmarkSubscribeRSS Feed
saseles
Calcite | Level 5

Hello:

I have a data set as following: (real data have more years and locations)

 data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

I would like to have a data which tracks the movement between all possible pairs of cities by year. So, the result will be alike below. Note that it doesn't include the city pair of itself, but if i want to include pairs like a - a, b - c, c - c. Is that easy to modify code?

 data want;
input from $ to $ year number;
datalines;
a b 2001 0
a c 2001 0
b a 2001 1
b c 2001 0
c a 2001 1
c b 2001 0
a b 2002 1
a c 2002 0
b a 2002 0
b c 2002 0
c a 2002 0
c b 2002 0
a b 2003 0
a c 2003 0
b a 2003 0
b c 2003 1
c a 2003 0
c b 2003 0
;

Thanks for any help!

35 REPLIES 35
novinosrin
Tourmaline | Level 20

Hi @saseles Your requirement seems very similar to this thread

https://communities.sas.com/t5/Base-SAS-Programming/construct-my-own-data-sets-from-two-source-data-...

 

Can you please review and see if you are able to modify to your needs?

saseles
Calcite | Level 5

Hello novinosrin:

I can tell the similarity. But i am don't understand the data old_2. Seems that i only have old_1. I don't know how to modify the code since there are some commend are really new to me.

Any suggestions? 

Thanks!

novinosrin
Tourmaline | Level 20

@saseles Oh no worries, I will work on the code for you shortly. But did you look at the other accepted solution that doesn't use hashes, which I believe is easier to follow step by step. If not, try to read  through it. 

 

And mine does require some cleaning anyways. lol

saseles
Calcite | Level 5

I tried that one as well. It seems that my problem is easier than that one since I don't need to merge those two data sets. I just got problem to modify the code Man Indifferent

saseles
Calcite | Level 5

Here is what i tried. But not sure it is right or not. Can you please verify it?

Thanks a lot!

 data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data _null_;
if _n_=1 then do;
if 0 then set have;
length o d $50;
  dcl hash H1 (multidata:'y',ordered:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('year',"o","d") ;
   h1.definedone () ;
   call missing(o,d);
end;
set have end=last;
array t(*) pc:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
o=t(_n_-1);
d=t(_n_);
rc=h1.add();
end;
if last then h1.output(dataset:'temp2');
run;

data want;
if _n_=1 then do;
if 0 then set temp3;
  dcl hash H1 (dataset:'temp3',multidata:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('flows') ;
   h1.definedone () ;
end;
if 0 then set old_1;
array t(*) pc:;
set sort_2(rename=(region=o));
do _n_= 2 to dim(t);
year=vname(t(_n_));
 do n=1 to nobs;
 set sort_2(keep=region rename=(region=d)) nobs=nobs point=n;
flows=0;
rc= h1.find();
 output;
end;
end;
keep year o d flows;
run;
novinosrin
Tourmaline | Level 20

@saseles  Here you go. Have fun

 

Edit:  Adding one more step to get the order right

 

 

data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;

proc sort data=t nodupkey;
by col1;
run;
data want;
if _n_=1 then do;
 dcl hash H1 (dataset:'temp',multidata:'y') ;
   h1.definekey  ('year','from','to') ;
   h1.definedone () ;
end;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then do;
if h1.check()=0 then number=1;
else number=0;
output;
end;
end;
end;
keep from year to number;
run;

proc sort data=want out=final_want;
by year;
run;

 

saseles
Calcite | Level 5

Thanks a lot! i am almost there!

The problem is that I have large data that cannot use the SAS work library. So, my data is save in my own defined library, say "mydata.have". I modified your code to direct all temp data to my library. But i don't know how to handle the third line in "data want" statement. i.e. "dcl hash H1 (dataset:'temp',multidata:'y') ;" . How to modify the one " ...dataset:'temp' ..." ?

I hope this is the final step.

Thanks again!

novinosrin
Tourmaline | Level 20

Do you mean to reference 

"dcl hash H1 (dataset:'temp',multidata:'y') ;

 

as

 

"dcl hash H1 (dataset:'mydata.temp',multidata:'y') ;

 

If everything points to perm library just use the two level ref name like you did for others

saseles
Calcite | Level 5

I tried using ".... 'mydata.temp' ....", but I got error that "Hash object added XXXXXX items when memory failure occurred" "Insufficient memory to execute data step ...."

 

Is something that related to my computer, not the code? I notice that the temp data has really large size. It is much larger then my original data. 

novinosrin
Tourmaline | Level 20

Oh gosh, memory failure occurred" "Insufficient memory to execute data step ...."

 

hash doesn't do well if you have Insufficient memory mate.

 

I may have to change the code for you. Can you confirm  this before I modify again?

saseles
Calcite | Level 5

I tried again. Same thing Man Indifferent

novinosrin
Tourmaline | Level 20

Ok no worries. Brb in 10 mins

novinosrin
Tourmaline | Level 20

Try this:

 

 

data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;

proc sort data=t nodupkey;
by col1;
run;

data want;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then output;
end;
end;
keep from year to ;
run;

proc sql;
create table final_want as
select *, 1 as Number from
 (select year,from,to from want
   intersect
   select year,from,to  from temp)
     union
select *,0 as Number
  from (select year,from,to  from want
         except
      select year,from,to  from temp)
order by year;
quit;
saseles
Calcite | Level 5

where is the data "t" ? Do i need to run some other codes?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 35 replies
  • 1091 views
  • 0 likes
  • 3 in conversation