DATA Step, Macro, Functions and more

reshape data

Reply
Contributor
Posts: 21

reshape data

Hello:

I have a data set as following: (real data have more years and locations)

 data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

I would like to have a data which tracks the movement between all possible pairs of cities by year. So, the result will be alike below. Note that it doesn't include the city pair of itself, but if i want to include pairs like a - a, b - c, c - c. Is that easy to modify code?

 data want;
input from $ to $ year number;
datalines;
a b 2001 0
a c 2001 0
b a 2001 1
b c 2001 0
c a 2001 1
c b 2001 0
a b 2002 1
a c 2002 0
b a 2002 0
b c 2002 0
c a 2002 0
c b 2002 0
a b 2003 0
a c 2003 0
b a 2003 0
b c 2003 1
c a 2003 0
c b 2003 0
;

Thanks for any help!

PROC Star
Posts: 1,765

Re: reshape data

Hi @saseles Your requirement seems very similar to this thread

https://communities.sas.com/t5/Base-SAS-Programming/construct-my-own-data-sets-from-two-source-data-...

 

Can you please review and see if you are able to modify to your needs?

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

Hello novinosrin:

I can tell the similarity. But i am don't understand the data old_2. Seems that i only have old_1. I don't know how to modify the code since there are some commend are really new to me.

Any suggestions? 

Thanks!

PROC Star
Posts: 1,765

Re: reshape data

[ Edited ]

@saseles Oh no worries, I will work on the code for you shortly. But did you look at the other accepted solution that doesn't use hashes, which I believe is easier to follow step by step. If not, try to read  through it. 

 

And mine does require some cleaning anyways. lol

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

I tried that one as well. It seems that my problem is easier than that one since I don't need to merge those two data sets. I just got problem to modify the code Man Indifferent

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

Here is what i tried. But not sure it is right or not. Can you please verify it?

Thanks a lot!

 data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data _null_;
if _n_=1 then do;
if 0 then set have;
length o d $50;
  dcl hash H1 (multidata:'y',ordered:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('year',"o","d") ;
   h1.definedone () ;
   call missing(o,d);
end;
set have end=last;
array t(*) pc:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
o=t(_n_-1);
d=t(_n_);
rc=h1.add();
end;
if last then h1.output(dataset:'temp2');
run;

data want;
if _n_=1 then do;
if 0 then set temp3;
  dcl hash H1 (dataset:'temp3',multidata:'y') ;
   h1.definekey  ('year','o','d') ;
   h1.definedata ('flows') ;
   h1.definedone () ;
end;
if 0 then set old_1;
array t(*) pc:;
set sort_2(rename=(region=o));
do _n_= 2 to dim(t);
year=vname(t(_n_));
 do n=1 to nobs;
 set sort_2(keep=region rename=(region=d)) nobs=nobs point=n;
flows=0;
rc= h1.find();
 output;
end;
end;
keep year o d flows;
run;
PROC Star
Posts: 1,765

Re: reshape data

[ Edited ]

@saseles  Here you go. Have fun

 

Edit:  Adding one more step to get the order right

 

 

data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;

proc sort data=t nodupkey;
by col1;
run;
data want;
if _n_=1 then do;
 dcl hash H1 (dataset:'temp',multidata:'y') ;
   h1.definekey  ('year','from','to') ;
   h1.definedone () ;
end;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then do;
if h1.check()=0 then number=1;
else number=0;
output;
end;
end;
end;
keep from year to number;
run;

proc sort data=want out=final_want;
by year;
run;

 

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

Thanks a lot! i am almost there!

The problem is that I have large data that cannot use the SAS work library. So, my data is save in my own defined library, say "mydata.have". I modified your code to direct all temp data to my library. But i don't know how to handle the third line in "data want" statement. i.e. "dcl hash H1 (dataset:'temp',multidata:'y') ;" . How to modify the one " ...dataset:'temp' ..." ?

I hope this is the final step.

Thanks again!

PROC Star
Posts: 1,765

Re: reshape data

Do you mean to reference 

"dcl hash H1 (dataset:'temp',multidata:'y') ;

 

as

 

"dcl hash H1 (dataset:'mydata.temp',multidata:'y') ;

 

If everything points to perm library just use the two level ref name like you did for others

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

I tried using ".... 'mydata.temp' ....", but I got error that "Hash object added XXXXXX items when memory failure occurred" "Insufficient memory to execute data step ...."

 

Is something that related to my computer, not the code? I notice that the temp data has really large size. It is much larger then my original data. 

PROC Star
Posts: 1,765

Re: reshape data

Oh gosh, memory failure occurred" "Insufficient memory to execute data step ...."

 

hash doesn't do well if you have Insufficient memory mate.

 

I may have to change the code for you. Can you confirm  this before I modify again?

Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

I tried again. Same thing Man Indifferent

PROC Star
Posts: 1,765

Re: reshape data

Ok no worries. Brb in 10 mins

PROC Star
Posts: 1,765

Re: reshape data

[ Edited ]

Try this:

 

 

data have;
input city2000 $ city2001 $ city2002 $ city2003 $;
datalines;
a a b c
a a a a
b a a a
c c c c
c a a a
;

data temp;
set have end=last;
array t(*) city:;
do _n_= 2 to dim(t);
year=vname(t(_n_));
from=t(_n_-1);
to=t(_n_);
output;
end;
drop city:;
run;
proc transpose data=have out=t(keep=col1);
by city: notsorted;
var city:;
run;

proc sort data=t nodupkey;
by col1;
run;

data want;
if 0 then set have;
array t(*) city:;
set t(rename=(col1=from));
do _n_= 2 to dim(t);
year=vname(t(_n_));
do n=1 to nobs ;
set t(rename=(col1=to)) nobs=nobs point=n;
if to ne from then output;
end;
end;
keep from year to ;
run;

proc sql;
create table final_want as
select *, 1 as Number from
 (select year,from,to from want
   intersect
   select year,from,to  from temp)
     union
select *,0 as Number
  from (select year,from,to  from want
         except
      select year,from,to  from temp)
order by year;
quit;
Contributor
Posts: 21

Re: reshape data

Posted in reply to novinosrin

where is the data "t" ? Do i need to run some other codes?

Ask a Question
Discussion stats
  • 35 replies
  • 189 views
  • 0 likes
  • 3 in conversation