Solved: Transpose long to wide with Do Loop

camper · Posted 07-15-2020 12:23 PM

Hello!

I've spent a couple of days researching forums and reading papers and I'm stumped. I don't know how to use arrays, macros, or do loops - and i think that's what's needed for this task. This is more complicated than I can do.

This is fish habitat data. The ORIGIN is the stream where the fish lives and the DESTINATION is another suitable stream. DISTANCE is the stream length between them, and the LEN and VOL are additional attributes. The data must be sorted by ORIGIN, DISTANCE because I want the transformed columns to be in the order of the closest destination habitats.

My dataset as >6000 obs, and some ORIGINs have 25 DESTINATIONS, so there will be about 100 columns in the resulting table. It's what the boss wants - and I'm bummed that I can't figure it out myself.

Here's the data:

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

Here's the result I want:

As I move the data into the columns, I have abbreviated their variable names. I highlighted the DISTANCEs so you can see how they dictate the ordering of the columns.

Origin	Dest1	Dist1	Len1	Vol1	Dest2	Dist2	Len2	Vol2	Dest3	Dist3	Len3	Vol3
1001	999	11891	26.44	193.00
1005	1016	2422	7.03	8.42	944	19437	16.84	74.17
1010	1012	42986	22.10	262.95	988	48093	9.10	26.36
1012	1010	42986	10.66	472.81	988	44204	9.10	26.36	999	47478	26.44	193.00
1016	1005	2422	30.73	62.22

Thanks a lot! This has been fun, but a dead end. Hope someone out there has fun with it! I'm using SAS 9.4.

Jagadishkatam · Posted 07-15-2020 12:50 PM

Alternatively you may try the arrays as below

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;

Thanks,
Jag

View solution in original post

Reeza · Posted 07-15-2020 12:28 PM

Your problem is the second example here exactly, expand it to 4 variables though.

Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

proc transpose data=class out=wide1;
by Origin;
var Destination;
run;

proc transpose data=class out=wide2;
by origin;
var distance;
run;

proc transpose data=class out=wide3;
by origin;
var length;
run;

proc transpose data=class out=wide4;
by origin;
var volume;
run;

data want;
merge wide1 wide2 wide3 wide4;
by Origin;
run;

Then once you have it in this format you can worry about re-ordering variables and then sorting as needed.

Does this get you closer?

camper · Posted 07-15-2020 01:45 PM

This worked great - Thank you! Thanks for the amazingly FAST reply! and thank you for including the link to the paper.

Jagadishkatam · Posted 07-15-2020 12:50 PM

Alternatively you may try the arrays as below

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;

Thanks,
Jag

camper · Posted 07-15-2020 01:51 PM

WOW. This is AMAZING! I am so grateful. You make it look so easy - and I could never have figured this out. Thank you so very much!!

ballardw · Posted 07-15-2020 12:54 PM

A data step solution while possible involves some more work. The first thing is to find out the maximum number of records with the same Origin. Then use that value to create the array dimension that are needed.

Then retain all of those.

Then using by processing you can reset the retained values when the first origin is encountered and then write output only on the last origin for each.

Question: How is this data set to be used? The desired output is a moderately awkward structure to do many things with.

If the sole purpose is to read things by a person it may be that a report would be better.

Consider:

proc report data=work.class;
   columns origin destination,(distance len vol);
   define origin /group;
   define destination/group across;
   define distance/ sum;
   define len/ sum;
   define vol/ sum;
run;

Which makes "columns" across but aligns the destinations so you can compare with different origins more easily. By have the values of distance, len and vol grouped under the destination it is much easier to see which destination is involved or to find the values involving any specific destination instead of the "scattered" inside the columns of the data set that might be created.

camper · Posted 07-15-2020 01:54 PM

Thank you for taking the time to send your ideas! I don't know how he's going to use this crazy table, and I agree, it is very awkward!

art297 · Posted 07-15-2020 01:00 PM

I'd recommend the %transpose macro to get what you want. e.g.:

proc sort data=class (rename=(Destination=Dest
                              Distance=Dist))
          out=need;
  by Origin Dist;
run;

filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=need, out=want, by=Origin, Guessingrows=1000,
  var=Dest Dist Len Vol)

Art

camper · Posted 07-15-2020 01:59 PM

Thank you Art! I appreciate your time!

Ksharp · Posted 07-16-2020 09:04 AM

Merge Skill:

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data temp;
 set class;
 by origin;
 if first.origin then n=0;
 n+1;
run;
proc sql noprint nowarn;
select distinct catt('temp(where=(n=',n,') rename=(
Destination=Destination',n,' Distance=Distance',n,' Len=Len',n,' Vol=Vol',n,'))') 
into : merge separated by ' '
from temp;
quit;
data want;
 merge &merge;
 by Origin;
 drop n;
run;

Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Re: Transpose long to wide with Do Loop

Registration is open