BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
camper
Fluorite | Level 6

Hello!

      I've spent a couple of days researching forums and reading papers and I'm stumped.  I don't know how to use arrays, macros, or do loops - and i think that's what's needed for this task.  This is more complicated than I can do.

      This is fish habitat data.  The ORIGIN is the stream where the fish lives and the DESTINATION is another suitable stream. DISTANCE is the stream length between them, and the LEN and VOL are additional attributes.  The data must be sorted by ORIGIN, DISTANCE because I want the transformed columns to be in the order of the closest destination habitats. 

      My dataset as >6000 obs, and some ORIGINs have 25 DESTINATIONS, so there will be about 100 columns in the resulting table.  It's what the boss wants - and I'm bummed that I can't figure it out myself.

Here's the data:

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

Here's the result I want:

As I move the data into the columns, I have abbreviated their variable names.  I highlighted the DISTANCEs so you can see how they dictate the ordering of the columns.

OriginDest1Dist1Len1Vol1Dest2Dist2Len2Vol2Dest3Dist3Len3Vol3
10019991189126.44193.00        
1005101624227.038.429441943716.8474.17    
101010124298622.10262.95988480939.1026.36    
101210104298610.66472.81988442049.1026.369994747826.44193.00
10161005242230.7362.22        

 

Thanks a lot!  This has been fun, but a dead end.  Hope someone out there has fun with it!  I'm using SAS 9.4.

1 ACCEPTED SOLUTION

Accepted Solutions
Jagadishkatam
Amethyst | Level 16

Alternatively you may try the arrays as below

 

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;
Thanks,
Jag

View solution in original post

9 REPLIES 9
Reeza
Super User

Your problem is the second example here exactly, expand it to 4 variables though.


Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

proc transpose data=class out=wide1;
by Origin;
var Destination;
run;

proc transpose data=class out=wide2;
by origin;
var distance;
run;

proc transpose data=class out=wide3;
by origin;
var length;
run;

proc transpose data=class out=wide4;
by origin;
var volume;
run;

data want;
merge wide1 wide2 wide3 wide4;
by Origin;
run;


Then once you have it in this format you can worry about re-ordering variables and then sorting as needed.

Does this get you closer?

 

 

camper
Fluorite | Level 6
This worked great - Thank you! Thanks for the amazingly FAST reply! and thank you for including the link to the paper.
Jagadishkatam
Amethyst | Level 16

Alternatively you may try the arrays as below

 

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;
Thanks,
Jag
camper
Fluorite | Level 6
WOW. This is AMAZING! I am so grateful. You make it look so easy - and I could never have figured this out. Thank you so very much!!
ballardw
Super User

A data step solution while possible involves some more work. The first thing is to find out the maximum number of records with the same Origin. Then use that value to create the array dimension that are needed.

Then retain all of those.

Then using by processing you can reset the retained values when the first origin is encountered and then write output only on the last origin for each.

 

Question: How is this data set to be used? The desired output is a moderately awkward structure to do many things with.

If the sole purpose is to read things by a person it may be that a report would be better.

 

Consider:

proc report data=work.class;
   columns origin destination,(distance len vol);
   define origin /group;
   define destination/group across;
   define distance/ sum;
   define len/ sum;
   define vol/ sum;
run;

Which makes "columns" across but aligns the destinations so you can compare with different origins more easily. By have the values of distance, len and vol grouped under the destination it is much easier to see which destination is involved or to find the values involving any specific destination instead of the "scattered" inside the columns of the data set that might be created.

camper
Fluorite | Level 6
Thank you for taking the time to send your ideas! I don't know how he's going to use this crazy table, and I agree, it is very awkward!
art297
Opal | Level 21

I'd recommend the %transpose macro to get what you want. e.g.:

proc sort data=class (rename=(Destination=Dest
                              Distance=Dist))
          out=need;
  by Origin Dist;
run;

filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=need, out=want, by=Origin, Guessingrows=1000,
  var=Dest Dist Len Vol)

Art

 

camper
Fluorite | Level 6
Thank you Art! I appreciate your time!
Ksharp
Super User

Merge Skill:

 

data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;

data temp;
 set class;
 by origin;
 if first.origin then n=0;
 n+1;
run;
proc sql noprint nowarn;
select distinct catt('temp(where=(n=',n,') rename=(
Destination=Destination',n,' Distance=Distance',n,' Len=Len',n,' Vol=Vol',n,'))') 
into : merge separated by ' '
from temp;
quit;
data want;
 merge &merge;
 by Origin;
 drop n;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 5518 views
  • 7 likes
  • 6 in conversation