Hello!
I've spent a couple of days researching forums and reading papers and I'm stumped. I don't know how to use arrays, macros, or do loops - and i think that's what's needed for this task. This is more complicated than I can do.
This is fish habitat data. The ORIGIN is the stream where the fish lives and the DESTINATION is another suitable stream. DISTANCE is the stream length between them, and the LEN and VOL are additional attributes. The data must be sorted by ORIGIN, DISTANCE because I want the transformed columns to be in the order of the closest destination habitats.
My dataset as >6000 obs, and some ORIGINs have 25 DESTINATIONS, so there will be about 100 columns in the resulting table. It's what the boss wants - and I'm bummed that I can't figure it out myself.
Here's the data:
data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;
Here's the result I want:
As I move the data into the columns, I have abbreviated their variable names. I highlighted the DISTANCEs so you can see how they dictate the ordering of the columns.
Origin | Dest1 | Dist1 | Len1 | Vol1 | Dest2 | Dist2 | Len2 | Vol2 | Dest3 | Dist3 | Len3 | Vol3 |
1001 | 999 | 11891 | 26.44 | 193.00 | ||||||||
1005 | 1016 | 2422 | 7.03 | 8.42 | 944 | 19437 | 16.84 | 74.17 | ||||
1010 | 1012 | 42986 | 22.10 | 262.95 | 988 | 48093 | 9.10 | 26.36 | ||||
1012 | 1010 | 42986 | 10.66 | 472.81 | 988 | 44204 | 9.10 | 26.36 | 999 | 47478 | 26.44 | 193.00 |
1016 | 1005 | 2422 | 30.73 | 62.22 |
Thanks a lot! This has been fun, but a dead end. Hope someone out there has fun with it! I'm using SAS 9.4.
Alternatively you may try the arrays as below
data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;
data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;
Your problem is the second example here exactly, expand it to 4 variables though.
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
proc transpose data=class out=wide1;
by Origin;
var Destination;
run;
proc transpose data=class out=wide2;
by origin;
var distance;
run;
proc transpose data=class out=wide3;
by origin;
var length;
run;
proc transpose data=class out=wide4;
by origin;
var volume;
run;
data want;
merge wide1 wide2 wide3 wide4;
by Origin;
run;
Then once you have it in this format you can worry about re-ordering variables and then sorting as needed.
Does this get you closer?
Alternatively you may try the arrays as below
data WORK.CLASS(label='Habitat Data');
infile datalines;
input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.;
datalines;
1001 999 11891 26.44 193.00
1005 1016 2422 7.03 8.42
1005 944 19437 16.84 74.17
1010 1012 42986 22.10 262.95
1010 988 48093 9.10 26.36
1012 1010 42986 10.66 472.81
1012 988 44204 9.10 26.36
1012 999 47478 26.44 193.00
1016 1005 2422 30.73 62.22
;;;;
data want;
set class;
by origin;
retain count dest1 dest2 dest3 Dist1 Dist2 Dist3 Len1 Len2 Len3 vol1 vol2 vol3;
array dest(3) dest1 dest2 dest3;
array Dist(3) Dist1 Dist2 Dist3 ;
array Lens(3) Len1 Len2 Len3;
array Vols(3) vol1 vol2 vol3;
if first.origin then count=1;
else count+1;
do i = 1 to 3;
if first.origin then do;
dest(i)=.;
Dist(i)=.;
Lens(i)=.;
Vols(i)=.;
end;
end;
dest(count)=Destination;
Dist(count)=Distance;
Lens(count)=len;
Vols(count)=vol;
if last.origin;
run;
A data step solution while possible involves some more work. The first thing is to find out the maximum number of records with the same Origin. Then use that value to create the array dimension that are needed.
Then retain all of those.
Then using by processing you can reset the retained values when the first origin is encountered and then write output only on the last origin for each.
Question: How is this data set to be used? The desired output is a moderately awkward structure to do many things with.
If the sole purpose is to read things by a person it may be that a report would be better.
Consider:
proc report data=work.class; columns origin destination,(distance len vol); define origin /group; define destination/group across; define distance/ sum; define len/ sum; define vol/ sum; run;
Which makes "columns" across but aligns the destinations so you can compare with different origins more easily. By have the values of distance, len and vol grouped under the destination it is much easier to see which destination is involved or to find the values involving any specific destination instead of the "scattered" inside the columns of the data set that might be created.
I'd recommend the %transpose macro to get what you want. e.g.:
proc sort data=class (rename=(Destination=Dest
Distance=Dist))
out=need;
by Origin Dist;
run;
filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;
%transpose(data=need, out=want, by=Origin, Guessingrows=1000,
var=Dest Dist Len Vol)
Art
Merge Skill:
data WORK.CLASS(label='Habitat Data'); infile datalines; input Origin:8. Destination:8. Distance:8. Len:8. Vol:8.; datalines; 1001 999 11891 26.44 193.00 1005 1016 2422 7.03 8.42 1005 944 19437 16.84 74.17 1010 1012 42986 22.10 262.95 1010 988 48093 9.10 26.36 1012 1010 42986 10.66 472.81 1012 988 44204 9.10 26.36 1012 999 47478 26.44 193.00 1016 1005 2422 30.73 62.22 ;;;; data temp; set class; by origin; if first.origin then n=0; n+1; run; proc sql noprint nowarn; select distinct catt('temp(where=(n=',n,') rename=( Destination=Destination',n,' Distance=Distance',n,' Len=Len',n,' Vol=Vol',n,'))') into : merge separated by ' ' from temp; quit; data want; merge &merge; by Origin; drop n; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.