I have a dataset (excerpt shown below):
DATA have;
input id year Latitude Longitude
;
DATALINES;
1049 1995 34.289001 -85.970065
1049 1999 34.289001 -85.970065
1073 1990 33.386389 -86.816667
1073 1995 33.331111 -87.003611
1073 1995 33.386389 -86.816667
1073 1995 33.578333 -86.773889
1073 1996 33.331111 -87.003611
1073 1996 33.386389 -86.816667
1073 1996 33.704722 -86.669167
1073 1996 33.578333 -86.773889
1073 1998 33.485556 -86.915
1073 1998 33.386389 -86.816667
1073 1999 33.331111 -87.003611
1073 1999 33.386389 -86.816667
;
RUN;
I want to transpose the dataset to be as follows:
DATA want;
input id year Latitude1 Longitude1 Latitude2 Longitude2 Latitude3 Longitude3 Latitude4 Longitude4
;
DATALINES;
1049 1995 34.289001 -85.970065 . . . . . .
1049 1999 34.289001 -85.970065 . . . . . .
1073 1990 33.386389 -86.816667 . . . . . .
1073 1995 33.331111 -87.003611 33.386389 -86.816667 33.578333 -86.773889 . .
1073 1996 33.331111 -87.003611 33.386389 -86.816667 33.704722 -86.669167 33.578333 -86.773889
1073 1998 33.485556 -86.915 33.386389 -86.816667 . . . .
1073 1999 33.331111 -87.003611 33.386389 -86.816667 . . . .
;
RUN;
What I've tried is using the following code:
proc transpose data=have out=want;
by id year;
var latitude longitude;
run;
However, the result is not quite what I wanted. In particular, the full dataset could have many latitude/longitude pairs (much more than 4 shown in the example), so I would like the naming of the columns to be from latitude1, longitude1, latitude2, longitude2, ..., up until the last pair.
What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?
If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.
This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.
DATA have; input id year Latitude Longitude ; DATALINES; 1049 1995 34.289001 -85.970065 1049 1999 34.289001 -85.970065 1073 1990 33.386389 -86.816667 1073 1995 33.331111 -87.003611 1073 1995 33.386389 -86.816667 1073 1995 33.578333 -86.773889 1073 1996 33.331111 -87.003611 1073 1996 33.386389 -86.816667 1073 1996 33.704722 -86.669167 1073 1996 33.578333 -86.773889 1073 1998 33.485556 -86.915 1073 1998 33.386389 -86.816667 1073 1999 33.331111 -87.003611 1073 1999 33.386389 -86.816667 ; RUN; data temp; set have; by id year; retain group; if first.year then group=1; else group+1; run; proc transpose data=temp out=trans1 (drop=_name_) prefix=Latitude; by id year; id group; var Latitude; run; proc transpose data=temp out=trans2 (drop=_name_) prefix=Longitude; by id year; id group; var Longitude; run; data junk; merge trans1 trans2; by id year; run;
What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?
If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.
This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.
DATA have; input id year Latitude Longitude ; DATALINES; 1049 1995 34.289001 -85.970065 1049 1999 34.289001 -85.970065 1073 1990 33.386389 -86.816667 1073 1995 33.331111 -87.003611 1073 1995 33.386389 -86.816667 1073 1995 33.578333 -86.773889 1073 1996 33.331111 -87.003611 1073 1996 33.386389 -86.816667 1073 1996 33.704722 -86.669167 1073 1996 33.578333 -86.773889 1073 1998 33.485556 -86.915 1073 1998 33.386389 -86.816667 1073 1999 33.331111 -87.003611 1073 1999 33.386389 -86.816667 ; RUN; data temp; set have; by id year; retain group; if first.year then group=1; else group+1; run; proc transpose data=temp out=trans1 (drop=_name_) prefix=Latitude; by id year; id group; var Latitude; run; proc transpose data=temp out=trans2 (drop=_name_) prefix=Longitude; by id year; id group; var Longitude; run; data junk; merge trans1 trans2; by id year; run;
data junk;
Is that a subtle hint? 😁
Agree with you btw, the requested data format is unwieldy.
Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.
@ChrisNZ wrote:
data junk;Is that a subtle hint? 😁
Agree with you btw, the requested data format is unwieldy.
Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.
In this case, yes the name of the data set is editorial.
I have dealt with entirely too many "data sets" like that whose origins were the result of a report.
I'm waiting for the follow-up request on using those lat/long pairs to draw something on a map. And then show how the existing data allows for polygons pretty cleanly.
I often create a data set named junk for specific hopefully temporary cleaning steps. Like when a proc throws an unexpected value or two I'll get the associated records with a Where to create the Junk set. Then determine the "county" value I didn't expect in the output is because someone can't spell or transposed city and county or transposed digits in the Zip code or some other nonsense.
If I ever were to need disk space then Junk (and Junk2 and Junk3 if present) are always allowed to be deleted.
So when you have multiple observations for a BY group and multiple variables then PROC TRANSPOSE will generate multiple variables and multiple observations. Essentially it is transposing an NxM matrix to a MxN matrix.
You could make it work with just PROC TRANPOSE by transposing three times. Print the intermediate datasets to see what it happening.
proc transpose data=have out=step1 name=variable prefix=_;
by id year;
var latitude longitude ;
run;
proc transpose date=step1 out=step2 prefix=_;
by id year variable;
run;
proc transpose data=step2 out=want (drop=_name_);
by id year ;
id variable _name_;
var _1;
run;
But it is probably easier (and definitely faster if the file is large) to just use arrays to transpose the data for you.
data want;
do col=1 by 1 until (last.year);
set have (rename=(latitude=x longitude=y));
by id year;
array latitude [4];
array longitude [4];
latitude[col]=x;
longitude[col]=y;
end;
drop col x y ;
run;
If you don't know what size to use for the two array's then use an extra pass through the data to find maximum repetition into a macro variable and use the macro variable when defining the array.
proc sql noprint;
select max(n) format=32. into :n trimmed
from (select id,year,count(*) as n from have group by id,year)
;
quit;
data want;
do col=1 by 1 until (last.year);
set have (rename=(latitude=x longitude=y));
by id year;
array latitude [&n];
array longitude [&n];
latitude[col]=x;
longitude[col]=y;
end;
drop col x y ;
run;
The simplest way is using PROC SUMMARY.
DATA have;
input id year Latitude Longitude
;
DATALINES;
1049 1995 34.289001 -85.970065
1049 1999 34.289001 -85.970065
1073 1990 33.386389 -86.816667
1073 1995 33.331111 -87.003611
1073 1995 33.386389 -86.816667
1073 1995 33.578333 -86.773889
1073 1996 33.331111 -87.003611
1073 1996 33.386389 -86.816667
1073 1996 33.704722 -86.669167
1073 1996 33.578333 -86.773889
1073 1998 33.485556 -86.915
1073 1998 33.386389 -86.816667
1073 1999 33.331111 -87.003611
1073 1999 33.386389 -86.816667
;
RUN;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by id,year);
quit;
proc summary data=have;
by id year;
output out=want idgroup(out[&n] (Latitude Longitude)=);
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.