Solved: Re: How to transpose and rename variables

elbarto · Posted 08-08-2021 09:04 PM

I have a dataset (excerpt shown below):

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667


;
RUN;

I want to transpose the dataset to be as follows:

DATA want;
input id	year	Latitude1	Longitude1	Latitude2	Longitude2	Latitude3	Longitude3	Latitude4	Longitude4

;
DATALINES;
1049	1995	34.289001	-85.970065	.	.	.	.	.	.
1049	1999	34.289001	-85.970065	.	.	.	.	.	.
1073	1990	33.386389	-86.816667	.	.	.	.	.	.
1073	1995	33.331111	-87.003611	33.386389	-86.816667	33.578333	-86.773889	.	.
1073	1996	33.331111	-87.003611	33.386389	-86.816667	33.704722	-86.669167	33.578333	-86.773889
1073	1998	33.485556	-86.915	33.386389	-86.816667	.	.	.	.
1073	1999	33.331111	-87.003611	33.386389	-86.816667	.	.	.	.

;
RUN;

What I've tried is using the following code:

proc transpose data=have out=want;

by id year;
var latitude longitude;

run;

However, the result is not quite what I wanted. In particular, the full dataset could have many latitude/longitude pairs (much more than 4 shown in the example), so I would like the naming of the columns to be from latitude1, longitude1, latitude2, longitude2, ..., up until the last pair.

ballardw · Posted 08-08-2021 09:17 PM

What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?

If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.

This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

data temp;
   set have;
   by id year;
   retain group;
   if first.year then group=1;
   else group+1;
run;

proc transpose data=temp out=trans1 (drop=_name_)
     prefix=Latitude;
   by id year;
   id group;
   var Latitude;
run;
proc transpose data=temp out=trans2 (drop=_name_)
     prefix=Longitude;
   by id year;
   id group;
   var Longitude;
run;

data junk;
   merge trans1 trans2;
   by id year;
run;

View solution in original post

ballardw · Posted 08-08-2021 09:17 PM

What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?

If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.

This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

data temp;
   set have;
   by id year;
   retain group;
   if first.year then group=1;
   else group+1;
run;

proc transpose data=temp out=trans1 (drop=_name_)
     prefix=Latitude;
   by id year;
   id group;
   var Latitude;
run;
proc transpose data=temp out=trans2 (drop=_name_)
     prefix=Longitude;
   by id year;
   id group;
   var Longitude;
run;

data junk;
   merge trans1 trans2;
   by id year;
run;

ChrisNZ · Posted 08-08-2021 09:51 PM

data junk;

Is that a subtle hint? 😁

Agree with you btw, the requested data format is unwieldy.

Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.

High-Performance SAS Coding - Third Edition

ballardw · Posted 08-09-2021 04:30 PM

@ChrisNZ wrote:
data junk;
Is that a subtle hint? 😁

Agree with you btw, the requested data format is unwieldy.

Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.

In this case, yes the name of the data set is editorial.

I have dealt with entirely too many "data sets" like that whose origins were the result of a report.

I'm waiting for the follow-up request on using those lat/long pairs to draw something on a map. And then show how the existing data allows for polygons pretty cleanly.

I often create a data set named junk for specific hopefully temporary cleaning steps. Like when a proc throws an unexpected value or two I'll get the associated records with a Where to create the Junk set. Then determine the "county" value I didn't expect in the output is because someone can't spell or transposed city and county or transposed digits in the Zip code or some other nonsense.

If I ever were to need disk space then Junk (and Junk2 and Junk3 if present) are always allowed to be deleted.

Tom · Posted 08-08-2021 09:57 PM

So when you have multiple observations for a BY group and multiple variables then PROC TRANSPOSE will generate multiple variables and multiple observations. Essentially it is transposing an NxM matrix to a MxN matrix.

You could make it work with just PROC TRANPOSE by transposing three times. Print the intermediate datasets to see what it happening.

proc transpose data=have out=step1 name=variable prefix=_;
  by id year;
  var latitude longitude ;
run;

proc transpose date=step1 out=step2 prefix=_;
  by id year variable;
run;

proc transpose data=step2 out=want (drop=_name_);
  by id year ;
  id variable _name_;
  var _1;
run;

But it is probably easier (and definitely faster if the file is large) to just use arrays to transpose the data for you.

data want;
  do col=1 by 1 until (last.year);
    set have (rename=(latitude=x longitude=y));
    by id year;
    array latitude [4];
    array longitude [4];
    latitude[col]=x;
    longitude[col]=y;
  end;
  drop col x y ;
run;

If you don't know what size to use for the two array's then use an extra pass through the data to find maximum repetition into a macro variable and use the macro variable when defining the array.

proc sql noprint;
select max(n) format=32. into :n trimmed
from (select id,year,count(*) as n from have group by id,year)
;
quit;

data want;
  do col=1 by 1 until (last.year);
    set have (rename=(latitude=x longitude=y));
    by id year;
    array latitude [&n];
    array longitude [&n];
    latitude[col]=x;
    longitude[col]=y;
  end;
  drop col x y ;
run;

elbarto · Posted 08-08-2021 11:33 PM

Thanks Tom! This is a great solution too.

Ksharp · Posted 08-09-2021 09:28 AM

The simplest way is using PROC SUMMARY.

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by id,year);
quit;
proc summary data=have;
by id year;
output out=want idgroup(out[&n] (Latitude	Longitude)=);
run;

Classroom Training Available!