BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
elbarto
Obsidian | Level 7

I have a dataset (excerpt shown below):

 

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667


;
RUN;

I want to transpose the dataset to be as follows:

 

DATA want;
input id	year	Latitude1	Longitude1	Latitude2	Longitude2	Latitude3	Longitude3	Latitude4	Longitude4

;
DATALINES;
1049	1995	34.289001	-85.970065	.	.	.	.	.	.
1049	1999	34.289001	-85.970065	.	.	.	.	.	.
1073	1990	33.386389	-86.816667	.	.	.	.	.	.
1073	1995	33.331111	-87.003611	33.386389	-86.816667	33.578333	-86.773889	.	.
1073	1996	33.331111	-87.003611	33.386389	-86.816667	33.704722	-86.669167	33.578333	-86.773889
1073	1998	33.485556	-86.915	33.386389	-86.816667	.	.	.	.
1073	1999	33.331111	-87.003611	33.386389	-86.816667	.	.	.	.

;
RUN;

What I've tried is using the following code:

 

proc transpose data=have out=want;

by id year;
var latitude longitude;

run;

However, the result is not quite what I wanted. In particular, the full dataset could have many latitude/longitude pairs (much more than 4 shown in the example), so I would like the naming of the columns to be from latitude1, longitude1, latitude2, longitude2, ..., up until the last pair.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?

 

If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.

 

This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

data temp;
   set have;
   by id year;
   retain group;
   if first.year then group=1;
   else group+1;
run;

proc transpose data=temp out=trans1 (drop=_name_)
     prefix=Latitude;
   by id year;
   id group;
   var Latitude;
run;
proc transpose data=temp out=trans2 (drop=_name_)
     prefix=Longitude;
   by id year;
   id group;
   var Longitude;
run;

data junk;
   merge trans1 trans2;
   by id year;
run;

View solution in original post

6 REPLIES 6
ballardw
Super User

What are you going to to do with that wide data set? How will you program anything to us it if you do not know how many pairs will be involved for any given id?

 

If you describe what you intend to do that may help. Most SAS procedures will be much happier with the long format you currently have.

 

This is as close as I'm willing to spend any time working on at the moment without knowing how this is to be used.

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

data temp;
   set have;
   by id year;
   retain group;
   if first.year then group=1;
   else group+1;
run;

proc transpose data=temp out=trans1 (drop=_name_)
     prefix=Latitude;
   by id year;
   id group;
   var Latitude;
run;
proc transpose data=temp out=trans2 (drop=_name_)
     prefix=Longitude;
   by id year;
   id group;
   var Longitude;
run;

data junk;
   merge trans1 trans2;
   by id year;
run;
ChrisNZ
Tourmaline | Level 20
data junk;

Is that a subtle hint? 😁

Agree with you btw, the requested data format is unwieldy.

Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.

 

ballardw
Super User

@ChrisNZ wrote:
data junk;

Is that a subtle hint? 😁

Agree with you btw, the requested data format is unwieldy.

Except as a final output (i.e. a report), data is -with very few exceptions- better kept in a long -rather than wide- format.

 


In this case, yes the name of the data set is editorial.

I have dealt with entirely too many "data sets" like that whose origins were the result of a report.

I'm waiting for the follow-up request on using those lat/long pairs to draw something on a map. And then show how the existing data allows for polygons pretty cleanly.

 

I often create a data set named junk for specific hopefully temporary cleaning steps. Like when a proc throws an unexpected value or two I'll get the associated records with a Where to create the Junk set. Then determine the "county" value I didn't expect in the output is because someone can't spell or transposed city and county or transposed digits in the Zip code or some other nonsense.

 

If I ever were to need disk space then Junk (and Junk2 and Junk3 if present) are always allowed to be deleted.

Tom
Super User Tom
Super User

So when you have multiple observations for a BY group and multiple variables then PROC TRANSPOSE will generate multiple variables and multiple observations.  Essentially it is transposing an NxM matrix to a MxN matrix.

 

You could make it work with just PROC TRANPOSE by transposing three times.  Print the intermediate datasets to see what it happening.

proc transpose data=have out=step1 name=variable prefix=_;
  by id year;
  var latitude longitude ;
run;

proc transpose date=step1 out=step2 prefix=_;
  by id year variable;
run;

proc transpose data=step2 out=want (drop=_name_);
  by id year ;
  id variable _name_;
  var _1;
run;

But it is probably easier (and definitely faster if the file is large) to just use arrays to transpose the data for you.

data want;
  do col=1 by 1 until (last.year);
    set have (rename=(latitude=x longitude=y));
    by id year;
    array latitude [4];
    array longitude [4];
    latitude[col]=x;
    longitude[col]=y;
  end;
  drop col x y ;
run;

If you don't know what size to use for the two array's then use an extra pass through the data to find maximum repetition into a macro variable and use the macro variable when defining the array.

proc sql noprint;
select max(n) format=32. into :n trimmed
from (select id,year,count(*) as n from have group by id,year)
;
quit;

data want;
  do col=1 by 1 until (last.year);
    set have (rename=(latitude=x longitude=y));
    by id year;
    array latitude [&n];
    array longitude [&n];
    latitude[col]=x;
    longitude[col]=y;
  end;
  drop col x y ;
run;
elbarto
Obsidian | Level 7
Thanks Tom! This is a great solution too.
Ksharp
Super User

The simplest way is using PROC SUMMARY.

 

DATA have;
input id	year	Latitude	Longitude
;
DATALINES;
1049	1995	34.289001	-85.970065
1049	1999	34.289001	-85.970065
1073	1990	33.386389	-86.816667
1073	1995	33.331111	-87.003611
1073	1995	33.386389	-86.816667
1073	1995	33.578333	-86.773889
1073	1996	33.331111	-87.003611
1073	1996	33.386389	-86.816667
1073	1996	33.704722	-86.669167
1073	1996	33.578333	-86.773889
1073	1998	33.485556	-86.915
1073	1998	33.386389	-86.816667
1073	1999	33.331111	-87.003611
1073	1999	33.386389	-86.816667
;
RUN;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by id,year);
quit;
proc summary data=have;
by id year;
output out=want idgroup(out[&n] (Latitude	Longitude)=);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2919 views
  • 1 like
  • 5 in conversation