I have some airport departure and destination data, along with charges. Suppose I have:
depart dest OCU
cyyz egll 500
cyvr cyyz 10
egll cyyz 500
What I would like to do is to sum OCU by city pair, which is the concatenation of depart and dest, hopefully in the format of depart-dest (dash in between them).
The price is that reverse flights should be included as one...so above the final output dataset would be:
City_Pair OCU
cyyz-egll 1000
cyvr-cyyz 10
thanks for any advice!
Hi,
Do you mean something like this:
data have;
input depart $ dest $ OCU;
cards;
cyyz egll 500
cyvr cyyz 10
egll cyyz 500
;
run;
proc print;
run;
data have2 / view = have2;
set have;
array dd[*] depart dest;
call sortc(of dd[*]);
grp = catx("-", of dd[*]);
run;
proc sql;
select grp, sum(ocu) as sumOcu
from have2
group by grp
;
quit;
All the best
Bart
As a single query:
proc sql;
create table want as
select
catx("-", a.depart, a.dest) as cityPair,
sum(a.OCU, b.OCU) as OCU
from
have as a left join
have as b on a.depart=b.dest and a.dest=b.depart
where b.depart < a.dest or b.depart is missing;
quit;
Yet another variation - hash-based:
data have ;
input depart $ dest $ ocu ;
cards ;
cyyz egll 500
cyvr cyyz 10
egll cyyz 500
;
data _null_ ;
dcl hash h () ;
h.definekey ("city_pair") ;
h.definedata ("city_pair", "ocu") ;
h.definedone () ;
do until (z) ;
set have (rename=ocu=_ocu) end = z ;
city_pair = put (catx ("-", depart, dest), $9.) ;
if dest < depart then city_pair = catx ("-", dest, depart) ;
if h.find() ne 0 then ocu = _ocu ;
else ocu + _ocu ;
h.replace() ;
end ;
h.output (dataset:"want") ;
run ;
Kind regards
Paul D.
You've had three good answer, yet none uses PROC SUMMARY, which I think of as the tool SAS intended to solve this problem. So, for completeness sake:
data have ;
input depart $ dest $ ocu ;
cards ;
cyyz egll 500
cyvr cyyz 10
egll cyyz 500
;
data vneed/ view=vneed;
set have;
call sortc (depart,dest);
length pair $9;
pair=catx('-',depart,dest);
run;
proc summary data=vneed nway;
class pair;
var ocu;
output out=want (drop=_type_ _freq_) sum=sumocu;
run;
The nice thing about PROC SUMMARY is that it offers keywords that generate a lot of statistics in addition to SUM (e.g. mean, std, median, other percentiles, …). And you could simultaneously do it not only for all pairs, but also for all DEPART values, and all DEST values.
@mkeintz: Mark, a nice complement. Truth be told, though, the crux of this problem isn't which particular tool to use for producing the sums but rather how to knead the input data to enable the summation. I absolutely agree with what you've said of the broader functionality of SUMMARY, but all by itself it's powerless to face this problem head-on. Which, of course, can be said of any other aggregation method used in this thread.A truck may be able to carry a huge tree by its sheer weight; but it cannot do that before the tree is cut into shapes that can fill its bed properly.
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.