For a project, I have a large dataset of 1.5m entries, I am looking to aggregate some car loan data by some constraint variables such as:
Country, Currency, ID, Fixed or floating , performing , Initial Loan Value , Car Type , Car Make
I am wondering if it is possible to aggregate data by summing the initial loan value for the numeric and then condensing the similar variables into one row with the same observation such that I turn the first dataset into the second
Country Currency ID Fixed_or_Floating Performing Initial_Value Current_Value
data have;
set have;
input country $ currency $ ID Fixed $ performing $ initial current;
datalines;
UK GBP 1 Fixed Performing 100 50
UK GBP 1 Fixed Performing 150 30
UK GBP 1 Fixed Performing 160 70
UK GBP 1 Floating Performing 150 30
UK GBP 1 Floating Performing 115 80
UK GBP 1 Floating Performing 110 60
UK GBP 1 Fixed Non-Performing 100 50
UK GBP 1 Fixed Non-Performing 120 30
;
run;
data want;
set have;
input country $ currency $ ID Fixed $ performing $ initial current;
datalines;
UK GBP 1 Fixed Performing 410 150
UK GBP 1 Floating Performing 275 170
UK GBP 1 Fixed Non-performing 220 80
;
run;
Essentially looking for a way to sum the numeric values while concatenating the character variables.
I've tried this code
proc means data=have sum;
var initial current;
by country currency id fixed performing;
run;
Unsure If i'll have to use a proc sql (would be too slow for such a large dataset) or possibly a data step.
any help in concatenating would be appreciated.
proc summary data=have nway;
var initial current;
class country currency id fixed performing;
output out=sums sum=sum_initial sum_current;
run;
proc summary data=have nway;
var initial current;
class country currency id fixed performing;
output out=sums sum=sum_initial sum_current;
run;
perfect, thanks.
@89974114 wrote:
perfect, thanks.
Note that with @PaigeMiller's solution if you remove NWAY you would get summaries for each level and combination of the class variables as well as one overall. This can be very useful if you are interested in the different groupings. The variable _type_ can be used to process or use subsets of this type of summary. An optional Classdata set could be used to indicate combinations of interest.
Yes @ballardw, PROC SUMMARY is an extremely powerful procedure that does lots and lots of very useful things, and it isn't particularly well known by the SAS programmers that I interact with (your experience may be different). The Classdata data set option is a good one, the TYPES statement in PROC SUMMARY does essentially the same thing.
@PaigeMiller I've been using Proc Summary since the time when Proc Means didn't produce output data sets so it is a tool I reach for fairly often though sometimes I forget the options added in the past 20 years or so
Quick supplementary question, i'm trying to get my head around the summary function
i'm trying to make a weighted variable such that it is the weight of each segment where weight = initial / total
Initial total weight
100 330 0.303
110 330 0.333
120 330 0.363
I tried using weight = (initial / total ) but in the summary function it doesn't understand this command.
This has to be done in a data step
proc summary data=have nway;
var initial current;
class country currency id fixed performing;
output out=sums sum=sum_initial sum_current;
run;
data sums;
set sums;
weight=sum_initial/sum_current;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.