I have a data set that is set up something like:
Billing Month Product Gross Sales (Quantity)
01/12 FordF150 12
01/12 FordF250 16
01/12 FordFocus 8
01/12 Dodge1500 11
01/12 Dodge2500 18
01/12 DodgeAvenger 13
02/12 FordF150 15
02/12 FordF250 7
02/12 FordFocus 9
02/12 Dodge1500 11
02/12 Dodge2500 5
02/12 DodgeAvenger 20
Except my data set is a few hundred times bigger with more product names.
What I want to do is to combine rows within the same billing month by a common keyword.
Example: I want the data to look like:
Billing Month Product Gross Sales
01/12 Ford 36
01/12 Dodge 42
02/12 Ford 31
02/12 Dodge 36
How can this be done in SAS? I have access to SAS 9.2 and 9.3
data want;
set have;
if find(product,'Ford')>0 then prod=substr(product,1,4);
else if find(product,'Dodge')>0 then prod=substr(product,1,5);
run;
proc sql;
select month,prod,sum(sales) as Gross_sales
from want
group by month,prod
order by month,prod desc;
quit;
How many product lines are you dealing with? With your examples are you dealing with any and all car makes or ONLY Dodge and Ford?
Also is this a one time thing or will this be recurring? If recurring I would look at the raw data and see if there was a way to get the maker and model separate. OR build a custom format that would likely need to be updated occasionaly.
If only two makers a simple brute force or a few:
data want;
set have;
length Maker $ 8; /* make this at least as long as the longest maker name*/
if index(upcase(product),'FORD') >0 then Maker = 'Ford';
else if index(upcase(product,'DODGE'>0 then Maker='Dodge';
run;
then many report procedures can summarize.
proc means data=want sum;
class billingmonth maker;
var sales;
run;
I have added the variable Maker so you can verify the results. Caveat: If any model ends up with part or another makers name this won't work. Unlikely but .... Which is why I recommend going back to the raw data and how it is brough into SAS and have the make and model as diffferent variables. Also, you may get a request for pickups, sedans, coupes, SUV and such. That will much easier with the model separate.
I do not use variable names with spaces in them, if yours have them adjust code as needed. And I could have done this as far back as 5.18 and I'm sure much earlier than that.
It will take a lot of subset functions, but I would try something like this:
firstupper=anyupper(substr(product,2));
firstdigit=anydigit(substr(product,2));
if firstupper=0 then firstupper=.;
if firstdigit=0 then firstdigit=.;
split=min(firstupper,firstdigit);
car_make=substr(product,1,split);
FIRSTUPPER finds the first uppercase letter (not counting the first letter of the product name).
FIRSTDIGIT finds the first digit in the product name.
If an uppercase letter or a digit are not found, a 0 is returned. I force all 0's to missing so they are not captured when I find the smallest value.
SPLIT finds the smallest value of FIRSTUPPER and FIRSTDIGIT to find out where to split the product name.
CAR_MAKE subsets the product name from the beginning of the string to the start of the first digit or next uppercase letter.
data want;
set have;
if find(product,'Ford')>0 then prod=substr(product,1,4);
else if find(product,'Dodge')>0 then prod=substr(product,1,5);
run;
proc sql;
select month,prod,sum(sales) as Gross_sales
from want
group by month,prod
order by month,prod desc;
quit;
Use a small auxiliary table (Brands) and the truncated string comparison operator EQT :
data Sales;
length Product $16;
input Month :anydtdte. Product Sales;
format Month mmyys5.;
datalines;
01/12 FordF150 12
01/12 FordF250 16
01/12 FordFocus 8
01/12 Dodge1500 11
01/12 Dodge2500 18
01/12 DodgeAvenger 13
02/12 FordF150 15
02/12 FordF250 7
02/12 FordFocus 9
02/12 Dodge1500 11
02/12 Dodge2500 5
02/12 DodgeAvenger 20
;
data Brands;
length Brand $16;
input Brand;
datalines;
Ford
Dodge
;
proc sql;
select Month, Brand, sum(sales) as BrandSales
from Sales inner join Brands on Product EQT Brand
group by Month, Brand;
quit;
PG
by proc sql in a single step
proc sql;
create table want as select distinct BillingMonth, sum(grosssales) as sum, case when index(lowcase(product),'ford')>0 then 'Ford'
when index(lowcase(product),'dodge')>0 then 'Dodge'
else '' end as prod from have group by BillingMonth, prod;
quit;
Thanks,
Jag
A simple way is using format .
data Sales; length Product $16; input Month $ Product Sales; datalines; 01/12 FordF150 12 01/12 FordF250 16 01/12 FordFocus 8 01/12 Dodge1500 11 01/12 Dodge2500 18 01/12 DodgeAvenger 13 02/12 FordF150 15 02/12 FordF250 7 02/12 FordFocus 9 02/12 Dodge1500 11 02/12 Dodge2500 5 02/12 DodgeAvenger 20 ; run; proc summary data=Sales nway ; class month product; format product $4.; var Sales; output out=want sum=; run;
Xia Keshan
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.