I have a data set that is set up something like:
Billing Month Product Gross Sales (Quantity)
01/12 FordF150 12
01/12 FordF250 16
01/12 FordFocus 8
01/12 Dodge1500 11
01/12 Dodge2500 18
01/12 DodgeAvenger 13
02/12 FordF150 15
02/12 FordF250 7
02/12 FordFocus 9
02/12 Dodge1500 11
02/12 Dodge2500 5
02/12 DodgeAvenger 20
Except my data set is a few hundred times bigger with more product names.
What I want to do is to combine rows within the same billing month by a common keyword.
Example: I want the data to look like:
Billing Month Product Gross Sales
01/12 Ford 36
01/12 Dodge 42
02/12 Ford 31
02/12 Dodge 36
How can this be done in SAS? I have access to SAS 9.2 and 9.3
data want;
set have;
if find(product,'Ford')>0 then prod=substr(product,1,4);
else if find(product,'Dodge')>0 then prod=substr(product,1,5);
run;
proc sql;
select month,prod,sum(sales) as Gross_sales
from want
group by month,prod
order by month,prod desc;
quit;
How many product lines are you dealing with? With your examples are you dealing with any and all car makes or ONLY Dodge and Ford?
Also is this a one time thing or will this be recurring? If recurring I would look at the raw data and see if there was a way to get the maker and model separate. OR build a custom format that would likely need to be updated occasionaly.
If only two makers a simple brute force or a few:
data want;
set have;
length Maker $ 8; /* make this at least as long as the longest maker name*/
if index(upcase(product),'FORD') >0 then Maker = 'Ford';
else if index(upcase(product,'DODGE'>0 then Maker='Dodge';
run;
then many report procedures can summarize.
proc means data=want sum;
class billingmonth maker;
var sales;
run;
I have added the variable Maker so you can verify the results. Caveat: If any model ends up with part or another makers name this won't work. Unlikely but .... Which is why I recommend going back to the raw data and how it is brough into SAS and have the make and model as diffferent variables. Also, you may get a request for pickups, sedans, coupes, SUV and such. That will much easier with the model separate.
I do not use variable names with spaces in them, if yours have them adjust code as needed. And I could have done this as far back as 5.18 and I'm sure much earlier than that.
It will take a lot of subset functions, but I would try something like this:
firstupper=anyupper(substr(product,2));
firstdigit=anydigit(substr(product,2));
if firstupper=0 then firstupper=.;
if firstdigit=0 then firstdigit=.;
split=min(firstupper,firstdigit);
car_make=substr(product,1,split);
FIRSTUPPER finds the first uppercase letter (not counting the first letter of the product name).
FIRSTDIGIT finds the first digit in the product name.
If an uppercase letter or a digit are not found, a 0 is returned. I force all 0's to missing so they are not captured when I find the smallest value.
SPLIT finds the smallest value of FIRSTUPPER and FIRSTDIGIT to find out where to split the product name.
CAR_MAKE subsets the product name from the beginning of the string to the start of the first digit or next uppercase letter.
data want;
set have;
if find(product,'Ford')>0 then prod=substr(product,1,4);
else if find(product,'Dodge')>0 then prod=substr(product,1,5);
run;
proc sql;
select month,prod,sum(sales) as Gross_sales
from want
group by month,prod
order by month,prod desc;
quit;
Use a small auxiliary table (Brands) and the truncated string comparison operator EQT :
data Sales;
length Product $16;
input Month :anydtdte. Product Sales;
format Month mmyys5.;
datalines;
01/12 FordF150 12
01/12 FordF250 16
01/12 FordFocus 8
01/12 Dodge1500 11
01/12 Dodge2500 18
01/12 DodgeAvenger 13
02/12 FordF150 15
02/12 FordF250 7
02/12 FordFocus 9
02/12 Dodge1500 11
02/12 Dodge2500 5
02/12 DodgeAvenger 20
;
data Brands;
length Brand $16;
input Brand;
datalines;
Ford
Dodge
;
proc sql;
select Month, Brand, sum(sales) as BrandSales
from Sales inner join Brands on Product EQT Brand
group by Month, Brand;
quit;
PG
by proc sql in a single step
proc sql;
create table want as select distinct BillingMonth, sum(grosssales) as sum, case when index(lowcase(product),'ford')>0 then 'Ford'
when index(lowcase(product),'dodge')>0 then 'Dodge'
else '' end as prod from have group by BillingMonth, prod;
quit;
Thanks,
Jag
A simple way is using format .
data Sales; length Product $16; input Month $ Product Sales; datalines; 01/12 FordF150 12 01/12 FordF250 16 01/12 FordFocus 8 01/12 Dodge1500 11 01/12 Dodge2500 18 01/12 DodgeAvenger 13 02/12 FordF150 15 02/12 FordF250 7 02/12 FordFocus 9 02/12 Dodge1500 11 02/12 Dodge2500 5 02/12 DodgeAvenger 20 ; run; proc summary data=Sales nway ; class month product; format product $4.; var Sales; output out=want sum=; run;
Xia Keshan
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.