Hello,
I am new to SAS and I am able to manipulate my dataframe pretty well but need help counting values in a column and then transposing the data from long to wide. My data looks like
ID | year_month | monthly_spending | store |
123 | 201107 | 25 | storeA 5555 |
123 | 201107 | 10 | storeA 444 |
123 | 201108 | 50 | storeB 33 |
1234 | 201010 | 45 | storeA 5555 |
1234 | 201010 | 40 | storeA 5555 |
1234 | 201011 | 30 | storeC 222 |
12345 | 200906 | 24 | storeC 222 |
12345 | 200907 | 3 | storeC 222 |
12345 | 200907 | 27 | storeB 33 |
12345 | 200908 | 15 | storeA 555 |
And what I need is
ID | year_month | avg_mnth_storeA | storeA_visits | avg_mnth_storeB | storeB_visits | avg_mnth_storeC | storeC_visits |
123 | 201107 | 17.5 | 2 | 0 | 0 | 0 | 0 |
123 | 201108 | 0 | 0 | 50 | 1 | 0 | 0 |
1234 | 201010 | 45 | 2 | 0 | 0 | 0 | 0 |
1234 | 201011 | 0 | 0 | 0 | 0 | 30 | 1 |
12345 | 200906 | 0 | 0 | 0 | 0 | 24 | 1 |
12345 | 200907 | 0 | 0 | 0 | 0 | 15 | 2 |
12345 | 200908 | 15 | 1 | 0 | 0 | 0 | 0 |
In these example each store has a name(storeA) and the number following it is its location identifier. I know how to count each store with its identifier but I would like to be able to ignore the location identifier and only have a count of stores with the same name.
I also know how to transpose my data from long to wide like this but not how to do it so that the count of the stores is what is in each column and not another value from my dataframe associated with it.
How can I get a count of each store based on similar names and then transpose that from long to wide where those counts are now in the column?
Any suggestions are much appreciated!!
Data have(drop=store_id);
input ID $ year_month monthly_spending store_id $15.;
store=scan(store_id,1);
cards;
123 201107 25 storeA 5555
123 201107 10 storeA 444
123 201108 50 storeB 33
1234 201010 45 storeA 5555
1234 201010 40 storeA 5555
1234 201011 30 storeC 222
12345 200906 24 storeC 222
12345 200907 3 storeC 222
12345 200907 27 storeB 33
12345 200908 15 storeA 555
;
run;
proc sort data=have;
by Id year_month store;
run;
data avg_vis;
sum_tot=0;
visit=0;
do until(last.store);
set have;
by Id year_month store;
sum_tot+monthly_spending;
visit+1;
end;
avg_mnth=sum_tot/visit;
drop sum_tot;
run;
proc transpose data=avg_vis out=avg_mth prefix=avg_mth_;
by id year_month;
var avg_mnth ;
id store;
run;
proc transpose data=avg_vis out=visit suffix=_visit;
by id year_month;
var visit ;
id store;
run;
data want;
merge avg_mth visit;
by id year_month;
array init{*} _numeric_;
do i=1 to dim(init);
if missing(of init[i]) then init[i]=0;
end;
drop i _name_;;
run;
There are a few steps you will need to take. Begin by getting the store name without the location identifier. If the identifier is always the last word within STORE, you could use:
data temp;
set have;
id_len = length(scan(store, -1));
store_name = substr(store, 1, length(store) - id_len);
run;
You will eventually have to consider whether the STORE_NAME could be a valid portion of a variable name, and whether it is spelled consistently. We're going to ignore that (since only you have the data), but it will be an issue sooner or later.
Next, get statistics, using variable names that are similar to the names you want to use eventually. For example:
proc summary data=temp nway;
by id year_month;
class store_name;
var monthly_spending;
output out=temp2 (drop=_type_ _freq_)
mean = _avg_month
n=_visits
;
run;
Note that transposing will be much easier if you select consistent names ... should the STORE_NAME be a prefix, or should it be a suffix. Here, I'm expecting it to be a prefix that will eventually have either "_avg_month" or "_visits" appended to the name.
See if that much makes sense, then we can talk about transposing from long to wide and address the remaining issues.
The consistency of the store name did become an issue. I was able to drop the numbers associated with the location but there are instances where there is a different spelling. Is there a way to assign similar names to one variable? For example, if I had a store name SaveMore and in my data I had; SaveMore, SaveMr, SaveM - would I be able to use substr to take the first four letter and say that if the substr='save' then assign to SaveMore?
Data have(drop=store_id);
input ID $ year_month monthly_spending store_id $15.;
store=scan(store_id,1);
cards;
123 201107 25 storeA 5555
123 201107 10 storeA 444
123 201108 50 storeB 33
1234 201010 45 storeA 5555
1234 201010 40 storeA 5555
1234 201011 30 storeC 222
12345 200906 24 storeC 222
12345 200907 3 storeC 222
12345 200907 27 storeB 33
12345 200908 15 storeA 555
;
run;
proc sort data=have;
by Id year_month store;
run;
data avg_vis;
sum_tot=0;
visit=0;
do until(last.store);
set have;
by Id year_month store;
sum_tot+monthly_spending;
visit+1;
end;
avg_mnth=sum_tot/visit;
drop sum_tot;
run;
proc transpose data=avg_vis out=avg_mth prefix=avg_mth_;
by id year_month;
var avg_mnth ;
id store;
run;
proc transpose data=avg_vis out=visit suffix=_visit;
by id year_month;
var visit ;
id store;
run;
data want;
merge avg_mth visit;
by id year_month;
array init{*} _numeric_;
do i=1 to dim(init);
if missing(of init[i]) then init[i]=0;
end;
drop i _name_;;
run;
Here is the logic that I came up with to accomplish your request:
PROC SQL;
CREATE TABLE WORK.Calculations AS
SELECT DISTINCT
ID
, Store
, year_month
, COUNT(ID) AS Visits
, (SUM(monthly_spending)/ CALCULATED Visits) AS Avg_Mnth_Spend
FROM WORK.StoreData
GROUP BY ID, Store, year_month;
QUIT;
PROC SORT DATA=Calculations; BY ID year_month; RUN;
PROC TRANSPOSE DATA=WORK.Calculations out=AvgMnthly (drop=_name_) prefix=avg_mnth_;
var Avg_Mnth_Spend ;
by ID year_month;
ID Store;
run;
PROC TRANSPOSE DATA=WORK.Calculations out=VisitCt (drop=_name_) suffix=_Visits;
var Visits ;
by ID year_month;
ID Store;
run;
PROC SQL;
CREATE TABLE WORK.FullDtls AS
SELECT
a.ID
, a.year_month
, a.avg_mnth_storeA
, b.storeA_Visits
, a.avg_mnth_storeB
, b.storeB_Visits
, a.avg_mnth_storeC
, b.storeC_Visits
FROM WORK.AvgMnthly AS a
INNER JOIN WORK.VisitCt AS b ON a.ID=b.ID
AND a.year_month=b.year_month
ORDER BY YEAR(a.year_month) DESC, MONTH(a.year_month);
QUIT;
The resulting output looks like this:
Hope this helps.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.