BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dwagner
Calcite | Level 5

Hello,

 

I am new to SAS and I am able to manipulate my dataframe pretty well but need help counting values in a column and then transposing the data from long to wide.  My data looks like

IDyear_monthmonthly_spendingstore
12320110725storeA  5555
12320110710storeA  444
12320110850storeB  33
123420101045storeA  5555
123420101040storeA  5555
123420101130storeC 222
1234520090624storeC 222
123452009073storeC 222
1234520090727storeB  33
1234520090815storeA  555

 

 

And what I need is

IDyear_monthavg_mnth_storeAstoreA_visitsavg_mnth_storeBstoreB_visitsavg_mnth_storeCstoreC_visits
12320110717.520000
1232011080050100
12342010104520000
12342010110000301
123452009060000241
123452009070000152
123452009081510000

 

 

In these example each store has a name(storeA) and the number following it is its location identifier.  I know how to count each store with its identifier but I would like to be able to ignore the location identifier and only have a count of stores with the same name.  

 

I also know how to transpose my data from long to wide like this but not how to do it so that the count of the stores is what is in each column and not another value from my dataframe associated with it.  

 

How can I get a count of each store based on similar names and then transpose that from long to wide where those counts are now in the column?

 

Any suggestions are much appreciated!!

1 ACCEPTED SOLUTION

Accepted Solutions
r_behata
Barite | Level 11
Data have(drop=store_id);
	input ID $ year_month	monthly_spending	store_id $15.;
	store=scan(store_id,1);

cards;
123 201107 25 storeA 5555
123 201107 10 storeA 444
123 201108 50 storeB 33
1234 201010 45 storeA 5555
1234 201010 40 storeA 5555
1234 201011 30 storeC 222
12345 200906 24 storeC 222
12345 200907 3 storeC 222
12345 200907 27 storeB  33
12345 200908 15 storeA 555
;

run;


proc sort data=have;
	by Id year_month store;
run;

data avg_vis;
	sum_tot=0;
	visit=0;
	do until(last.store);
		set have;
		by Id year_month store;
		sum_tot+monthly_spending;
		visit+1;
	end;
	avg_mnth=sum_tot/visit;
	drop sum_tot;
run;


proc transpose data=avg_vis out=avg_mth prefix=avg_mth_;
	by id year_month;
	var  avg_mnth ;
	id store;
run;


proc transpose data=avg_vis out=visit suffix=_visit;
	by id year_month;
	var  visit ;
	id store;
run;

data want;
	merge avg_mth visit;
	by id year_month;
	
	array init{*} _numeric_;

	do i=1 to dim(init);
		if missing(of init[i]) then init[i]=0;
	end;
	drop i _name_;;
run;

View solution in original post

4 REPLIES 4
Astounding
PROC Star

There are a few steps you will need to take.  Begin by getting the store name without the location identifier.  If the identifier is always the last word within STORE,  you could use:

 

data temp;
   set have;
   id_len = length(scan(store, -1));
   store_name = substr(store, 1, length(store) - id_len);
run;

You will eventually have to consider whether the STORE_NAME could be a valid portion of a variable name, and whether it is spelled consistently.  We're going to ignore that (since only you have the data), but it will be an issue sooner or later.

 

Next, get statistics, using variable names that are similar to the names you want to use eventually.  For example:

 

proc summary data=temp nway;
   by id year_month;
   class store_name;
   var monthly_spending;
   output out=temp2 (drop=_type_ _freq_)
   mean = _avg_month
   n=_visits
   ;
run;

Note that transposing will be much easier if you select consistent names ... should the STORE_NAME be a prefix, or should it be a suffix.  Here, I'm expecting it to be a prefix that will eventually have either "_avg_month" or "_visits" appended to the name.

 

See if that much makes sense, then we can talk about transposing from long to wide and address the remaining issues.

dwagner
Calcite | Level 5

The consistency of the store name did become an issue.  I was able to drop the numbers associated with the location but there are instances where there is a different spelling.  Is there a way to assign similar names to one variable?  For example, if I had a store name SaveMore and in my data I had; SaveMore, SaveMr, SaveM - would I be able to use substr to take the first four letter and say that if the substr='save' then assign to SaveMore?  

r_behata
Barite | Level 11
Data have(drop=store_id);
	input ID $ year_month	monthly_spending	store_id $15.;
	store=scan(store_id,1);

cards;
123 201107 25 storeA 5555
123 201107 10 storeA 444
123 201108 50 storeB 33
1234 201010 45 storeA 5555
1234 201010 40 storeA 5555
1234 201011 30 storeC 222
12345 200906 24 storeC 222
12345 200907 3 storeC 222
12345 200907 27 storeB  33
12345 200908 15 storeA 555
;

run;


proc sort data=have;
	by Id year_month store;
run;

data avg_vis;
	sum_tot=0;
	visit=0;
	do until(last.store);
		set have;
		by Id year_month store;
		sum_tot+monthly_spending;
		visit+1;
	end;
	avg_mnth=sum_tot/visit;
	drop sum_tot;
run;


proc transpose data=avg_vis out=avg_mth prefix=avg_mth_;
	by id year_month;
	var  avg_mnth ;
	id store;
run;


proc transpose data=avg_vis out=visit suffix=_visit;
	by id year_month;
	var  visit ;
	id store;
run;

data want;
	merge avg_mth visit;
	by id year_month;
	
	array init{*} _numeric_;

	do i=1 to dim(init);
		if missing(of init[i]) then init[i]=0;
	end;
	drop i _name_;;
run;
tsap
Pyrite | Level 9

Here is the logic that I came up with to accomplish your request:

PROC SQL;
CREATE TABLE WORK.Calculations AS
	SELECT DISTINCT
	  	  ID
		, Store
		, year_month
		, COUNT(ID)											AS Visits
		, (SUM(monthly_spending)/ CALCULATED Visits)		AS Avg_Mnth_Spend

	FROM WORK.StoreData
	GROUP BY ID, Store, year_month;
QUIT;

PROC SORT DATA=Calculations; BY ID year_month; RUN;

PROC TRANSPOSE DATA=WORK.Calculations out=AvgMnthly (drop=_name_) prefix=avg_mnth_;
	var Avg_Mnth_Spend ;
	by ID year_month;
	ID Store;
run;

PROC TRANSPOSE DATA=WORK.Calculations out=VisitCt (drop=_name_) suffix=_Visits;
	var Visits ;
	by ID year_month;
	ID Store;
run;


PROC SQL;
CREATE TABLE WORK.FullDtls	AS
	SELECT
		  a.ID				
		, a.year_month		
		, a.avg_mnth_storeA	
		, b.storeA_Visits	
		, a.avg_mnth_storeB	
		, b.storeB_Visits	
		, a.avg_mnth_storeC	
		, b.storeC_Visits

	FROM 		WORK.AvgMnthly	AS a
	INNER JOIN	WORK.VisitCt	AS b	ON a.ID=b.ID
										AND a.year_month=b.year_month
	ORDER BY YEAR(a.year_month) DESC, MONTH(a.year_month);
QUIT;

The resulting output looks like this:

StoreBreakoutData.PNG

 

Hope this helps.

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1226 views
  • 3 likes
  • 4 in conversation