mccusker1818
Fluorite | Level 6

Hello everyone, 

 

I have an assignment that simply says to "include predictive statistics for store sales and store orders and to include charts, graphs, and tables, and analysis of your findings."

 

I am not really sure what type of predictive statistics I should include.

 

So far I have used PROC CORR and PROC REG, shown below, where log_average is the average amount spent per visit.

proc sgplot data=data1;
histogram log_average;
run;
proc corr data=data1;
var log_average;
with 'Customer Id'n ZIP_CODE FRE MON CC_CARD PSWEATERS PKNIT_TOPS PKNIT_DRES
PBLOUSES PJACKETS PCAR_PNTS PCAS_PNTS PSHIRTS PDRESSES PSUITS POUTERWEAR
PJEWELRY PFASHION PLEGWEAR PCOLLSPND GMP PROMOS DAYS MARKDOWN CLUSTYPE
PERCRET 'ln days between purchases'n 'ln lifetime ave time betw visits'n;
run;
proc reg data=data1 outest=est1;
model log_average= 'Customer Id'n ZIP_CODE FRE MON CC_CARD PSWEATERS PKNIT_TOPS PKNIT_DRES
PBLOUSES PJACKETS PCAR_PNTS PCAS_PNTS PSHIRTS PDRESSES PSUITS POUTERWEAR
PJEWELRY PFASHION PLEGWEAR PCOLLSPND GMP PROMOS DAYS MARKDOWN CLUSTYPE
PERCRET 'ln days between purchases'n 'ln lifetime ave time betw visits'n / slstay=0.15 slentry=0.15
selection=forward ss2 sse aic;
output out=out1 p=p r=r;
run;
quit;

I was told that I need to provide more information about the results. Basically, I am finding which variables have a positive correlation with the average amount spent per visit. Is that correct?

 

How can I use this information, or what simpler methods of predictive analytics can I use? 

My results are shown below. 

log_average reg model.PNG

 I also get the fit diagnostics plots and residual by regressors.

 

fit diagnostics sales.PNG

residual by regressors for log_average.PNG

 I really don't understand how this tells me anything besides the correlation between all my variables.

Thank you for any input you may have.

12 REPLIES
mccusker1818
Fluorite | Level 6

I know I can't upload the actual .csv files here, so here are some screenshots of the variables in Microsoft Word.

store data variables.PNG

What I'm having a hard time with is how I can do predictive analytics with this data when there are no time series variables.

 

SASKiwi
PROC Star

If you are doing time-series predictions, that's called forecasting.

 

Non-time series predictions are perfectly feasible with the data you have. Are you doing a stats course? The questions you are asking are basic statistics questions and not really anything to do with SAS.
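As a rough illustration of a non-time-series prediction, here is a minimal sketch that fits a regression on part of the data and checks it on a held-out part. It assumes the data set data1 and the variable names from the original post; the choice of predictors, the 30% validation fraction, the seed, and the output names scored, pred_log_average and sq_err are arbitrary choices for illustration, not a recommended model.

```sas
/* Sketch only: hold-out validation of a regression for log_average.   */
/* Assumes WORK.DATA1 and the variable names shown in the first post.  */
proc glmselect data=data1 seed=12345;
    partition fraction(validate=0.3);      /* hold out about 30% of rows */
    class CC_CARD;                         /* assumed to be a two-level flag */
    model log_average = FRE CC_CARD PROMOS DAYS MARKDOWN PERCRET
          / selection=forward(select=sbc choose=validate);
    output out=scored predicted=pred_log_average;
run;

/* Squared prediction error for every row */
data scored;
    set scored;
    sq_err = (log_average - pred_log_average)**2;
run;

/* Average squared error by partition; GLMSELECT adds the _ROLE_ variable */
proc means data=scored n mean;
    class _ROLE_;
    var sq_err;
run;
```

If the average squared error on the validation rows is close to the one on the training rows, the model is predicting new customers about as well as the ones it was fitted on, which is the kind of statement a "predictive" analysis usually needs.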

 

You could, for example, compare sales with or without a credit card. Does using a credit card result in higher spend?

 

What about zip code? Do customers in certain zip codes spend more than others?

 

Do customers who have been on file longer spend more or less?

 

I'm sure you could come up with a whole bunch more questions along these lines.
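For the credit card question, a minimal sketch could look like the following. It assumes the data set data1 from the first post, that CC_CARD has exactly two levels, and that log_average is the spend measure of interest.

```sas
/* Sketch only: does average spend per visit differ by credit card use? */
/* Assumes WORK.DATA1 with a two-level CC_CARD and numeric log_average. */
proc ttest data=data1 plots(only)=(summaryplot);
    class CC_CARD;
    var log_average;
run;

/* Side-by-side box plots of the same comparison for the report */
proc sgplot data=data1;
    vbox log_average / category=CC_CARD;
run;
```

The t-test table gives the difference in means with a confidence interval, and the box plot is a chart that can go straight into the findings.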

 

mccusker1818
Fluorite | Level 6

Well, my project has to use SAS, and it simply says to include predictive statistics for store sales and store orders, with charts, graphs, and tables, and an analysis of my findings. Currently I have used t-tests to compare credit card users and non-credit card users to see who uses promotions more. My code and results are shown below.

proc univariate data=MIS543.CLOTHINGSALES normal mu0=0;
ods select TestsForNormality;
class CC_CARD;
var PROMOS;
run;

/* t test */
proc ttest data=MIS543.CLOTHINGSALES sides=2 h0=0 plots(only
showh0)=(summaryPlot qqplot);
class CC_CARD;
var PROMOS;
run;

promos.PNG

 Would this be an example of predictive statistics?

 

What code would I use to answer your posited question of "Do customers in certain zip codes spend more than others?"

 

I am new to SAS, so this is pretty difficult for me.

 

Thank you for the help.

 

SASKiwi
PROC Star

What code would I use to answer your posited question of "Do customers in certain zip codes spend more than others?"

I would probably calculate the average sales per customer, then total that up by zip code, then look at the distribution of those totals across zip codes. Not knowing what your data looks like makes it hard to offer guidance. How many discrete zip codes do you have, and how many customers are there per zip code? If you only have a few customers per zip code, then comparing zip codes is probably pointless.
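A minimal sketch of that zip code summary, assuming the data set data1 with one row per customer and the variables ZIP_CODE and log_average from the first post. The output names zip_summary, mean_log_average and n_customers are made up for the example, and the cutoff of 30 customers per zip code is an arbitrary threshold.

```sas
/* Sketch only: mean spend and customer count per zip code.   */
/* Assumes WORK.DATA1 has one row per customer.               */
proc means data=data1 noprint nway;
    class ZIP_CODE;
    var log_average;
    output out=zip_summary(drop=_TYPE_ _FREQ_)
           mean=mean_log_average n=n_customers;
run;

/* Distribution of the zip-level means, ignoring sparsely populated zips */
proc sgplot data=zip_summary(where=(n_customers >= 30));
    histogram mean_log_average;
run;

/* Ten highest-spending zip codes among those with enough customers */
proc sort data=zip_summary;
    by descending mean_log_average;
run;

proc print data=zip_summary(obs=10);
    where n_customers >= 30;
run;
```

Looking at n_customers first answers the question of whether there are enough customers per zip code for the comparison to mean anything.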

 

An easier question would be comparing credit card sales with non-credit card sales. Again I would calculate the average sales per customer ending up with two populations of customers, one using credit cards and one not. Then use a Chi-Square test to compare the two populations - see the CHISQ option in PROC UNIVARIATE.

 

   

mccusker1818
Fluorite | Level 6

My dataset has 28799 total rows and 29 columns. 

Here is a screenshot of the data.

AAAAAAAAAAA.PNG

I am unfamiliar with how to use the CHISQ option with PROC UNIVARIATE.

SASKiwi
PROC Star

Apologies, CHISQ is an option on PROC FREQ, not PROC UNIVARIATE. My stats skills are fairly basic, so it would be better if one of the experts chipped in, like @Rick_SAS.
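For reference, a minimal sketch of the CHISQ option in PROC FREQ. A chi-square test compares two categorical variables, so this example first splits customers into a lower and an upper half by spend; that split, and the names ranked and spend_half, are assumptions made for illustration, as is the data set data1 from the first post.

```sas
/* Sketch only: chi-square test of credit card use against a spend band. */
/* PROC RANK with GROUPS=2 assigns 0 to the lower half and 1 to the upper. */
proc rank data=data1 groups=2 out=ranked;
    var log_average;
    ranks spend_half;
run;

/* Cross-tabulate the two categorical variables and request chi-square */
proc freq data=ranked;
    tables CC_CARD*spend_half / chisq;
run;
```

If the spend amounts themselves are what matters rather than a high/low band, a t-test on the numeric variable keeps more of the information.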

 

A good starting point would be to compare distributions in PROC UNIVARIATE which you appear to have started doing. At some point you will need to provide some sample data using a DATA step with DATALINES if you want code that works. 
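Posting sample data that way could look like the sketch below. The variable names come from the thread, but the four rows are invented placeholder values, not real records; replace them with a handful of rows from the actual data set.

```sas
/* Sketch only: a small self-contained sample others can run.        */
/* The values below are made-up placeholders, not the real data.     */
data sample;
    input CC_CARD PROMOS FRE log_average;
    datalines;
1 4 12 4.61
0 0 2 3.91
1 7 20 5.02
0 1 3 4.10
;
run;

/* Confirm the sample was read as intended */
proc print data=sample;
run;
```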

ballardw
Super User

This topic reminds me way too much of a job interview question/instruction, which was "Given this data, perform the most complicated analysis you can do."

 

Since the tool they were using for the interview was SPSS and I had ZERO experience with that program my first response was "Nothing, don't know the tools". And then described a number of the things that I thought the data supported from T-Test, Anova, Chi-squares, some regressions to non-parametric tests of distribution, location or dispersion.

 

PS: I did get the job and learned to strongly dislike the SPSS behavior of removing/changing syntax with each "upgrade."

mccusker1818
Fluorite | Level 6

So are t-tests and chi-squares used for predictive analytics or descriptive analytics?

 

I have used PROC CORR to try and find the relationship between the "average amount spent per visit" and the rest of the variables within the dataset.

 

Here is my code.

 

bbbbb.PNG

Is this correct?

mccusker1818
Fluorite | Level 6

Hello all, not sure if this is the right location for this subject, but here it goes.

 

I am new to SAS and am tasked with including predictive statistics for store sales and store orders and to include charts, graphs, and tables.

 

The datasets I was given do not contain any time series variables, and I am not sure what type of predictive analytics I am able to perform using SAS Studio.

 

How do I perform predictive analytics with these datasets?

 

Here are some screenshots of the 2 datasets, including the variable list in Microsoft Word.

clothing sales upload.PNG

clothing orders upload.PNG

store data variables.PNG

 Thank you for any help!

ballardw
Super User

Merged duplicate (re-post).

MarketaVejskrab
Calcite | Level 5

Hi Dears,

I am trying to run the code below and it does not show any results. Could you try it and let me know if it works for you?

Load the Data:

proc import datafile='/path/to/customers.csv' out=customers dbms=csv replace;

getnames=yes;

run:

Summary Statistics:

proc means data=customers n means std min max;

var Total Revenue Unit Cost Discount;

run;

Frequency Distribution:

proc freq data=customers;

tables Customer_Group OrderTypeLabel CustomerCountryLabel;

run;

Visualizations:

proc sgplot data=customers;

histogram Total_Revenue;

run;

proc sgplot data=customers;

vbar Customer_Group / response=Total_Revenue stat=mean;

run;

Step 4: Predictive Statistics

Correlation Analysis:

proc corr data=customers;

var Total_Revenue Customer_BirthDate Unit_Cost Discount;

run;

Regression Analysis:

proc reg data=customers;

model Total_Revenue = Customer_BirthDate Unit_Cost Discount;

run;

ANOVA for Sales Channels:

proc anova data=customers;

class OrderTypeLabel;

model Total_Revenue = OrderTypeLabel;

run;

Thanks and regards,
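Without the file this cannot be tested, but three things in the posted code would likely stop it from producing output: the first step ends with "run:" (a colon) instead of "run;", PROC MEANS is given the keyword "means" where the statistic keyword is "mean", and the VAR statement lists "Total Revenue Unit Cost" as separate names although the later steps use Total_Revenue and Unit_Cost. A corrected sketch of the first two steps, assuming the underscore names are the ones PROC IMPORT actually creates and keeping the placeholder path as posted:

```sas
/* Sketch only: corrected versions of the first two steps.            */
/* The path is the placeholder from the post; variable names with     */
/* underscores are an assumption based on the later steps.            */
proc import datafile='/path/to/customers.csv' out=customers dbms=csv replace;
    getnames=yes;
run;                                    /* semicolon, not a colon */

/* Check which variable names the import actually produced */
proc contents data=customers;
run;

proc means data=customers n mean std min max;   /* MEAN, not MEANS */
    var Total_Revenue Unit_Cost Discount;
run;
```

The section headings such as "Summary Statistics:" also need to stay out of the submitted program or be wrapped in /* */ comments, and the SAS log will show which statement fails first.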
