Hello everyone,
I have an assignment that simply says to "include predictive statistics for store sales and store orders and to include charts, graphs, and tables, and analysis of your findings."
I am not really sure what type of predictive statistics I should include.
So far I have used proc corr and proc reg shown below, where log_average is the average amount spent per visit.
proc sgplot data=data1;
histogram log_average;
run;
proc corr data=data1;
var log_average;
with 'Customer Id'n ZIP_CODE FRE MON CC_CARD PSWEATERS PKNIT_TOPS PKNIT_DRES
PBLOUSES PJACKETS PCAR_PNTS PCAS_PNTS PSHIRTS PDRESSES PSUITS POUTERWEAR
PJEWELRY PFASHION PLEGWEAR PCOLLSPND GMP PROMOS DAYS MARKDOWN CLUSTYPE
PERCRET 'ln days between purchases'n 'ln lifetime ave time betw visits'n;
run;
proc reg data=data1 outest=est1;
model log_average= 'Customer Id'n ZIP_CODE FRE MON CC_CARD PSWEATERS PKNIT_TOPS PKNIT_DRES
PBLOUSES PJACKETS PCAR_PNTS PCAS_PNTS PSHIRTS PDRESSES PSUITS POUTERWEAR
PJEWELRY PFASHION PLEGWEAR PCOLLSPND GMP PROMOS DAYS MARKDOWN CLUSTYPE
PERCRET 'ln days between purchases'n 'ln lifetime ave time betw visits'n / slstay=0.15 slentry=0.15
selection=forward ss2 sse aic;
output out=out1 p=p r=r;
run;
quit;
I was told that I need to provide more information about the results. Basically I am finding which variables have a positive correlation with the average amount spent per visit, is that correct?
How can I use this information, or what simpler methods of predictive analytics can I use?
My results are shown below.
I also get the fit diagnostics plots and residual by regressors.
I really don't understand how this tells me anything besides the correlation between all my variables.
Thank you for any input you may have.
I don't see any sales data.
I know I can't upload the actual .csv files here, so here are some screenshots of the variables in Microsoft Word.
What I'm having a hard time with is how to do predictive analytics with this data when there are no time series variables.
If you are doing time-series predictions, that's called forecasting.
Non-time-series predictions are perfectly feasible with the data you have. Are you taking a stats course? The questions you are asking are basic statistics questions and do not really have anything to do with SAS.
You could, for example, compare sales with or without a credit card. Does using a credit card result in higher spend?
What about zip code? Do customers in certain zip codes spend more than others?
Do customers who have been on file longer spend more or less?
I'm sure you could come up with a whole bunch more questions along these lines.
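As a rough sketch of the credit card question, the code below compares spend between card holders and non-holders. It assumes the data1 dataset from the first post, with log_average as the spend measure and CC_CARD as a two-level flag; treat those as assumptions about the data rather than a confirmed layout.

```sas
/* Sketch only: assumes data1 has one row per customer, with          */
/* log_average (log of average spend per visit) and CC_CARD (a        */
/* two-level flag). Adjust the names to match the real dataset.       */

/* Side-by-side box plots of spend for the two groups */
proc sgplot data=data1;
    vbox log_average / category=CC_CARD;
run;

/* Two-sample t-test: does mean spend differ between the groups? */
proc ttest data=data1 plots(only)=summaryplot;
    class CC_CARD;       /* grouping variable */
    var log_average;     /* response being compared */
run;
```

The box plot gives a chart for the report, and the t-test table says whether the difference in means is larger than chance alone would suggest.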
Well my project has to use SAS and simply states to include predictive statistics for store sales and store orders to include charts, graphs, and tables, and an analysis of my findings. Currently I have used t-tests to compare credit card users and non-credit card users to see who uses promotions more. My code and results are shown below.
proc univariate data=MIS543.CLOTHINGSALES normal mu0=0;
ods select TestsForNormality;
class CC_CARD;
var PROMOS;
run;
/* t test */
proc ttest data=MIS543.CLOTHINGSALES sides=2 h0=0 plots(only
showh0)=(summaryPlot qqplot);
class CC_CARD;
var PROMOS;
run;
Would this be an example of predictive statistics?
What code would I use to answer your posited question of "Do customers in certain zip codes spend more than others?"
I am new to SAS, so this is pretty difficult for me.
Thank you for the help.
What code would I use to answer your posited question of "Do customers in certain zip codes spend more than others?"
I would probably calculate the average sales per customer, then total that up by zip code, then look at the distribution of those totals across zip codes. Not knowing what your data looks like makes it hard to offer guidance. How many discrete zip codes do you have, and how many customers are there per zip code? If you only have a few customers per zip code, then comparing zip codes is probably pointless.
An easier question would be comparing credit card sales with non-credit card sales. Again I would calculate the average sales per customer ending up with two populations of customers, one using credit cards and one not. Then use a Chi-Square test to compare the two populations - see the CHISQ option in PROC UNIVARIATE.
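The zip code summary described above could be sketched roughly as follows. It assumes one row per customer in MIS543.CLOTHINGSALES and takes MON to be each customer's total spend; that meaning for MON is a guess, and the cutoff of 30 customers per zip code is an arbitrary illustration. The output names (zip_summary, n_customers, avg_spend, total_spend) are made up for this example.

```sas
/* Sketch only. Assumptions: one row per customer, and MON holds the  */
/* customer's total spend. Output names are hypothetical.             */

/* One summary row per zip code: customer count, mean and total spend */
proc means data=MIS543.CLOTHINGSALES noprint nway;
    class ZIP_CODE;
    var MON;
    output out=zip_summary n=n_customers mean=avg_spend sum=total_spend;
run;

/* How many customers does each zip code have? Largest first. */
proc sort data=zip_summary;
    by descending n_customers;
run;

proc print data=zip_summary(obs=10);
    var ZIP_CODE n_customers avg_spend total_spend;
run;

/* Distribution of average spend across zip codes with enough         */
/* customers (the threshold of 30 is arbitrary)                       */
proc univariate data=zip_summary(where=(n_customers >= 30));
    var avg_spend;
    histogram avg_spend;
run;
```

The printed table answers the "how many customers per zip code" question directly; if most zip codes have only a handful of customers, the histogram step is not worth interpreting.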
My dataset has 28799 total rows and 29 columns.
Here is a screenshot of the data.
I am unfamiliar with how to use the CHISQ option with PROC UNIVARIATE.
Apologies, CHISQ is an option in PROC FREQ, not PROC UNIVARIATE. My stats skills are fairly basic, so it would be better if one of the experts, like @Rick_SAS, chipped in.
A good starting point would be to compare distributions in PROC UNIVARIATE which you appear to have started doing. At some point you will need to provide some sample data using a DATA step with DATALINES if you want code that works.
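For reference, posting sample data with DATALINES and then running a chi-square test in PROC FREQ might look something like this. The six rows are invented placeholder values purely to show the shape of the code, not real records, and the used_promo flag is a hypothetical derived variable.

```sas
/* Sketch only: the values below are invented placeholders, not real  */
/* data. used_promo is a hypothetical flag derived from PROMOS.       */
data sample;
    input CC_CARD PROMOS;
    used_promo = (PROMOS > 0);   /* 1 if any promotions, else 0 */
    datalines;
1 12
0 3
1 7
0 0
1 15
0 0
;
run;

/* Chi-square test of association between two categorical variables */
proc freq data=sample;
    tables CC_CARD*used_promo / chisq;
run;
```

With only six rows, PROC FREQ will warn that the expected cell counts are too small for the chi-square test to be valid; the same TABLES statement run against the full dataset is what would actually be reported.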
This topic reminds me way too much of a job interview question/instruction, which was "Given this data, perform the most complicated analysis you can do."
Since the tool they were using for the interview was SPSS and I had ZERO experience with that program my first response was "Nothing, don't know the tools". And then described a number of the things that I thought the data supported from T-Test, Anova, Chi-squares, some regressions to non-parametric tests of distribution, location or dispersion.
PS: I did get the job and learned to strongly dislike the SPSS behavior of removing/changing syntax with each "upgrade."
So are t-tests and chi-squares used for predictive analytics or descriptive analytics?
I have used PROC CORR to try and find the relationship between the "average amount spent per visit" and the rest of the variables within the dataset.
Here is my code.
Is this correct?
Hello all, not sure if this is the right location for this subject, but here it goes.
I am new to SAS and am tasked with including predictive statistics for store sales and store orders and to include charts, graphs, and tables.
The datasets I was given do not contain any time series variables, and I am not sure what type of predictive analytics I am able to perform using SAS Studio.
How do I perform predictive analytics with these datasets?
Here are some screenshots of the 2 datasets including the variables list in Microsoft word.
Thank you for any help!
Merged duplicate (re-post).
Hi Dears,
I am trying to run the code below, and it does not show any results. Could you try it and let me know if it works for you?
Load the Data:
proc import datafile='/path/to/customers.csv' out=customers dbms=csv replace;
getnames=yes;
run;
Summary Statistics:
proc means data=customers n mean std min max;
var Total_Revenue Unit_Cost Discount;
run;
Frequency Distribution:
proc freq data=customers;
tables Customer_Group OrderTypeLabel CustomerCountryLabel;
run;
Visualizations:
proc sgplot data=customers;
histogram Total_Revenue;
run;
proc sgplot data=customers;
vbar Customer_Group / response=Total_Revenue stat=mean;
run;
Step 4: Predictive Statistics
Correlation Analysis:
proc corr data=customers;
var Total_Revenue Customer_BirthDate Unit_Cost Discount;
run;
Regression Analysis:
proc reg data=customers;
model Total_Revenue = Customer_BirthDate Unit_Cost Discount;
run;
ANOVA for Sales Channels:
proc anova data=customers;
class OrderTypeLabel;
model Total_Revenue = OrderTypeLabel;
run;
Thanks and regards,