Hi. I am working on a dataset with 65,000 observations. It is transport delay data for various cities, so I need to find out whether there is a relationship between city and delay in days. As I understand it, a t-test and ANOVA cannot be conducted because day is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.
Can you not just create a character version of the day field, e.g. dayc=strip(put(day,best.)); and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
proc sql;
  /* Average delay per city, largest average first */
  create table want as
  select city, avg(delay) as avg_delay
  from have
  group by city
  order by avg_delay desc
  ;
quit;
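On the aggregation question, a per-city summary may be more informative than a single total or average, because it keeps the spread and the number of records for each city. A minimal sketch, assuming the same dataset have with variables city and delay as in the query above; the output dataset name city_stats and its column names are made up for illustration:

```sas
/* Per-city descriptive statistics (dataset and variable names are assumed) */
proc means data=have nway n mean median std min max maxdec=2;
  class city;                /* one summary row per city */
  var delay;                 /* numeric delay in days */
  output out=city_stats      /* hypothetical output dataset */
         n=n_obs mean=avg_delay median=med_delay std=sd_delay;
run;
```

The NWAY option keeps only the per-city rows in city_stats and leaves out the overall summary row.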
I do not see any reason why numeric variables cannot be categorical.
Dates are numbers: the number is the count of days passed since 1 Jan 1960. A marvelous number for calculating the delay when you have the end and start dates (as numbers).
What is the question that involves a numeric? Where, and for which product?
Hi Jaap,
My question is that for multiple cities (city is a categorical variable) I have the related delay in number of days. These days are numeric, like 1, 2, 10, 40 etc. Hence, if I want to test whether city and number of delay days are correlated, I can't do that because it violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay dates), I also can't run ANOVA or a t-test. Please advise what test should be used here.
Writing the code is not the concern here. It's how the relationship should be tested.
Shivi, then that is a role for a statistician. The meaning of the variables should be discussed before going for any analysis.
We should talk about measurement levels: categorical (binomial, ordinal).
I would avoid the term "numeric character", as characters are numbers (encoding) in a computer and numbers are often used as characters by humans. Confusing....
You have:
- cities (categorical)
- Then there is some delay information. Delay on what (the transport)? You have it in numbers. Binning into classes is done by applying a format.
It could be delay short (1-2 days), medium (3-7 days) and so on.
You can apply that to the delay column (no correlation assumed), or get into corrections for known causes/correlations.
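The binning idea can be sketched as follows. This is only an illustration: the dataset have and the variables city and delay are assumed from earlier in the thread, the format name delaybin is made up, and the "long" class and exact cut-offs are assumptions to be replaced by whatever is meaningful for the transport in question.

```sas
/* Hypothetical delay classes; the cut-offs are assumptions, adjust as needed */
proc format;
  value delaybin
    low -< 3   = 'Short (up to 2 days)'
    3   -< 8   = 'Medium (3-7 days)'
    8   - high = 'Long (8+ days)';
run;

/* The format groups the numeric delay on the fly, so no new column is needed */
proc freq data=have;
  tables city * delay / chisq nocol nopercent;   /* city by delay class */
  format delay delaybin.;
run;
```

With the format applied, PROC FREQ tabulates the formatted classes rather than each distinct number of days, and CHISQ requests a chi-square test of association between city and delay class.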
Is your data structure (rows/columns) set up in an acceptable way, or must it be transformed first? What is the topic/event to analyse?
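For what it is worth, a numeric response (delay) compared across the levels of a categorical factor (city) is the layout that one-way ANOVA is designed for, and the Kruskal-Wallis test is a common rank-based alternative when the delays are skewed. A minimal sketch, assuming one row per transport in a dataset have with variables city and delay; whether either test suits this data is still something to settle with a statistician:

```sas
/* One-way ANOVA: does the mean delay differ between cities? */
proc glm data=have;
  class city;                        /* categorical factor */
  model delay = city;                /* numeric response */
  means city / hovtest=levene;       /* Levene's test for equal variances */
run;
quit;

/* Kruskal-Wallis test: rank-based, does not assume normally distributed delays */
proc npar1way data=have wilcoxon;
  class city;
  var delay;
run;
```

With more than two cities, the WILCOXON option of PROC NPAR1WAY reports the Kruskal-Wallis test.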