11-14-2014 04:12 AM
Hi. I am working on a dataset with 65,000 observations. It contains transport delay data for various cities, and I need to find out whether there is a relationship between city and delay in days. As I understand it, a t-test or ANOVA cannot be used because day is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.
11-14-2014 04:30 AM
Can you not just create a character version of the day field, e.g. dayc=strip(put(day,best.)); and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
11-14-2014 04:32 AM
proc sql;
  create table want as
  select city, avg(delay) as avg_delay
  from have  /* "have" is a placeholder for your input table */
  group by city
  order by avg_delay desc;
quit;
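If a fuller set of descriptive statistics per city is wanted, something along these lines may be a starting point. This is only a sketch: the dataset name have and the variable names city and delay are assumptions, not names confirmed anywhere in this thread.

```sas
/* Sketch only: "have", "city" and "delay" are assumed names. */
proc means data=have n mean median std min max;
  class city;   /* one row of statistics per city */
  var delay;    /* delay in days, numeric */
run;
```

Unlike a single total per city, this keeps the spread (std, min, max) visible alongside the average.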
11-14-2014 04:53 AM
I do not see any reason why numeric variables cannot be treated as categorical.
Dates are numbers: the number is the count of days passed since 1 January 1960. That is a marvelous number for calculating the delay when you have the end and start dates (as numbers).
What exactly is your question about the numeric variable? And which product are you using?
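As a rough illustration of the date arithmetic mentioned above: because SAS dates are day counts, subtracting one from the other gives the delay in days directly. The dataset name have and the variables planned_date and actual_date are hypothetical, chosen only for this sketch.

```sas
/* Sketch only: "have", "planned_date" and "actual_date" are
   hypothetical names; both are assumed to be SAS date values. */
data want;
  set have;
  /* SAS dates are day counts, so the difference is a number of days */
  delay = actual_date - planned_date;
run;
```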
11-14-2014 05:24 AM
My question is this: for multiple cities (city is a categorical variable) I have the related delay in number of days. These delays are numeric, like 1, 2, 10, 40 etc. So if I want to test whether city and number of delay days are correlated, I can't, because it violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay dates), I also can't run ANOVA or a t-test. Please advise which test should be used here.
Writing the code is not a concern here. It's how the relationship should be tested.
11-14-2014 05:56 AM
Shivi, then that is a job for a statistician: to discuss what the variables mean before going into any analysis.
We should talk about the measurement levels: categorical (binomial, ordinal).
I would avoid the numeric/character distinction, as characters are numbers (encoding) in a computer and humans often use numbers as labels. Confusing....
- Cities (categorical)
- Then there is some delay information. Delay of what (the transport)? You have it as numbers. Binning it into classes is just applying a format.
This could be delay short (1-2 days), medium (3-7 days) and so on.
You can apply that to the delay column (no correlation assumed), or go on to correct for known causes/correlations.
Is your data already set up in an acceptable row/column structure, or must it be transformed first? What is the topic/event to analyse?
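The binning idea above could be sketched roughly as follows. The short and medium ranges come from the post. The "long" cut-off, the format name delaygrp, and the names have, city and delay are assumptions for illustration. The chisq option is one possible way to look at association between city and delay class, not something the post prescribes.

```sas
/* Sketch only: "have", "city", "delay" and the format name
   "delaygrp" are assumed; the 8+ "long" class is a guess. */
proc format;
  value delaygrp
    1 - 2    = 'short (1-2 days)'
    3 - 7    = 'medium (3-7 days)'
    8 - high = 'long (8+ days)'
    other    = 'other';
run;

/* PROC FREQ groups by the formatted value, so no new column is needed */
proc freq data=have;
  tables city * delay / chisq;
  format delay delaygrp.;
run;
```

Because the format is applied at analysis time, the original numeric delay stays untouched and the class boundaries can be changed without rebuilding the data.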