Shivi82
Quartz | Level 8

Hi. I am working with a dataset of 65,000 observations. It is transport delay data for various cities, and I need to find out whether there is a relationship between city and delay in days. A t-test or ANOVA cannot be conducted, as day is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.

5 REPLIES
RW9
Diamond | Level 26

Can you not just create a character version of the day field, e.g. dayc=strip(put(day,best.)); and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
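As a minimal sketch of that conversion in a DATA step: the dataset names `delays` and `delays_c` and the variable names `city` and `day` are assumptions for illustration, not taken from the original data.

```sas
/* Assumed names: input dataset DELAYS with numeric DAY and a CITY variable. */
data delays_c;
    set delays;
    length dayc $12;
    /* PUT writes the number as text with the BEST12. format; */
    /* STRIP removes the leading and trailing blanks.         */
    dayc = strip(put(day, best12.));
run;

/* The character copy can then be used like any other grouping variable. */
proc freq data=delays_c;
    tables city * dayc;
run;
```

Note that SAS procedures also accept numeric variables in CLASS and TABLES statements, so the character copy is a convenience rather than a requirement.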

jakarman
Barite | Level 11

I do not see any reason why numeric variables cannot be treated as categorical.

Dates are numbers: the number is the count of days passed since 1 Jan 1960. A marvelous number for calculating the delay when you have the end and start dates (numbers).
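Because SAS date values are day counts, the delay is a plain subtraction. A short sketch, assuming a hypothetical dataset `shipments` with SAS date variables `start_date` and `end_date` (none of these names come from the thread):

```sas
/* Assumed names: SHIPMENTS, START_DATE, END_DATE are illustrative only. */
data delays;
    set shipments;
    /* Both variables hold days since 01JAN1960, so the difference */
    /* is the delay in whole days.                                 */
    delay_days = end_date - start_date;
    format start_date end_date date9.;
run;
```

If either date is missing, the subtraction yields a missing `delay_days` and SAS writes a note to the log.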

What is the numeric question you want answered? Where, and with which product?

---->-- ja karman --<-----
Shivi82
Quartz | Level 8

Hi Jaap,

My question is this: for multiple cities (city is a categorical variable) I have the related delay in number of days. These delays are numeric, like 1, 2, 10, 40, etc. So if I want to test whether city and number of delay days are correlated, I can't, because that violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay dates), I also can't run ANOVA or a t-test. Hence, please advise what test should be used here.

Writing the code is not a concern here. The question is how the relationship should be tested.

jakarman
Barite | Level 11

Shivi, then that is a role for a statistician. The meaning of the variables should be discussed before going on to any analysis.

We should talk about the measurement level of categorical data (binomial, ordinal).

I would avoid the numeric/character distinction: characters are numbers (encodings) in a computer, and numbers are often used as characters by humans. Confusing...

You have:

- cities (categorical)

- Then there is some delay information. Delay of what (the transport)? You have it as numbers. Binning into classes is done by applying a format.

  It could be short delay (1-2 days), medium (3-7 days), and so on.

  You can apply that to the delay column (no correlation assumed), or go on to corrections for known causes/correlations.
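The binning idea can be sketched with PROC FORMAT. The short and medium ranges follow the example above; the "long" class, the format name `delayfmt`, and the dataset and variable names (`delays`, `city`, `delay_days`) are assumptions added for illustration.

```sas
/* Assumed names throughout; only the 1-2 and 3-7 day classes come from the post. */
proc format;
    value delayfmt
        low -< 3   = 'Short (up to 2 days)'
        3   -< 8   = 'Medium (3-7 days)'
        8   - high = 'Long (8+ days, assumed class)';
run;

/* PROC FREQ groups by the formatted values, so no new column is needed. */
/* CHISQ requests a chi-square test of association between city and the  */
/* binned delay, which is one possible way to examine the relationship.  */
proc freq data=delays;
    tables city * delay_days / chisq;
    format delay_days delayfmt.;
run;
```

Applying the format inside the procedure keeps the original numeric delays intact, so the class boundaries can be changed later without rebuilding the data.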

Is your data structure (rows/columns) set up in an acceptable way, or must it be transformed first? What is the topic/event to analyse?

---->-- ja karman --<-----