Shivi82
Quartz | Level 8

Hi. I am working on a dataset with 65,000 observations. It is transport delay data for various cities, and I need to find out whether there is a relationship between the city and the delay in days. A t-test or ANOVA cannot be conducted, as day is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.

5 REPLIES
RW9
Diamond | Level 26

Can you not just create a character version of the day field, e.g. dayc=strip(put(day,best.)); and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
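A minimal sketch of that conversion in a DATA step, for illustration only. The dataset name delays, the sample rows, and the output dataset delays_c are assumptions, not taken from the original data; only the variables city and day and the put() expression come from the thread.

```sas
/* Hypothetical sample data: one row per shipment (names and values are assumed) */
data delays;
    input city $ day;
    datalines;
Delhi 1
Delhi 10
Mumbai 2
Mumbai 40
;
run;

/* Add a character copy of the numeric delay */
data delays_c;
    set delays;
    length dayc $12;
    dayc = strip(put(day, best.));   /* numeric -> character, as suggested above */
run;

/* The character copy can then be used like any other categorical variable */
proc freq data=delays_c;
    tables city*dayc;
run;
```

Note that dayc sorts as text, so "10" comes before "2" in the output.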

jakarman
Barite | Level 11

I do not see any reason why numeric variables cannot be categorical.

Dates are numbers: the number is the count of days passed since 1 Jan 1960. That is a marvelous number for calculating the delay when you have the start and end dates (as numbers).

What is your question about the numeric variable? Where, and in which product?
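To illustrate the point about dates being day counts, here is a small sketch. The variable names and the two dates are made up for the example.

```sas
/* SAS date values are day counts from 1 Jan 1960, so a delay is a plain subtraction. */
/* start_date, end_date and the dates themselves are hypothetical.                    */
data _null_;
    start_date = '01MAR2015'd;
    end_date   = '11MAR2015'd;
    delay_days = end_date - start_date;   /* difference in days */
    origin     = '01JAN1960'd;            /* the zero point of the date scale */
    put delay_days= origin=;              /* log shows: delay_days=10 origin=0 */
run;
```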

---->-- ja karman --<-----
Shivi82
Quartz | Level 8

Hi Jaap,

My question is that for multiple cities (city is a categorical variable) I have the related delay in number of days. These days are numeric, like 1, 2, 10, 40, etc. Hence, if I want to test whether city and number of delay days are correlated, I can't do that because it violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay dates), I also can't run ANOVA or a t-test. Please advise which test should be deployed here.

Writing the code is not a concern here; it's how the relationship should be tested.
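For reference, a numeric outcome with several observations per group of a categorical variable is the usual setup for a one-way ANOVA or its rank-based counterpart, the Kruskal-Wallis test. The following is only a sketch under assumptions: the dataset name delays and the sample rows are hypothetical, and whether either test suits the real data depends on its distribution and design.

```sas
/* Hypothetical sample data: several delay values per city */
data delays;
    input city $ day;
    datalines;
Delhi 1
Delhi 2
Delhi 10
Mumbai 3
Mumbai 12
Mumbai 40
Pune 2
Pune 5
Pune 7
;
run;

/* Rank-based comparison across cities: with more than two groups,  */
/* the WILCOXON option reports the Kruskal-Wallis test              */
proc npar1way data=delays wilcoxon;
    class city;
    var day;
run;

/* One-way ANOVA on the same data, for comparison */
proc glm data=delays;
    class city;
    model day = city;
run;
quit;
```

Delay data is often skewed, which is one reason to look at the rank-based result alongside the ANOVA.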

jakarman
Barite | Level 11

Shivi, then that is a role for a statistician. One should discuss what the variables mean before going for any analysis.

We should talk about the level of measurement: categorical (binomial, ordinal).

I would avoid the terms numeric and character: in a computer, characters are numbers (an encoding), and humans often use numbers as characters. Confusing...

You have:

- cities (categorical)

- Then there is some delay information. Delay of what (the transport)? You have it as numbers. Binning into classes is applying a format.

  It could be delay short (1-2 days), medium (3-7 days), and so on.

  You can apply that to the delays column (no correlation assumed), or get into corrections for known causes/correlations.
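The binning described above could look roughly like the sketch below. The format name delayfmt, the dataset delays, the sample rows, and the third class (8 days and longer) are assumptions added for illustration; only the short and medium ranges come from the post.

```sas
/* Hypothetical sample data */
data delays;
    input city $ day;
    datalines;
Delhi 1
Delhi 2
Delhi 10
Mumbai 3
Mumbai 12
Mumbai 40
;
run;

/* Bin the delay into classes with a format (assumes whole-number days; */
/* values outside the listed ranges would print unformatted)            */
proc format;
    value delayfmt
        1 - 2    = 'short (1-2 days)'
        3 - 7    = 'medium (3-7 days)'
        8 - high = 'long (8+ days)';      /* assumed third class */
run;

/* PROC FREQ groups by the formatted value, so no new column is needed */
proc freq data=delays;
    tables city*day / chisq;
    format day delayfmt.;
run;
```

With so few rows the chi-square test will warn about low expected cell counts; the sample data is only there to make the sketch run.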

Is your data set up in rows/columns in an acceptable way, or must it be transformed first? What is the topic/event to analyse?

---->-- ja karman --<-----
