Help using Base SAS procedures

Which technique should be used

Frequent Contributor
Posts: 92

Which technique should be used

Hi. I am working on a dataset with 65,000 observations. It contains transport delay data for various cities, and I need to find out whether there is a relationship between city and delay in days. A t-test and ANOVA cannot be conducted, as day is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.

Super User
Posts: 7,997

Re: Which technique should be used

Can you not just create a character version of day, e.g. dayc=strip(put(day,best.)); and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
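A minimal sketch of that conversion as a complete DATA step. The table name have and the variable name day are placeholders borrowed from the snippets in this thread, and have_char is a hypothetical output name; adjust them to the real data.

```sas
/* Assumed names: HAVE is the input table, DAY is the numeric delay in days. */
data have_char;
  set have;
  length dayc $12;                  /* room for the 12-character BEST output */
  dayc = strip(put(day, best12.));  /* numeric -> left-aligned character copy */
run;
```

The numeric variable day is left untouched, so both versions remain available to later steps.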

Super User
Posts: 7,866

Re: Which technique should be used

proc sql;
  /* average delay per city, largest average first */
  create table want as
  select city, avg(delay) as delay
  from have
  group by city
  order by calculated delay descending
  ;
quit;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 3,215

Re: Which technique should be used

I do not see any reason why numeric variables cannot be treated as categorical.

Dates are numbers: a SAS date is the count of days since 1 Jan 1960. That is a marvelous number for calculating the delay when you have the start and end dates as numbers.
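As a sketch of that calculation, assuming the table has two numeric SAS date variables (start_date and end_date are hypothetical names, as is the output table delays), the delay in days is a plain subtraction:

```sas
/* Assumed names: START_DATE and END_DATE are numeric SAS date values in HAVE. */
data delays;
  set have;
  delay = end_date - start_date;      /* difference in whole days */
  format start_date end_date date9.;  /* display only; stored values stay numeric */
run;
```

If the dates are stored as character strings instead, they would first need to be converted with input() and a suitable informat.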

What in your question is numeric? And where, with which product?

---->-- ja karman --<-----
Frequent Contributor
Posts: 92

Re: Which technique should be used

Hi Jaap,

My question is this: for multiple cities (city is a categorical variable) I have the related delay in number of days. These delays are numeric, like 1, 2, 10, 40, etc. If I want to test whether city and number of delay days are correlated, I can't, because that violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay days), I also can't run ANOVA or a t-test. Hence, please advise which test should be used here.

Writing the code is not a concern here; the question is how the relationship should be tested.

Trusted Advisor
Posts: 3,215

Re: Which technique should be used

Shivi, then that is a role for a statistician. What the variables mean has to be discussed before going for any analysis.

We should talk about the measurement level: categorical (binomial, ordinal).

I would avoid the terms numeric and character, as characters are numbers (an encoding) in a computer and numbers are often used as character labels by humans. Confusing...

You have:

- cities (categorical)

- Then there is some delay information. Delay of what (the transport)? You have it as numbers. Binning into classes is a matter of applying a format.

  That could be delay short (1-2 days), medium (3-7 days), and so on.

  You can apply that to the delay column (no correlation assumed), or go on to corrections for known causes/correlations.
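The binning described above could be sketched as follows. The format name delayfmt, the "long" class and its 8-day cut-off are assumptions added for illustration; only the short and medium ranges come from the post. The table name have and the variables city and delay are placeholders.

```sas
/* Hypothetical format: only the short and medium ranges are from the post. */
proc format;
  value delayfmt
    1 - 2    = 'short (1-2 days)'
    3 - 7    = 'medium (3-7 days)'
    8 - high = 'long (8+ days)'      /* assumed cut-off */
    other    = 'other';              /* zero, negative, missing or non-integer values */
run;

/* Cross-tabulate city against the binned delay; PROC FREQ groups by formatted value. */
proc freq data=have;
  tables city * delay / nocol nopercent;
  format delay delayfmt.;
run;
```

Because the format is applied only at analysis time, the stored delay values stay numeric and the class boundaries can be changed without rebuilding the table.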

Is your data structure (rows/columns) set up in an acceptable way, or must it be transformed first? What is the topic/event to analyse?

---->-- ja karman --<-----