Which technique should be used (SAS Programming / SAS Procedures)

11-14-2014 04:12 AM

Hi. I am working on a data set with 65,000 observations. It is transport delay data for various cities, so I need to find out whether there is a relationship between the city and the delay in days. A t-test and ANOVA cannot be conducted, as the delay in days is a numeric value and city is a categorical value. Also, would it be advisable to aggregate the data, i.e. total the delay in days for each city, and then run some descriptive analysis? Please suggest.
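For the aggregation idea in the question, one way to get per-city descriptive statistics is PROC MEANS with a CLASS statement. This is a minimal sketch, assuming a dataset named HAVE with a character CITY and a numeric DELAY (delay in days); the sample rows and the output names (CITY_SUMMARY, N_OBS, TOTAL_DELAY, MEAN_DELAY, MEDIAN_DELAY) are made up for illustration. The total grows with the number of shipments per city, so the count, mean and median are kept next to it.

```sas
/* Made-up sample rows standing in for the real 65,000-row table */
data have;
  input city $ delay;
  datalines;
CityA 1
CityA 3
CityB 10
CityB 40
;
run;

/* One output row per city (NWAY drops the overall row) */
proc means data=have noprint nway;
  class city;
  var delay;
  output out=city_summary
         n=n_obs sum=total_delay mean=mean_delay median=median_delay;
run;

proc print data=city_summary noobs;
  var city n_obs total_delay mean_delay median_delay;
run;
```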

11-14-2014 04:30 AM

Can you not just create a character version of day, dayc=strip(put(day,best.));, and use that in your procedures? Note that 65k records is rather small, so processing should be very fast.
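A minimal sketch of that suggestion, assuming a dataset named HAVE with a numeric DAY column; the sample rows and the output dataset HAVE_C are made up for illustration. PUT with the BEST. format writes the number as right-aligned text, and STRIP removes the leading blanks.

```sas
/* Made-up sample rows standing in for the real table */
data have;
  input city $ day;
  datalines;
CityA 1
CityA 10
CityB 2
CityB 40
;
run;

/* Character copy of DAY so procedures can treat each value as a category */
data have_c;
  set have;
  length dayc $ 12;
  dayc = strip(put(day, best.));
run;

proc freq data=have_c;
  tables city*dayc / norow nocol nopercent;
run;
```

Character values sort as text, so in the PROC FREQ table "10" is listed before "2".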

11-14-2014 04:32 AM

proc sql;
  /* One row per city with its average delay, largest average first */
  create table want as
    select city, avg(delay) as delay
    from have
    group by city
    order by calculated delay desc;
quit;

---------------------------------------------------------------------------------------------

Maxims of Maximally Efficient SAS Programmers

11-14-2014 04:53 AM

I do not see any reason why numeric variables cannot be categorical.

Dates are numbers: a SAS date is the count of days since 1 January 1960. That makes it a marvelous number for calculating the delay when you have the end and start dates (end minus start).
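A minimal sketch of computing the delay from two dates, assuming hypothetical variables SHIP_DATE and ARRIVAL_DATE read with the DATE9. informat; the dataset name DELAYS and the sample rows are made up for illustration. Because both values are day counts from 01JAN1960, a plain subtraction gives the delay in days.

```sas
/* Hypothetical dates: SHIP_DATE and ARRIVAL_DATE are illustrative names */
data delays;
  input city $ ship_date :date9. arrival_date :date9.;
  /* Both values are days since 01JAN1960, so the difference is in days */
  delay = arrival_date - ship_date;
  format ship_date arrival_date date9.;
  datalines;
CityA 01JAN2014 03JAN2014
CityB 10FEB2014 20FEB2014
;
run;

/* The reference date itself is day 0; this writes d=0 to the log */
data _null_;
  d = '01JAN1960'd;
  put d=;
run;
```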

Which part of your question involves a numeric, and in which product?

---->-- ja karman --<-----

11-14-2014 05:24 AM

Hi Jaap,

My question is that for multiple cities (city is a categorical variable) I have the related delay in number of days. These days are numeric values like 1, 2, 10, 40, etc., so if I want to test an assumption such as whether city and the number of delay days are correlated, I can't do that because it violates the assumptions of correlation. Similarly, as there are multiple data values for each city (multiple delay days), I also can't run an ANOVA or a t-test. So please advise what test should be deployed here.

Writing the code is not a concern here; it's how the relationship should be tested.

11-14-2014 05:56 AM

Shivi, then that is a role for a statistician. It means discussing what the variables mean before going into any analysis.

We should talk about levels of measurement: categorical (nominal, ordinal).

I would avoid the terms numeric and character, as characters are numbers (an encoding) inside a computer, and humans often use numbers as characters. Confusing....

You have:

- cities (categorical)

- Then there is some delay information. The delay of what (the transport)? You have it as numbers. Binning them into classes is a matter of applying a format,

for example a short delay (1-2 days), medium (3-7 days), and so on.
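The binning idea can be sketched with PROC FORMAT and PROC FREQ. This is a minimal sketch under assumptions: the dataset HAVE, the variable DELAY, the format name DELAYCLASS, the sample rows, and the "long" band for 8 or more days are all made up for illustration; only the short (1-2 days) and medium (3-7 days) bands come from the post. PROC FREQ groups by formatted values, so the numeric delays are counted per class without creating a new variable.

```sas
/* Delay classes: 1-2 and 3-7 are from the post; 8+ as 'long' is assumed */
proc format;
  value delayclass
    1 - 2    = 'short'
    3 - 7    = 'medium'
    8 - high = 'long';
run;

/* Made-up sample rows standing in for the real table */
data have;
  input city $ delay;
  datalines;
CityA 1
CityA 5
CityB 2
CityB 12
;
run;

/* The FORMAT statement makes PROC FREQ count delays per class */
proc freq data=have;
  tables city*delay / norow nocol nopercent;
  format delay delayclass.;
run;
```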

You can apply that to the delay column (no correlation assumed) or go into corrections for known causes/correlations.

Is your data structure (rows/columns) set up in an acceptable way, or must it be transformed first? What is the topic/event to analyse?

---->-- ja karman --<-----