Hi! I am a junior SAS analyst.

I want to split my data into a train set and a test set, then use a model built on the train set to predict the data in the test set. The dataset has 50,000 or more observations.

The easiest way I can think of is to use PROC SURVEYSELECT to randomly sample observations from the whole dataset. For example, I could ask SAS to randomly sample 30% as the test set (leaving the remaining 70% as the train set):

PROC SURVEYSELECT DATA=whole.data OUT=test.set METHOD=srs SAMPRATE=0.3;
RUN;

Now I have a test set in the dataset 'test.set'. However:

1. How can I create a dataset (e.g. 'train.set') to hold the remaining 70% of the data?
2. After using 'train.set' to build a predictive model (e.g. a linear model), how can I use that model to predict the data in 'test.set', with the output showing every predicted value and residual?

Thanks for your patience!

David