DavidWang
Calcite | Level 5

Hi! I am a junior SAS analyst.

I want to split my data into training and test sets, then use the model built on the training set to predict the data in the test set. The data have 50,000 or more observations.

The easiest way I can think of is to use PROC SURVEYSELECT to randomly sample observations from the whole data set. For example,

I can ask SAS to randomly sample 30% as the test set (leaving the remaining 70% as the training set):

PROC SURVEYSELECT DATA=whole.data OUT=test.set METHOD=SRS SAMPRATE=0.3;
RUN;

Now I have a test set in the data set 'test.set'. However:

1. How can I create a data set (e.g. 'train.set') to hold the remaining 70% of the data?

2. After using 'train.set' to build a predictive model (e.g. a linear model), how can I use that model to predict the data in 'test.set', with output showing every predicted value and residual?

Thanks for your patience!

David

stat_sas
Ammonite | Level 13

Hi,

Just add OUTALL to the statement. The output data set (named "all" here) then contains every observation plus a flag variable "selected", which is 1 for the sampled (test) observations and 0 for the remaining observations, which can serve as the training set. So you can use selected=0 as the training data for model development and selected=1 for testing.

PROC SURVEYSELECT DATA=whole.data OUTALL OUT=all METHOD=SRS SAMPRATE=0.3;
RUN;
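If separate data sets are preferred over filtering on the flag, the flagged output can be divided with one DATA step. This is a minimal sketch, assuming the OUTALL output is named "all" as above. The SEED= value is an arbitrary illustration that makes the random split repeatable, and the data set names "train" and "test" are hypothetical.

```sas
/* Same sampling call as above; SEED= (arbitrary value) makes the split repeatable */
proc surveyselect data=whole.data out=all outall
                  method=srs samprate=0.3 seed=12345;
run;

/* Route each observation to one of two data sets based on the flag */
data train test;
    set all;
    if selected = 1 then output test;   /* sampled 30% */
    else output train;                  /* remaining 70% */
run;
```

Keeping everything in one flagged data set works just as well. Two data sets are only a convenience when later steps expect separate inputs.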

DavidWang
Calcite | Level 5

Hi! Thanks for your prompt reply!!

But I still have some questions:

1. How do I create the flag variable "selected" and assign the values 1 and 0?

2. Is 'outall' part of the syntax or just an arbitrary name?

If it is convenient, I hope you can share the detailed steps.

Sorry, I am not accustomed to data management.

Many thanks!

David

stat_sas
Ammonite | Level 13

Hi,

Just try the syntax given above. The flag variable "selected" is created automatically in the data set "all". OUTALL is an option on the PROC SURVEYSELECT statement, and "all" is the name of the resulting data set.
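A quick way to see the flag after running the step is to tabulate it. This sketch assumes the output data set is named "all", as in the reply above.

```sas
/* Count how many observations fall in each group (selected = 0 or 1) */
proc freq data=all;
    tables selected;
run;
```

With SAMPRATE=0.3, roughly 30% of the observations should show selected=1.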

DavidWang
Calcite | Level 5

Hi!

I have successfully split the whole data into two parts, a training set and a test set, and I used PROC FREQ to check that they were split in the proportion I need. It's done! Thanks.

Now I have used the training set (only the selected=0 data) to build a linear model and estimate the betas.

However, I do not know how to use this model to predict the data in the test set.

In brief: how do I use a fitted model to predict (or validate) the data in the test set?

Warm regards,

David
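One possible approach to this last question, sketched under assumptions rather than taken from the thread: set the response to missing for the test rows and fit the model on the full flagged data set. PROC REG leaves observations with a missing response out of the estimation but still computes predicted values for them when their predictors are nonmissing. The names y, x1, x2, all_masked, scored, and test_scored are hypothetical, and "all" is the OUTALL data set from above.

```sas
/* Hypothetical names: y is the response, x1 and x2 are predictors.   */
/* "all" is the OUTALL data set; selected=1 marks the test rows.      */
data all_masked;
    set all;
    y_train = y;
    if selected = 1 then y_train = .;   /* hide test responses from the fit */
run;

/* Rows with a missing response are excluded from estimation but      */
/* still get a predicted value if their predictors are nonmissing.    */
proc reg data=all_masked;
    model y_train = x1 x2;
    output out=scored p=predicted;
run;
quit;

/* Keep the test rows and compute residuals from the true response */
data test_scored;
    set scored;
    where selected = 1;
    residual = y - predicted;
run;

proc print data=test_scored (obs=20);
    var y predicted residual;
run;
```

The residual is computed by hand because the R= keyword on the OUTPUT statement would return missing values for rows whose modeled response is missing. Other routes, such as saving the estimates with OUTEST= and applying them with PROC SCORE, lead to the same predictions.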
