About Mohamed

DougWielenga · ‎08-14-2017

In general, this would not typically be a concern in a data mining problem, since the methods and software are intended for large data sets. Sampling can be helpful when resources are limited or when the distributions in the data are not reflective of the population, but the goal is to have your training/validation/test samples be as representative as possible. If you are sampling, however, one approach would be to cluster the first data set, score the second data set using the cluster solution obtained on the first, and then sample proportionally from the second data set based on the distribution of clusters in the first data set. You might also consider stratifying on certain grouping variables (e.g. gender, location, etc.) to make the distribution as balanced as possible. I hope this helps! Doug
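The cluster-then-sample idea above can be sketched roughly as follows. This is only an illustration in plain Python, not SAS or Enterprise Miner code: the function names (`nearest_cluster`, `proportional_sample`) are hypothetical, and it assumes the cluster solution from the first data set is already available as a list of centroids plus one cluster label per row.

```python
import random
from collections import Counter, defaultdict

def nearest_cluster(row, centroids):
    """Score one row: index of the closest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(row, centroids[k])),
    )

def proportional_sample(first_labels, second_rows, centroids, n, seed=0):
    """Draw about n rows from second_rows, matching the cluster mix of first_labels."""
    rng = random.Random(seed)
    # Cluster counts observed in the first data set.
    counts = Counter(first_labels)
    total = sum(counts.values())
    # Score the second data set with the cluster solution from the first.
    by_cluster = defaultdict(list)
    for row in second_rows:
        by_cluster[nearest_cluster(row, centroids)].append(row)
    sample = []
    for k, cnt in counts.items():
        want = round(n * cnt / total)  # proportional allocation for cluster k
        pool = by_cluster.get(k, [])
        # Cannot draw more rows than the second data set has in this cluster.
        sample.extend(rng.sample(pool, min(want, len(pool))))
    return sample
```

Because each cluster's allocation is rounded separately, the total can differ slightly from `n`, and a cluster that is rare in the second data set simply contributes all the rows it has.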

oloolo · ‎04-04-2011

Change the variable name CNT in data set one appropriately, and use PROC SURVEYSELECT with a STRATA statement. Check the SURVEYSELECT manual for the details.
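For readers who want to see the logic outside SAS, here is a minimal Python sketch of the same idea: count the observations per stratum in data set one, then draw that many rows at random from the matching stratum in data set two. It is not a substitute for PROC SURVEYSELECT, and the function name `sample_by_stratum_counts` and the `key` argument are hypothetical names chosen for this illustration.

```python
import random
from collections import Counter, defaultdict

def sample_by_stratum_counts(data_one, data_two, key, seed=0):
    """Draw from data_two as many rows per stratum as data_one has in that stratum."""
    rng = random.Random(seed)
    # Per-stratum counts in data set one (the role the CNT variable plays above).
    wanted = Counter(key(row) for row in data_one)
    # Group data set two by the same stratum key.
    pools = defaultdict(list)
    for row in data_two:
        pools[key(row)].append(row)
    out = []
    for stratum, n in wanted.items():
        pool = pools.get(stratum, [])
        if n > len(pool):
            # This sketch refuses to oversample; adjust if that is not what you want.
            raise ValueError(f"stratum {stratum!r}: need {n} rows, have {len(pool)}")
        out.extend(rng.sample(pool, n))  # simple random sample without replacement
    return out
```

For example, `sample_by_stratum_counts(one, two, key=lambda r: r["gender"])` would return a subset of `two` with the same number of rows per gender as `one`, assuming both are lists of dicts with a `gender` field.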

Mohamed · ‎03-18-2011

Thank you David

MohamedS · ‎09-03-2007

Thank you. I searched the notes you sent me, but the answer there was very general. In the end I solved the problem by changing the -LOCALE option in the SAS config file from English to Arabic.

Date Last Visited	‎09-01-2015 07:11 AM

subsetting data set due to no of obs in another one

Re: Submit EM Model in Batch Mode

How to apply characteristics of data on another data Set

Submit EM Model in Batch Mode

[Error] Some code points did not transcode

Re: How to apply characteristics of data on another data Set

Re: subsetting data set due to no of obs in another one

Re: Submit EM Model in Batch Mode

Re: [Error] Some code points did not transcode