When you use the wizard to define a new data source, don’t change the metadata there. Instead, use a Metadata node in your flow. If you change the metadata in the wizard, the changes are not visible anywhere in the flow, and if you pass your flow to someone else, they won’t see what you did either.
Here are some screenshots to illustrate:
The Advanced options in Step 4 of the data source wizard offer some pretty cool features for automatically detecting useful metadata settings. However, using them also means your flow no longer shows what metadata was set or how it was set.
In Step 5, you can manually change metadata settings. But all of this functionality is also available in a Metadata node that you can add to your flow. By making the changes in your flow, rather than in the wizard, you keep a visible record of what was set.
If you are dealing with a large dataset and know that you don’t want to pull the whole thing into your flow, it could make sense to do your sampling in Step 6 of the data source wizard. However, as with setting metadata, it might be better to do the sampling with a sampling node in your flow, so that anyone looking at the flow can see that sampling is being done.
I make a habit of placing a Metadata node as the very first node, immediately after the data source. You can find the Metadata node under the Utility tab.
Have questions related to this tip? Ask them on the SAS Data Mining and Machine Learning Community to get perspective from a large pool of SAS Enterprise Miner experts. Simply click "New Message" (must be logged in!) and ask away.