We are about to start using the SAP Data Surveyor product. I am interested in hearing about anyone's experiences with it. I am also a bit puzzled on some points and would love it if anyone could shed some light on them:
According to the documentation, the extract wizards perform the following on InfoCubes:
1. Extracts metadata from SAP into dictionary tables.
2. Stores the fact table as a view, so source data still resides in the BW system and not in SAS.
3. Stores denormalized versions of the dimension, master data and SID tables in a SAS dataset named D, which implies that this data is stored locally and not as a view.
4. Stores texts as formats in separate format catalogs named according to
5. Creates a view of the new star schema.
If dimensions are stored as SAS datasets, this creates some potential data governance issues:
a. Dimensions are repeated across cubes, so does this mean I get multiple tables of the same information in addition to what is stored in the database? E.g. if I have 70 dimensions of 0BPARTNER, each with 200 attributes, does this mean I get 70 datasets of business partner in my SAS repository? If I do CDC on this, it's a nightmare to ensure reconciliation between systems and tables. Am I correct here? Do dimensions get stored locally, or does the InfoObject get stored and the dimensions as views?
b. If a separate format catalog is created for each extract, can I combine these and store them in tables? Formats are shared across dimensions and facts, so where possible I see the logical solution as consolidating this information as much as possible. It's a similar issue to point a.
c. Do the dictionary tables of SAP metadata get stored as physical tables, or do they get stored on the metadata server? If they are on the metadata server, what sort of growth should I expect from Business Content?
I've used the Data Surveyor interface into a SAP BW data warehouse quite a bit over the past 18 months and we had SAS consultancy advice on set-up and best practice (which may subsequently have changed). For what they are worth, here are my thoughts with apologies if I've not answered your questions (or if my experience differs from your own).
Before building jobs to pull data you need to extract the SAP data dictionary to your SAS server via the tool in Management Console. The wizard enables you to select which bits of the data dictionary you want and extracts them into a series of SAS datasets in a location determined by a library (which needs to have been defined in SAS metadata beforehand). These tables don't take up much space, but you'll need to liaise with your SAP guys to ensure that the definitions on both sides stay in synch, as any changes on the SAP side won't be available without re-extracting the data dictionary tables. DI Studio uses these tables when building your extraction jobs.
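As a rough way of keeping an eye on drift between the two systems, you can list the extracted dictionary datasets and their creation dates. A minimal sketch - the library name SAPDICT and its path are invented; substitute the library you actually defined in SAS metadata for the extract:

```sas
/* The library name and path below are illustrative only - point this */
/* at wherever your data dictionary extract was written.              */
libname sapdict "/sas/data/sap_dictionary";

/* List the extracted dictionary datasets with their creation and     */
/* modification dates, so you can spot when a re-extract is needed    */
/* after changes on the SAP side.                                     */
proc contents data=sapdict._all_ nods;
run;
```

Comparing the modification dates here against the date of the last change on the SAP side is a crude but cheap sanity check before building or re-running extraction jobs.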
BW InfoCubes are indeed a bit of a handful (wherever possible we tried to avoid having to extract them!) due to how the application replicates the InfoCube structure as SAS datasets. If you build a job to pull data from an InfoCube in DI Studio, the wizard will let you select which objects you need to pull. A metadata object (and corresponding SAS dataset) is created for each fact table, dimension and InfoObject (snowflaked mini-dimensions), plus all connecting SID tables (although I think you are correct that you have the option to create some of these objects as views onto the BW data - we chose not to configure the jobs that way). The star schema is created as a view over all the other tables. A job is automatically created to extract each of these objects, along with a higher-level job to run all the "sub-jobs" in sequence.
Here's where my BW knowledge gets a bit shaky. I believe that the dimensions for a particular InfoCube are defined subsets of master data tables (which can be shared across multiple InfoCubes), so you will potentially get multiple copies of the same data, albeit only what is relevant to each cube. You can only run CDC at InfoCube or DSO level - you can't do this in a straightforward manner on the master data tables themselves, as they don't have the corresponding changelog tables that CDC requires. So you may have to run full master data loads (using the R/3 interface) to ensure that your data is in synch if you want to go down this route.
I think that the text tables also get extracted as part of the automatically built DI Studio job but you can extract these independently (again using the R/3 interface) and write code to generate formats from the extracted tables. I don't believe that the text tables get converted into user-defined formats - the only formats that I ever saw being automatically applied to the extracted SAS tables were pretty basic character, numeric and date ones.
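If you do go down the route of building your own formats from extracted text tables, the standard PROC FORMAT CNTLIN mechanism does the job. A sketch only - the dataset name TEXT_0BPARTNER and its columns (PARTNER as the key, TXTMD as the medium text) are invented; substitute the key and text columns from your actual text table:

```sas
/* Build a character format from an extracted text table. The input   */
/* dataset and variable names are invented for illustration.          */
data fmt_in;
    set text_0bpartner(keep=partner txtmd);
    retain fmtname '$bptxt' type 'C';    /* one shared format name    */
    rename partner=start txtmd=label;    /* CNTLIN expects START/LABEL */
run;

proc format library=work cntlin=fmt_in;
run;
```

This also bears on your point b: PROC FORMAT's CNTLOUT option works the other way round, so you can dump several per-extract catalogs out to datasets, append them, and rebuild a single consolidated catalog from the combined table rather than maintaining one catalog per extract.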