Hi:
This is not the type of documentation that is possible "out of the box". The information on how a variable is derived such as you show in this assignment statement):
a=b+c;
is contained in the SAS program where variable A is created -- but the only information that is available about A that is stored in the descriptor portion of the data is the type of information that you would find in PROC CONTENTS (Var name, var label, var type, var format, var informat, var length) -- additionally, PROC CONTENTS has information on dataset creation, modification, index, sort status, etc.
But consider a process flow like this:
1) Dataset WORK.ONE is created with variables B and C from an Oracle table
2) Dataset WORK.TWO is created from WORK.ONE and variable A is created by adding B and C together.
3) Dataset PERM.THREE is created from WORK.TWO and variable D, a category variable, is created by applying a user-defined format to variable A and then some further data manipulation and subsetting is done on variables B and C to create variables E and F --- so now PERM.THREE contains variables A B C D E F -- two of which (B and C) originally came from an Oracle table.
There's no way to trace in PERM.THREE the fact that B & C originally came from an Oracle table or that A was derived from B and C -- except via very good documentation of the programs and the building of a data dictionary or processing flow dictionary.
Sounds like you want CASE tools (http://en.wikipedia.org/wiki/Computer-aided_software_engineering ) or Yourdon dataflow diagrams (http://yourdon.com/strucanalysis/wiki/index.php?title=Chapter_9 and
http://www.agilemodeling.com/artifacts/dataFlowDiagram.htm and
http://www.cems.uwe.ac.uk/~tdrewry/dfds.htm#DFD%20Principles )
SAS Enterprise Guide has a process flow view of your project, but that does not go down to the column level. SAS also has DI Studio, which is an ETL type of software meant to work within the context of the SAS platform for Business Analytics -- DI Studio has a process flow view of the transformations needed to build target tables (some screen shots are in this paper
http://support.sas.com/resources/papers/proceedings09/092-2009.pdf ). SAS also has the SAS Code Analyzer (SCAPROC) described in these papers:
http://www2.sas.com/proceedings/forum2008/006-2008.pdf
http://support.sas.com/resources/papers/proceedings10/313-2010.pdf
and here is the doc overview for PROC SCAPROC:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#/documentation/cdl/en...
For example, if you ran the following program using SCAPROC to capture information on the program:
[pre]
proc scaproc;
record 'c:\temp\record.txt' attr;
run;
data one;
do i = 1 to 3 by 1;
b=10*i;
c=5*i;
output;
end;
format b c 3.0;
label b='B'
c='C';
run;
data two(drop=i);
set one;
a=b+c;
format a comma5.0;
label a='Derived from B+C';
run;
proc scaproc;
write;
run;
[/pre]
This is the ATTR information that would be written to RECORD.TXT (white space and lines added by me for readability). As you can see, there is no visibility of HOW the variables were created, just that they were created and what their format, label, etc are:
[pre]
============= ATTR from creation of WORK.ONE
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
=================== ATTR of WORK.ONE as input to create WORK.TWO
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
=============== ATTR of WORK.TWO output variables (note that I is not here)
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:a TYPE:NUMERIC LENGTH:8 LABEL:Derived from B+C FORMAT:COMMA5. INFORMAT: */
[/pre]
It really comes down to what you need to produce and at what level of granularity. And another consideration is whether you are documenting code and dataflow from code that you are writing or documenting code and making dataflow diagrams from a process/code that is already written. The output from SCAPROC may help you, but you just as well served with a PROC CONTENTS after every step where you create a new dataset.
cynthia