BookmarkSubscribeRSS Feed
benhartley903
Calcite | Level 5
Dear all,

I am posting in the hope that somebody could give me some direction regarding a project that I am starting at the moment.

I am looking for a way to improve the traceability of the variables that I create an use in various SAS datasets.

My (very general) question is about finding a way of automating a program which would return information such as:

a) The definition of the variable, for example, if I have done:
data base;
set base;
a=b+c;
run;
I am looking for a way to return information about the variable 'a' being derived from b and c. And the formula that was used to do this.

b) the source of a variable, i.e. whether it was defined using other variables or imported from another database.

I realise this is a pretty general question, but any tips or suggestions that anyone has would be greatly appreciated.

Thanks

Ben
1 REPLY 1
Cynthia_sas
SAS Super FREQ
Hi:
This is not the type of documentation that is possible "out of the box". The information on how a variable is derived such as you show in this assignment statement):
a=b+c;


is contained in the SAS program where variable A is created -- but the only information that is available about A that is stored in the descriptor portion of the data is the type of information that you would find in PROC CONTENTS (Var name, var label, var type, var format, var informat, var length) -- additionally, PROC CONTENTS has information on dataset creation, modification, index, sort status, etc.

But consider a process flow like this:

1) Dataset WORK.ONE is created with variables B and C from an Oracle table
2) Dataset WORK.TWO is created from WORK.ONE and variable A is created by adding B and C together.
3) Dataset PERM.THREE is created from WORK.TWO and variable D, a category variable, is created by applying a user-defined format to variable A and then some further data manipulation and subsetting is done on variables B and C to create variables E and F --- so now PERM.THREE contains variables A B C D E F -- two of which (B and C) originally came from an Oracle table.

There's no way to trace in PERM.THREE the fact that B & C originally came from an Oracle table or that A was derived from B and C -- except via very good documentation of the programs and the building of a data dictionary or processing flow dictionary.

Sounds like you want CASE tools (http://en.wikipedia.org/wiki/Computer-aided_software_engineering ) or Yourdon dataflow diagrams (http://yourdon.com/strucanalysis/wiki/index.php?title=Chapter_9 and http://www.agilemodeling.com/artifacts/dataFlowDiagram.htm and http://www.cems.uwe.ac.uk/~tdrewry/dfds.htm#DFD%20Principles )

SAS Enterprise Guide has a process flow view of your project, but that does not go down to the column level. SAS also has DI Studio, which is an ETL type of software meant to work within the context of the SAS platform for Business Analytics -- DI Studio has a process flow view of the transformations needed to build target tables (some screen shots are in this paper http://support.sas.com/resources/papers/proceedings09/092-2009.pdf ). SAS also has the SAS Code Analyzer (SCAPROC) described in these papers: http://www2.sas.com/proceedings/forum2008/006-2008.pdf
http://support.sas.com/resources/papers/proceedings10/313-2010.pdf

and here is the doc overview for PROC SCAPROC:
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#/documentation/cdl/en...

For example, if you ran the following program using SCAPROC to capture information on the program:
[pre]
proc scaproc;
record 'c:\temp\record.txt' attr;
run;

data one;
do i = 1 to 3 by 1;
b=10*i;
c=5*i;
output;
end;
format b c 3.0;
label b='B'
c='C';
run;

data two(drop=i);
set one;
a=b+c;
format a comma5.0;
label a='Derived from B+C';
run;

proc scaproc;
write;
run;
[/pre]

This is the ATTR information that would be written to RECORD.TXT (white space and lines added by me for readability). As you can see, there is no visibility of HOW the variables were created, just that they were created and what their format, label, etc are:
[pre]
============= ATTR from creation of WORK.ONE
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */

=================== ATTR of WORK.ONE as input to create WORK.TWO
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:i TYPE:NUMERIC LENGTH:8 LABEL: FORMAT: INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.ONE.DATA INPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */

=============== ATTR of WORK.TWO output variables (note that I is not here)
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:b TYPE:NUMERIC LENGTH:8 LABEL:B FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:c TYPE:NUMERIC LENGTH:8 LABEL:C FORMAT:3. INFORMAT: */
/* JOBSPLIT: ATTR WORK.TWO.DATA OUTPUT VARIABLE:a TYPE:NUMERIC LENGTH:8 LABEL:Derived from B+C FORMAT:COMMA5. INFORMAT: */
[/pre]

It really comes down to what you need to produce and at what level of granularity. And another consideration is whether you are documenting code and dataflow from code that you are writing or documenting code and making dataflow diagrams from a process/code that is already written. The output from SCAPROC may help you, but you just as well served with a PROC CONTENTS after every step where you create a new dataset.

cynthia

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 1105 views
  • 0 likes
  • 2 in conversation