deleted_user
Not applicable
Hi,

I want to generate a list of the variables that are being used by my program. For example, my dataset contains 500 variables, but my program may only use 50 of them. Is there a way to create a list showing which 50 variables are being used by my program? (Some of them may be dropped during the process.)

Thanks in advance,

Chris
13 replies
DanielSantos
Barite | Level 11
Wow.

To my knowledge, this is not an easy thing to do!

I guess you could manage it by building a parser, but that would be very complex, and you could still miss some variables.

Now, if your source program resides inside the metadata server, and thus was generated from there, a simple impact analysis query to the server could return that info.

Cheers from Portugal

Daniel Santos @ www.cgd.pt
deleted_user
Not applicable
Hi Daniel,

Thanks for the reply. I knew this was not an easy task, but I just don't want to do it manually and go through the program line by line.

Any suggestions will be greatly appreciated!

Thanks,

Chris
sbb
Lapis Lazuli | Level 10
Sure you can. Explore using the VNAME function with two arrays, one for _NUMERIC_ and another for _CHARACTER_. Sample code below.

Scott Barry
SBBWorks, Inc.

data _null_;
  * create some variables. ;
  retain a1-a10 ' ';
  retain n1-n5 0;
  * group all numeric and all character variables defined so far. ;
  array allnumvars (*) _numeric_;
  array allcharvars (*) _character_;
  * declare the helper X after the arrays so it is not listed itself. ;
  length x $200;
  do i=1 to dim(allnumvars);
    x = vname(allnumvars(i));
    putlog x=;
  end;
  do i=1 to dim(allcharvars);
    x = vname(allcharvars(i));
    putlog x=;
  end;
  stop;
run;
deleted_user
Not applicable
Hi Scott,

Thanks for your reply. One question: this code will give me a list of the variables in a dataset, but not necessarily the ones "used" by my program, right?

Thanks,

Chris
Cynthia_sas
Diamond | Level 26
Hi:
I'm not actually sure what you want. The variables in the PDV (program data vector) are all the variables which are "available" for use in the program. Consider this program:
[pre]
data new;
set sashelp.class;
age_in_5 = age + 5;
weight_in_5 = weight + 15;
run;
[/pre]

In this program, the PDV has these variables:
_N_ _ERROR_ NAME SEX AGE HEIGHT WEIGHT AGE_IN_5 and WEIGHT_IN_5

The only variables -used- in the program are AGE and WEIGHT (SAS uses _N_ and _ERROR_, but you generally don't worry about those). The program also creates AGE_IN_5 and WEIGHT_IN_5. Do they count? Is being CREATED the same as being USED?

Even though you didn't "technically" use NAME, would the new file, WORK.NEW, be meaningful without NAME? Probably not, but it depends on what you're doing with WORK.NEW. Maybe you need a copy of the original dataset, including NAME, SEX, HEIGHT, AGE, WEIGHT, AGE_IN_5 and WEIGHT_IN_5; maybe not. A more useful way to code the above program might be:
[pre]
data new(keep=name age weight age_in_5 weight_in_5);
set sashelp.class(keep=name age weight);
age_in_5 = age + 5;
weight_in_5 = weight + 15;
run;
[/pre]

In my mind, the issue is not so much which variables are being used, per se, as which variables are needed after the program is over. For example, if you have a dataset of 500 variables, you might need to go in and fix a ZIP code on some of the observations. In this instance, so what if ZIP code was the only variable technically "used"? If all subsequent processes depend on there being 500 variables in the data set, then in a standard input/output chart I'd show 500 variables going into the program, 1 variable being changed based on some condition, and 500 variables coming out of the program.

Honestly, I don't understand the need for this distinction (between the variables in the dataset and the variables "used" in the program).

cynthia
deleted_user
Not applicable
Hi Cynthia,

Thanks for your response. The reason I need to identify which variables are actually used by my program is that I cannot include all variables in my program; the dataset would be too big. I am talking about at least 2 GB of data per year, and I am using about 6 years of it (in addition, my program will also create tons of new variables). However, I know that my program may only use about 50 of those 500 variables. If I can identify which ones are actually used by the program, I can drop the rest. The problem I currently face is that my program is simply too long for me to go through line by line and manually check whether each variable is used or not. This is just not feasible, and it is also subject to human error.

Of course, if I cannot find a way to do it automatically, at the end of the day I will have to do it manually. It is just time-consuming, and the result may not be accurate either.

Thanks,

Chris
sbb
Lapis Lazuli | Level 10
If your SAS application needs many variables (possibly with a date/period identifier), consider organizing and naming the SAS data library members by year/month, for example, to reduce the number of variables/observations in each member.

If this is not feasible, I suspect you can trim your "active" variable selection programmatically; remember that the DROP statement wins over the KEEP statement when both are coded in the same DATA step.
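A minimal sketch of trimming the selection programmatically, assuming the needed variable names are already stored one per row in a dataset (WORK.NEEDED and its contents are hypothetical; SASHELP.CLASS stands in for one of the large yearly files). The names are collapsed into a macro variable with PROC SQL SELECT ... INTO and then used in a KEEP= data set option on the SET statement, so the other variables are never read into the PDV:

```sas
/* Hypothetical list of the variables the program actually needs. */
data work.needed;
  length name $32;
  input name $;
  datalines;
Name
Age
Weight
;
run;

/* Collapse the list into one space-separated macro variable. */
proc sql noprint;
  select name into :keeplist separated by ' '
    from work.needed;
quit;

%put NOTE: Keeping &keeplist;

/* KEEP= on the input data set: unlisted variables are not read at all. */
data work.slim;
  set sashelp.class(keep=&keeplist);
  age_in_5 = age + 5;
run;
```

The same &keeplist could be reused on the SET statement for each yearly file, so the list only has to be maintained in one place.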

Maybe you can share more information about your SAS variable structure, so we can help you treat the real problem rather than addressing the symptom?

Scott Barry
SBBWorks, Inc.
Peter_C
Rhodochrosite | Level 12
cykso

"Actually being used by the program" is an interesting concept.
Assuming your program creates something, the results show which columns are required.
Other columns needed to achieve these results include join/merge keys that are not required on output, and columns that contribute to derivations appearing in the results.
In addition to "columns", you might need to know about "paths" to inputs and outputs. Fortunately, you seem to have a more restricted requirement about paths.
Unfortunately for you, the SAS languages have many efficiencies for the programmer that remove the need to define everything. Simple examples are variable name lists like _character_ and varA--varZ, and implicit lists like a PROC PRINT with no VAR statement. Another barrier to a "simple" parsing of program code is macro variables (you might discount actual macros, because you could parse SAS logs produced with the MPRINT option).
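One way to get macro-generated code into a form a parser can read is to capture it to a file. A minimal sketch, assuming the MPRINT and MFILE system options: with both in effect, SAS writes the MPRINT output to the file assigned to the fileref MPRINT. The macro %SUMMARIZE is hypothetical, and the TEMP fileref is used only to keep the example self-contained. Only macro-generated code is captured, and name lists such as _character_ are still not expanded.

```sas
/* Route macro-generated code (MPRINT output) to an external file.  */
/* MFILE writes to the file behind the fileref MPRINT; TEMP keeps   */
/* the sketch self-contained, so swap in a real path to keep it.    */
filename mprint temp;
options mprint mfile;

/* Hypothetical macro standing in for the macro-driven parts of a program. */
%macro summarize(ds=, vars=);
  proc means data=&ds n mean;
    var &vars;
  run;
%mend summarize;

%summarize(ds=sashelp.class, vars=age weight)

options nomfile;
```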
One aspect of your objective: the data volume ("2GB for each of 6 years") is less important than the number of columns (50 of 500).

Separately, I won't accept the statement "too long for me to go through it line by line in order to manually check whether this variable is being used or not". In the absence of alternative documentation, that redefinition of the processes must be done to assure users of that information, and prove to them, that it is valid. (How acceptable is information processing that derives financial adjustments to provisions and reserves, for example, if the process cannot be certified?)

Good luck with re-documenting your programs.

PeterC
deleted_user
Not applicable
Hi Peter,

Again, thanks for your reply. Personally, I wouldn't accept that "too long for me to go through blah blah blah" excuse/reason either. I am just hoping to find an alternative way to solve this problem before I have to scan the program line by line. Of course, in the end I still need to achieve the goal of cutting down the file size; this is just not the way I would prefer to do it.

Thanks everyone for your time and suggestions.

Looks like I am running out of options.

Chris
sbb
Lapis Lazuli | Level 10
Honestly, it's unclear what the OP "cykso" has gleaned from this thread, given the last reply. I provided a valid option that would address the original request; now it's your opportunity to take the sample code and implement it (or not).

Scott Barry
SBBWorks, Inc.
Peter_C
Rhodochrosite | Level 12
In SAS 9.2, you might find that PROC SCAPROC helps to generate all attributes of the datasets read/written.
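A minimal sketch of the PROC SCAPROC route, assuming SAS 9.2 or later: the RECORD statement starts recording (the ATTR option adds the variable attributes of the data sets read and written), the code to analyze runs, and the WRITE statement writes the record out. The file name scaproc_record.txt is a placeholder. The record describes the data sets read and written rather than which individual variables the code references, so it narrows the search instead of answering it outright.

```sas
/* Start recording; ATTR adds variable attributes for data sets read/written. */
proc scaproc;
  record 'scaproc_record.txt' attr;   /* placeholder path */
run;

/* The code to analyze goes here; a tiny stand-in step is used. */
data work.new;
  set sashelp.class(keep=name age weight);
  age_in_5 = age + 5;
run;

/* Write the recorded information to the RECORD file. */
proc scaproc;
  write;
run;
```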
Other than that, I do these things:
1 clip the relevant part of the SAS log
2 paste into the old program editor
3 delete lines that are not code (like notes and warnings)
4 text-flow the syntax (line command TF8)
5 copy all to the enhanced editor
6 sort in the enhanced editor (sort selection is one of the commands not assigned to a key; I generally assign it to Ctrl+Alt+S)
7 remove reserved words like function and statement names.
8 go back to writing a proper syntax parser (!)
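The tokenize, sort, and filter steps above can be roughly automated inside SAS. A minimal sketch, assuming the program source is available as a text file (here a tiny sample program is written to a temporary file so the example runs as-is) and that SASHELP.CLASS stands in for the 500-variable dataset. Instead of removing reserved words one by one, it keeps only tokens that match a real column name from DICTIONARY.COLUMNS. It is only a word-matching heuristic: it will report names that merely appear in comments or quoted strings, and it will miss variables reached through name lists such as _character_, a1-a10, or varA--varZ, or through macro variables.

```sas
/* Sample program written to a temporary file so the sketch is self-contained. */
/* Point the PROG fileref at your own .sas file instead.                       */
filename prog temp;

data _null_;
  file prog;
  put 'data new;';
  put '  set sashelp.class;';
  put '  age_in_5 = age + 5;';
  put '  weight_in_5 = weight + 15;';
  put 'run;';
run;

/* Split every line into word-like tokens (blanks, operators, */
/* punctuation, quotes, and tabs act as delimiters).          */
data work.tokens(keep=token);
  infile prog truncover;
  input line $char256.;
  length token $32 delims $32;
  delims = ' ;=+-*(),/.<>|&^~!%$#@:"''' || '09'x;
  do i = 1 to countw(line, delims);
    token = upcase(scan(line, i, delims));
    output;
  end;
run;

/* Keep only tokens that are real column names of the data set. */
proc sql;
  create table work.used_vars as
    select distinct c.name
      from dictionary.columns as c
           inner join work.tokens as t
           on upcase(c.name) = t.token
      where c.libname = 'SASHELP' and c.memname = 'CLASS'
      order by c.name;
quit;
```

For the sample program, WORK.USED_VARS should contain Age and Weight. The result is a candidate list to review by hand, not a proof that nothing else is needed.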

good luck
PeterC
deleted_user
Not applicable
Hi PeterC,

I tried it; too bad my SAS version doesn't have PROC SCAPROC.

Thanks,

Chris
sbb
Lapis Lazuli | Level 10
Not necessarily. Take the code snippet I provided: you could wrap that logic within an IF _N_=1 THEN DO; / END; block and send the generated output to an alternate destination for external capture and/or review, away from your SASLOG.

As demonstrated, you get a bare-bones list of all SAS variables in the PDV.
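A minimal sketch of that suggestion, assuming SASHELP.CLASS stands in for your real input and VARLIST is a hypothetical fileref (TEMP here, so nothing is left behind). Declaring the arrays right after the SET statement limits them to the input variables, and the names are written to the VARLIST file instead of the SASLOG. As noted earlier in the thread, this lists every variable in the PDV, not only the ones the program references.

```sas
filename varlist temp;   /* hypothetical; swap in a real path to keep the list */

data work.new(drop=i vname_);
  set sashelp.class;
  /* Arrays declared right after SET see only the input variables. */
  array allnumvars (*) _numeric_;
  array allcharvars (*) _character_;
  length vname_ $32;
  /* Write the names once, on the first iteration only. */
  if _n_ = 1 then do;
    file varlist;
    do i = 1 to dim(allnumvars);
      vname_ = vname(allnumvars(i));
      put vname_;
    end;
    do i = 1 to dim(allcharvars);
      vname_ = vname(allcharvars(i));
      put vname_;
    end;
  end;
  /* ...the program's real processing continues here... */
  age_in_5 = age + 5;
run;
```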

Scott Barry
SBBWorks, Inc.

Discussion stats
  • 13 replies
  • 3790 views
  • 0 likes
  • 5 in conversation