Is there an option to keep only the variables referenced in a program? I would like to shrink a large data set but would prefer not to have to list 50-80 variables every time I start a new analysis.
There isn't such an option. However, you might be able to lessen the workload. Instead of naming 50 to 80 variables toward the end of the program, you might be able to keep just the 20 or 30 input variables at the beginning of the program. Those would be kept, along with all variables that the program calculates.
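A minimal sketch of the keep-at-the-start approach, assuming the analysis needs only a handful of input variables. SASHELP.CLASS stands in for the large data set, and the BMI calculation is a made-up example of a computed variable. The KEEP= data set option on the SET statement limits which variables are read in; variables created inside the step are still written to the output.

```sas
/* Read only the input variables the analysis needs.
   SASHELP.CLASS stands in for the large data set. */
data work.analysis;
   set sashelp.class (keep=name height weight);
   /* A variable computed in the step is kept automatically
      (height is in inches, weight in pounds) */
   bmi = 703 * weight / height**2;
run;
```

WORK.ANALYSIS ends up with NAME, HEIGHT, WEIGHT and BMI, so only the short input list has to be written out by hand.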
Without more details about what the program does, it's difficult to be more specific.
If the variables you are keeping have recognizable patterns in their names, more options are available. For example, it's easy to keep all variables whose names begin with "var_".
Thanks. What is the syntax for keeping all variables that start with "var_"?
Use the colon operator to specify a prefix.
keep var_:;
Use a numbered range list for variables whose names end in consecutive numbers.
keep var1-var100;
Use a double hyphen (--) to keep variables that are adjacent in the data set. The two named variables and every variable positioned between them are kept.
keep var1--middle_var;
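The three kinds of variable lists above can be compared on a small made-up data set. The data set WORK.WIDE and every variable name in it are hypothetical. Because a name range list (--) follows the order of variables in the data set, the step that builds WORK.WIDE sets that order explicitly by assigning the variables in sequence.

```sas
/* Hypothetical one-row data set; variables are stored in this order */
data work.wide;
   id = 1;
   var_a = 2; var_b = 3;
   x1 = 4; x2 = 5; x3 = 6;
   first_var = 7; middle_var = 8; last_var = 9;
   other = 10;
run;

/* Name prefix list: keeps var_a and var_b */
data work.prefix;
   set work.wide;
   keep var_:;
run;

/* Numbered range list: keeps x1, x2 and x3 */
data work.numbered;
   set work.wide;
   keep x1-x3;
run;

/* Name range list: keeps first_var, middle_var and last_var,
   i.e. everything stored from first_var through last_var */
data work.position;
   set work.wide;
   keep first_var--last_var;
run;
```

To check the stored order before relying on --, PROC CONTENTS with the VARNUM option lists variables by position (for example, proc contents data=work.wide varnum; run;).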
The documentation details it better than I can, as usual: