## How to keep large numbers of variables

Solved
Occasional Contributor
Posts: 5

# How to keep large numbers of variables

Hey all,

Sas newbie here. I have a large dataset with thousands of variables and I only want to keep some of the variables. I have a list of the variables that I'd like to keep. There must be a simple way to do it other than listing the hundreds of variables I want in the sas code. I'm thinking maybe something with merge?

So pretend the following dataset has thousands of observations:

Obs         Date    Time     Room

1     14SEP2000    10:00    103

2     14SEP2000    10:30    103

3     14SEP2000    11:00    207

4     15SEP2000    10:00    105

5     15SEP2000    10:30    105

6     17SEP2000    11:00    207

But I only want Date and Time (along with hundreds of others) and I have a dataset that contains the variable names that I want. How do I proceed?

Any help would be appreciated.

Accepted Solutions
Solution
‎05-22-2012 06:18 PM
PROC Star
Posts: 8,164

## Re: How to keep large numbers of variables

One way would be to put the vars to keep into a macro variable and then use that variable in a keep statement.  E.g.,

/*create test dataset*/

data have;

retain var1-var10 (10*1);

run;

/*create vars to keep dataset;*/

data vars_to_keep;

input vars \$;

cards;

var2

var5

var7

var9

;

proc sql noprint;

select vars

into :vars separated by " "

from vars_to_keep

;

quit;

data want;

set have (keep=&vars.);

run;

HTH,

Art

All Replies
Solution
‎05-22-2012 06:18 PM
PROC Star
Posts: 8,164

## Re: How to keep large numbers of variables

One way would be to put the vars to keep into a macro variable and then use that variable in a keep statement.  E.g.,

/*create test dataset*/

data have;

retain var1-var10 (10*1);

run;

/*create vars to keep dataset;*/

data vars_to_keep;

input vars \$;

cards;

var2

var5

var7

var9

;

proc sql noprint;

select vars

into :vars separated by " "

from vars_to_keep

;

quit;

data want;

set have (keep=&vars.);

run;

HTH,

Art

Occasional Contributor
Posts: 5

## Re: How to keep large numbers of variables

I'll try this out. Thanks

Super User
Posts: 23,740

## Re: How to keep large numbers of variables

You can create a macro variable that has the variables you want to keep and use that in your data step as noted by Art.

Other possibilities, if your variables have a naming system, ie var1 var2 var3 you can specify var: which will keep all variables that start with the prefix var.

Another option is var1--var10 which will keep all variables between var1--var10

And finally, sometimes it's easier to drop the variables you don't need instead of keeping the ones you do want.

Occasional Contributor
Posts: 5

## Re: How to keep large numbers of variables

Unfortunately there is no naming system. I'm looking at a bunch of snps over multiple chromosomes.

Super User
Posts: 13,554

## Re: How to keep large numbers of variables

```purphant wrote:

Unfortunately there is no naming system.
```

Lesson learned? If you know before hand that a project is likely to generate that many variables some sort of scheme is often helpful. At least we don't have to keep them all 8 characters or fewer as the first time I ran into this kind of issue.

It may not be too late to use to RENAME variables.

Occasional Contributor
Posts: 5

## Re: How to keep large numbers of variables

Ah I would have. I'm not in charge of acquiring the data though, just analyzing it.

Occasional Contributor
Posts: 9

## Re: How to keep large numbers of variables

Not sure if I understand completely, so correct me if I'm wrong.

You have 2 datasets: one with a list of the variables you want and a second with some of those same variables as well as a whole mess more. If this is the case, you can use the APPEND procedure to add the 2nd dataset to the first.

Example:

PROC APPEND      BASE=first

DATA=second

FORCE;

RUN;

This would add all of the observations from the second dataset to those from the first keeping only variables found in the first dataset. Using the option FORCE allows you to concatenate the two datasets even though the second dataset has variables not found in the first.

Occasional Contributor
Posts: 5

## Re: How to keep large numbers of variables

Ah unfortunately my first dataset is just a list of variables with no observations and my second dataset has those variables and their observations along with a bunch of other variables. So I don't think I could append.

Occasional Contributor
Posts: 9

## Re: How to keep large numbers of variables

You can still append to a dataset with variables but no observations. You'd just have to make sure the formats in the DATA dataset are the same as those in the BASE dataset.

Example:

DATA first;

FORMAT var1-var4 1.;

INPUT var1 var2 var3 var4;

DATALINES;

RUN;

DATA second;

FORMAT var1-var6 1.;

INPUT var1 var2 var3 var4 var5 var6;

DATALINES;

1 1 1 2 1 1

1 2 1 1 1 2

2 1 2 1 2 1

2 1 1 1 1 1

2 2 2 2 2 2

;

RUN;

PROC APPEND BASE=first DATA=second FORCE; RUN;

Super User
Posts: 8,114

## Re: How to keep large numbers of variables

I find that it is much easier to use a data step to generate the code I want in a case like this.  It is so much easier to debug because I can look at the results.

Lets say you have one dataset with the list of variables of interest (call it SNPS) and other dataset with the actual data (call it HAVE).

You want to create a new dataset (call it WANT) with some key variables (say DATE TIME ROOM) and the variables of interest.

filename code temp;

data _null_;

if _n_=1 then put 'data want;' / '  set have;' / '  keep date time room ' ;

if eof then put ';' / 'run;' ;

set snps;

put var ;

run;

%include code / source2 ;

🔒 This topic is solved and locked.