Re: selecting a range of columns in sas

Sepehrp · Posted 11-04-2015 12:34 PM

Hello everyone, I have a data set with about 900 columns in it. I want to export this data set into access, since access cannot handle more than 255 variables, I want to split the data set into four data sets with less than 255 columns. I was wondering is there a way to select a range of columns in sas? Thanks!

Reeza · Posted 11-04-2015 12:46 PM

Please don't post the same question in multiple forums.

You can look at the Dictionary tables, the sashelp.vcolumn for your dataset and pick up the variables required. Here's a sample, untested.

proc sql;
select nameinto :var_list1 separated by " "
where libname="MYLIB" and memname='MYDATA'
and varnum between 1 and 255;
quit;

data want1;
set have1 (keep = &var_list1);
run;

Sepehrp · Posted 11-05-2015 01:24 PM

Sorry about that, thank you

MaikH_Schutze · Posted 11-04-2015 12:49 PM

Sure, you can use SAS variable lists to refer to a list of variable names, as long as they are similarly named, e.g. var1:var255

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695105.htm

Sepehrp · Posted 11-05-2015 01:24 PM

Thank you 🙂

ballardw · Posted 11-04-2015 12:53 PM

You can use variable lists. If the variables have names that are related and start with the same string of characters then use: CommonText: and would select all of the variables that start with that text. You can also use a - to indicate a range of variables suchas Avarname - ThatVarname. Use the list(s) in a keep statement. Recommend that you have any identification variables in each set.

You may need to experiment with the different form of lists to see how many you get.

Example: Suppose a data set has variables named VarName1 to VarName100 such as 100 similar measurements and another 20 variables whose names start with Col the dataset want below would have 45 variables, the 25 sepecified with VarName and all 20 of the Col.

data want;

set have (keep= VarName1 - VarName25 Col: )

Sepehrp · Posted 11-05-2015 01:25 PM

Thank you

FreelanceReinh · Posted 11-04-2015 02:33 PM

Another approach (similar to Reeza's) is to use ranges of variables according to their varnum (i.e. their internal variable number). As a preparation you would run a PROC CONTENTS on your dataset with the VARNUM option:

proc contents data=yourdata varnum;
run;

In section "Variables in Creation Order" of the resulting output you can see (in column with header "#") which variable is no. 1, which one is no. 254, which one is no. 255, no. 508 (=2*254), no. 509, no. 762 (=3*254), no. 763 and which one is last (say, no. 900).

Then you can easily form the appropriate ranges in KEEP= options of data steps like the following:

data part1;
set yourdata(keep=idvar numberone--twofivefour);
run;

data part2;
set yourdata(keep=idvar twofivefive--fivezeroeight);
run;

/* and so on */

Of course, the funny names numberone, twofivefour etc. need to be replaced by the corresponding variable names (of no. 1, no. 254 etc.). Dataset PART1 will then contain the first 254 variables, PART2 the second block of 254, PART3 (not shown above) the third and PART4 the remaining ones (about 900-762=138), without any requirements on how your variables are actually named or what their types are. The disadvantage of this approach is that the variables will be grouped just as they happen to be arranged in your dataset, disregarding contents.

I strongly support ballardw's recommendation of keeping an additional (e.g. 255th if that is possible) variable in each of the datasets PART1 - PART4 as a common record identifier (named IDVAR in the example code above). Depending on your data, you may even need more than one record-identifying variable (in this case reduce the number of variables in the ranges accordingly to less than 254).

RW9 · Posted 11-05-2015 05:23 AM

Well, you have some good answers above. I would like to take another vien however. The question I would ask is Why do you have 900 columns? If you move this data into an RDBMs strcuture you will find your work is multiplied endlessly. RDBMs are strcutured to have long tables, with few variables, where duplication is eraditcated. I would start by first learning this. Next, once you understand Relational DataBase Model strcucture, then examine your data. Look at normalising the data, ie. take a wide dataset, and make log dataset, for example:

HAVE

ID VAR1 VAR2 VAR3 ...

1 abc def efg

WANT

ID VARNUM RESULT

1 1 abc

1 2 def

1 3 efg

...

Once you have normalised your data, then start to identify where duplication appears and split this up into different tables:

HAVE

ID COMPANY COMPANY_ADDRESS VARNUM RESULT

1 1 abc road 1 abc

1 1 abc road 2 def

1 1 abc road 3 efg

...

WANT1

ID COMPANY_ID VARNUM RESULT

1 1 1 abc

1 1 2 def

1 1 3 efg

...

WANT2

COMPANY_ID COMPANY_ADDR

1 abc road

What you are doing is changing technologies, so you would want to change the way you think about your data. In the above, the data is smaller, and can easily be merged, but also specifying columns and such like is far easier which will help you write SQL. Also, as a word of recomendation, if you need a database, I would look at one of the free databases rather than Access. Whilst Access is simple to start with a proper database will serve you better in the long run. Depending on what you need there is SQLlite for embedded databases, MySQL etc.

Sepehrp · Posted 11-05-2015 01:27 PM

Thank you, that was really helpful

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away