BookmarkSubscribeRSS Feed
Sepehrp
Fluorite | Level 6

Hello everyone, I have a data set with about 900 columns in it. I want to export this data set into access, since access cannot handle more than 255 variables, I want to split the data set into four data sets with less than 255 columns. I was wondering is there a way to select a range of columns in sas? Thanks!

9 REPLIES 9
Reeza
Super User
Please don't post the same question in multiple forums.

You can look at the Dictionary tables, the sashelp.vcolumn for your dataset and pick up the variables required. Here's a sample, untested.

proc sql;
select nameinto :var_list1 separated by " "
where libname="MYLIB" and memname='MYDATA'
and varnum between 1 and 255;
quit;

data want1;
set have1 (keep = &var_list1);
run;
Sepehrp
Fluorite | Level 6
Sorry about that, thank you
MaikH_Schutze
Quartz | Level 8

Sure, you can use SAS variable lists to refer to a list of variable names, as long as they are similarly named, e.g. var1:var255

 

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000695105.htm

 

Sepehrp
Fluorite | Level 6
Thank you 🙂
ballardw
Super User

You can use variable lists. If the variables have names that are related and start with the same string of characters then use: CommonText: and would select all of the variables that start with that text. You can also use a - to indicate a range of variables suchas Avarname - ThatVarname. Use the list(s) in a keep statement. Recommend that you have any identification variables in each set.

 

You may need to experiment with the different form of lists to see how many you get.

Example: Suppose a data set has variables named VarName1 to VarName100 such as 100 similar measurements and another 20 variables whose names start with Col the dataset want below would have 45 variables, the 25 sepecified with VarName and all 20 of the Col.

 

 

data want;

   set have (keep= VarName1 - VarName25 Col: )

Sepehrp
Fluorite | Level 6
Thank you
FreelanceReinh
Jade | Level 19

Another approach (similar to Reeza's) is to use ranges of variables according to their varnum (i.e. their internal variable number). As a preparation you would run a PROC CONTENTS on your dataset with the VARNUM option:

 

proc contents data=yourdata varnum;
run;

In section "Variables in Creation Order" of the resulting output you can see (in column with header "#") which variable is no. 1, which one is no. 254, which one is no. 255, no. 508 (=2*254), no. 509, no. 762 (=3*254), no. 763 and which one is last (say, no. 900).

 

Then you can easily form the appropriate ranges in KEEP= options of data steps like the following:

data part1;
set yourdata(keep=idvar numberone--twofivefour);
run;

data part2;
set yourdata(keep=idvar twofivefive--fivezeroeight);
run;

/* and so on */

Of course, the funny names numberone, twofivefour etc. need to be replaced by the corresponding variable names (of no. 1, no. 254 etc.). Dataset PART1 will then contain the first 254 variables, PART2 the second block of 254, PART3 (not shown above) the third and PART4 the remaining ones (about 900-762=138), without any requirements on how your variables are actually named or what their types are. The disadvantage of this approach is that the variables will be grouped just as they happen to be arranged in your dataset, disregarding contents.

 

I strongly support ballardw's recommendation of keeping an additional (e.g. 255th if that is possible) variable in each of the datasets PART1 - PART4 as a common record identifier (named IDVAR in the example code above). Depending on your data, you may even need more than one record-identifying variable (in this case reduce the number of variables in the ranges accordingly to less than 254).

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, you have some good answers above.  I would like to take another vien however.  The question I would ask is Why do you have 900 columns?  If you move this data into an RDBMs strcuture you will find your work is multiplied endlessly.  RDBMs are strcutured to have long tables, with few variables, where duplication is eraditcated.  I would start by first learning this.  Next, once you understand Relational DataBase Model strcucture, then examine your data.  Look at normalising the data, ie. take a wide dataset, and make log dataset, for example:

HAVE

ID  VAR1  VAR2   VAR3 ...

1    abc     def       efg

 

WANT

ID    VARNUM    RESULT

1      1                 abc

1      2                 def

1      3                 efg

...

 

Once you have normalised your data, then start to identify where duplication appears and split this up into different tables:

HAVE

ID    COMPANY    COMPANY_ADDRESS VARNUM   RESULT

1      1                   abc road                        1               abc

1      1                   abc road                         2              def

1      1                   abc road                         3              efg

...

 

WANT1

ID   COMPANY_ID  VARNUM   RESULT

1     1                       1                abc

1     1                       2                def

1     1                       3                efg

...

WANT2

COMPANY_ID  COMPANY_ADDR

1                       abc road

 

What you are doing is changing technologies, so you would want to change the way you think about your data.  In the above, the data is smaller, and can easily be merged, but also specifying columns and such like is far easier which will help you write SQL.  Also, as a word of recomendation, if you need a database, I would look at one of the free databases rather than Access.  Whilst Access is simple to start with a proper database will serve you better in the long run.  Depending on what you need there is SQLlite for embedded databases, MySQL etc.

Sepehrp
Fluorite | Level 6
Thank you, that was really helpful

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 25316 views
  • 6 likes
  • 6 in conversation