EC189QRW
Obsidian | Level 7

Hi all,

I'd like to create multiple datasets from one dataset (sashelp.class), naming each dataset after its target variable (weight, height, sex, age) and keeping the key variable (name) in each one. The hard-coded version looks like this. Thanks.

 

data weight(keep=name weight)
         height(keep=name height)
         sex(keep=name sex)
         age(keep=name age);
set sashelp.class;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Some questions to answer:

Will there only be one "key" variable such as name, or could there be multiple key variables that are used for all data sets? (Not much of a problem, but it should be considered.)

Will there be exactly one "target" variable (height, weight, sex, etc.) per data set, or could there be multiple? If multiple, how many, and how would the name of the data set be created from them? (This is a serious issue given the length limit on data set names.)

 

If exactly one key var and only one target var per data set:

 

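%macro split (indsn=, outlib=work, keyvar=, targvarlist= );
   %local i targvar;   /* keep the loop variables out of the caller's scope */

data 
%do i=1 %to %sysfunc(countw(&targvarlist., %str( )));
   %let targvar = %scan(&targvarlist., &i., %str( ));
   /* one output data set per target variable, named after it */
   &outlib..&targvar. (keep= &keyvar. &targvar.)
%end;
;  /* this ; ends the data statement*/

set &indsn.;
run;
%mend split;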

%split (indsn=sashelp.class, keyvar=name, targvarlist=age weight height sex)


Warning: this is not guaranteed to work for any variable list involving name literals, because of the stupid characters folks come up with. The countw and scan functions could very likely split such names in the wrong place.

Do NOT use a comma delimited list of target variables: in a macro call the commas would be read as parameter separators.
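
To check what the macro actually generates, one option is to turn on the MPRINT system option before calling it. The call below is only a sketch: it assumes the %split macro above has already been compiled, and it passes two names in keyvar= to illustrate that the value is pasted into keep= as-is, so more than one key variable should work without changing the macro.

```sas
/* Assumes the %split macro defined above has already been compiled. */
options mprint;      /* echo the generated DATA step to the log */

/* keyvar= may hold several names: it is pasted into keep= unchanged. */
/* Here every output data set keeps both name and sex.                */
%split (indsn=sashelp.class, keyvar=name sex, targvarlist=age weight height)

options nomprint;    /* switch the echo off again */
```

With MPRINT on, the log should show a single DATA statement listing work.age, work.weight and work.height, each with its own keep= list, followed by one set statement.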

EC189QRW
Obsidian | Level 7
Yes! The key is unique at the customer level, and there is only one target variable, "Good/Bad". I have a really wide table that contains almost 20 thousand features. I'd like to calculate the information value (IV) for each feature before modelling. The dataset is enormous and computation resources are very limited, so reading the whole wide table once for each feature is too expensive. That is why I tried to create multiple datasets in one data step: accessing a small dataset for each calculation could save me quite a lot of time. Thank you so much for your help, I really appreciate it.
ballardw
Super User

It might not hurt to describe what kind of "information value" needs to be calculated for each of these variables and what you attempted.

 

It is possible a table that "wide" may have other issues with modeling and possibly belongs in a "long" format.

EC189QRW
Obsidian | Level 7
IV is a measure designed for screening variables in modeling. I am trying to build a credit scorecard to predict whether a customer will default in the future. The variables come from customers' transactions and their behaviour. If you are interested, try Naeem Siddiqi's book "Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring"; it might help.
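
For readers who have not met IV before, here is a minimal sketch of the usual textbook form for one categorical feature: for each level, take the share of all goods and the share of all bads that fall in that level, then sum (share of goods - share of bads) * log(share of goods / share of bads) over the levels. The data are stand-ins, not the real scorecard table: sashelp.heart is used for illustration, with Status='Alive' treated as "good", Status='Dead' as "bad", and Smoking_Status as the feature.

```sas
/* Illustration only: sashelp.heart stands in for the scorecard data.   */
/* Assumption: Status='Alive' plays "good" and Status='Dead' plays "bad". */
proc sql;
   /* Good and bad counts per level of the feature */
   create table work.iv_bins as
   select smoking_status,
          sum(status = 'Alive') as n_good,
          sum(status = 'Dead')  as n_bad
   from sashelp.heart
   group by smoking_status;

   /* IV = sum over levels of (dist_good - dist_bad) * log(dist_good / dist_bad) */
   /* A level with zero goods or zero bads would need special handling first.    */
   select sum( (n_good / tot_good - n_bad / tot_bad)
               * log( (n_good / tot_good) / (n_bad / tot_bad) ) ) as iv
   from work.iv_bins,
        (select sum(n_good) as tot_good, sum(n_bad) as tot_bad
         from work.iv_bins);
quit;
```

Missing values of the feature form their own level here. Continuous features would have to be binned before a calculation like this.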
PaigeMiller
Diamond | Level 26

Splitting the data set may not be necessary, or it may be, but I would first see if the entire data set can be run through the Information Value calculations in the macro here: https://support.sas.com/resources/papers/proceedings13/095-2013.pdf

 

Generally, splitting the data is a last resort to be used only when other methods won't work. The other drawback to doing this is that now you have to write a macro to loop through the 1000s of data sets to do the calculations.

 

In your case, if you can't run the macro because of resource limitations, you still might be able to run it on 100 variables at a time, or something like that, rather than splitting the data set into a huge number of data sets that are appropriate for analyzing one variable at a time.
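
A rough sketch of the "100 variables at a time" idea follows. The macro name by_chunks, its parameters, and the work._vars and work.chunk_N data set names are all made up for illustration, and the DATA step inside the loop is only a placeholder for whatever analysis is run on each batch. It assumes indsn= is a two-level name and that keepvars= lists the variables every batch needs (key, target and so on).

```sas
/* Hypothetical sketch: walk a wide table in batches of &chunksize. variables. */
/* keepvars= lists the variables needed in every batch (key, target, ...).     */
%macro by_chunks (indsn=, keepvars=, chunksize=100);
   %local lib mem nvars start stop chunk;
   %let lib = %upcase(%scan(&indsn., 1, .));
   %let mem = %upcase(%scan(&indsn., 2, .));   /* assumes a two-level name */

   /* One row per remaining variable, in data set order */
   proc sql noprint;
      create table work._vars as
      select name, varnum
      from dictionary.columns
      where libname = "&lib." and memname = "&mem."
        and indexw("%upcase(&keepvars.)", strip(upcase(name))) = 0
      order by varnum;
      select count(*) into :nvars trimmed from work._vars;
   quit;

   %do start = 1 %to &nvars. %by &chunksize.;
      %let stop = %eval(&start. + &chunksize. - 1);
      %if &stop. > &nvars. %then %let stop = &nvars.;

      /* Names of the variables in this batch */
      proc sql noprint;
         select name into :chunk separated by ' '
         from work._vars (firstobs=&start. obs=&stop.);
      quit;

      /* Placeholder: replace this step with the real per-batch analysis */
      data work.chunk_&start.;
         set &indsn. (keep=&keepvars. &chunk.);
      run;
   %end;
%mend by_chunks;

/* Small demonstration on sashelp.class, two variables per batch */
%by_chunks (indsn=sashelp.class, keepvars=name, chunksize=2)
```

Building the name list one batch at a time also keeps each macro variable short, which matters because a single macro variable could not hold 20000 variable names.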

 

Of course, there will be other difficulties handling 20000+ variables.

--
Paige Miller
EC189QRW
Obsidian | Level 7
I love this paper. Thanks!
