Hi all,
I'd like to create multiple datasets from one dataset (sashelp.class), naming each dataset after its target variable (weight, height, sex, age) and keeping the key variable (name) in each one. The hard-coded version looks like this. Thanks.
data weight(keep=name weight)
height(keep=name height)
sex(keep=name sex)
age(keep=name age);
set sashelp.class;
run;
Some questions to answer:
Will there only be one "key" variable, such as name, or could there be multiple key variables that will be used for all data sets? (Not much of a problem, but it should be considered.)
Will there be exactly one "target" variable (height, weight, sex, etc.) per data set, or could there be multiples? If multiple, how many, and how would the name of the data set be created from them? (This is a serious issue given the name length limit for data sets.)
If exactly one key var and only one target var per data set:
%macro split (indsn=, outlib=work, keyvar=, targvarlist= );
   data
   %do i=1 %to %sysfunc(countw(&targvarlist.) );
      %let targvar = %scan(&targvarlist.,&i.);
      &outlib..&targvar. (keep= &keyvar. &targvar)
   %end;
   ; /* this ; ends the data statement*/
      set &indsn.;
   run;
%mend;

%split (indsn=sashelp.class, keyvar=name, targvarlist=age weight height sex)
Warning: this is not guaranteed to work for any variable list involving name literals, because of the stupid characters folks come up with. The COUNTW and %SCAN functions could very likely fail on them.
Do NOT use a comma-delimited list of target variables: the macro call treats commas as parameter separators.
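One way to reduce (not eliminate) the risk with COUNTW and %SCAN is to name the blank as the only delimiter, since by default both functions also split on a number of punctuation characters. This is only a sketch: it assumes ordinary variable names, and a name literal that itself contains a blank would still break it.

```sas
/* Assumption: a blank-delimited list of ordinary variable names */
%let targvarlist = age weight height sex;

/* %str( ) passes a single blank as the only delimiter */
%let n     = %sysfunc(countw(&targvarlist., %str( )));
%let first = %scan(&targvarlist., 1, %str( ));

/* Write the results to the log to check the list was parsed as intended */
%put NOTE: &=n &=first;
```

The same third argument could be added to the COUNTW and %SCAN calls inside the split macro.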
@EC189QRW wrote:
Yes! The key is unique at the customer level and there is only one target variable, "Good/Bad". I've got a really wide table that contains almost 20 thousand features in one dataset. I'd like to calculate the information value (IV) for each feature before modelling. The dataset is enormous and computation resources are very limited, so reading the whole wide table once per feature is too expensive. That is why I tried to create multiple datasets in one data step: accessing a small dataset and doing the calculation there could save me quite a lot of time. Thank you so much for your help, I really appreciate it.
It might not hurt to describe what kind of "information value" needs to be calculated for each of these variables and what you attempted.
It is possible that a table that "wide" has other issues for modeling and may belong in a "long" format.
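As a small illustration of what a "long" layout means here, the sketch below transposes the numeric variables of sashelp.class into one row per key and feature. The output names class_sorted, class_long, feature and value are made up for the example, and it only covers numeric variables; whether this is practical for roughly 20 thousand features is a separate question.

```sas
/* PROC TRANSPOSE with a BY statement needs the data sorted by the key */
proc sort data=sashelp.class out=class_sorted;
   by name;
run;

/* One output row per name and numeric variable:
   _NAME_ holds the variable name, COL1 holds its value */
proc transpose data=class_sorted
               out=class_long(rename=(_name_=feature col1=value));
   by name;
   var age height weight;
run;
```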
Splitting the data set may not be necessary, or it may be, but I would first see if the entire data set can be run through the Information Value calculations in the macro here: https://support.sas.com/resources/papers/proceedings13/095-2013.pdf
Generally, splitting the data is a last resort to be used only when other methods won't work. The other drawback to doing this is that now you have to write a macro to loop through the 1000s of data sets to do the calculations.
In your case, if you can't run the macro because of resource limitations, you still might be able to run it on 100 variables at a time, or something like that, rather than splitting the data set into a huge number of data sets that are appropriate for analyzing one variable at a time.
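A rough sketch of the "100 variables at a time" idea follows. Everything named here is hypothetical (the macro by_batch, its target= and batchsize= parameters, and the work data sets _vars and _batch), and the IV calculation itself is left as a placeholder because the thread does not show it. It assumes ordinary variable names, not name literals.

```sas
%macro by_batch(indsn=, keyvar=, target=, batchsize=100);
   %local nvars start stop varlist;

   /* One row per variable in the input data set */
   proc contents data=&indsn. out=_vars(keep=name varnum) noprint;
   run;

   proc sort data=_vars;
      by varnum;
   run;

   /* Drop the key and the target; they are kept in every batch anyway */
   data _vars;
      set _vars;
      where upcase(name) not in ("%upcase(&keyvar.)", "%upcase(&target.)");
   run;

   proc sql noprint;
      select count(*) into :nvars trimmed from _vars;
   quit;

   %do start = 1 %to &nvars. %by &batchsize.;
      %let stop = %sysfunc(min(&nvars., %eval(&start. + &batchsize. - 1)));

      /* Names of the variables in this batch only, so the list stays short */
      proc sql noprint;
         select name into :varlist separated by ' '
         from _vars(firstobs=&start. obs=&stop.);
      quit;

      /* Read only the key, the target and this batch of features */
      data _batch;
         set &indsn.(keep=&keyvar. &target. &varlist.);
      run;

      /* Placeholder: run the IV calculation on _batch here */
   %end;
%mend by_batch;

/* Toy call on sashelp.class, treating sex as the target */
%by_batch(indsn=sashelp.class, keyvar=name, target=sex, batchsize=2)
```

Building the name list per batch, rather than once for all variables, keeps each macro variable well under the macro variable length limit even with tens of thousands of columns.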
Of course, there will be other difficulties handling 20000+ variables.