Hi
I am trying to split the data set into many here is the data have :
data have;
input cusip_id Name $10. date_issue :ddmmyy10. type $10.;
format date_issue ddmmyy10.;
datalines;
10343452 Zingas 01/04/2016 bench
10343489 Stephanie 01/04/2014 main
10343445 Olivia 01/04/2012 main
10343456 Amstel 01/04/2011 bench
10343457 KEO 01/04/2017 main
;
run;
My solution:
data data_main;
set have;
If type="main";
run;
data data_bech;
set have;
If type="bench";
run;
This is in fact an awkard/cumbersome solution. I would like to utilize a more efficient algorithm for advanced SAS programming to avoid repetition of code. e.g. using proc sql distinct to set the values in a macro variable
Please I am looking forward from your prompt reply
proc sql noprint;
select count(distinct(type)) into :c from have;
select distinct type into :typ1 - :typ%trim(%left(&c)) from have;
quit;
%macro create_table;
%do i=1 %to &c;
data &&typ&i;
set have ;
if upcase(type) eq %upcase("&&typ&i");
run;
%end;
%mend;
%create_table;
Generally you won't want to split a data set in such a manner. Learning to judiciously code BY, CLASS, and WHERE statements will give you great control over slicing and dicing in the reporting and analytic procedures you use.
Depending on the size of the original file, you can use a hash of hash approach to dynamically split a table into value named sub tables.
From samples page
/* Richard A. DeVenezia
* www.devenezia.com
* 25May2004
*
* Similar to material originally posted to SAS-L on May 24, 2004
* "Hash of hashes and dynamic output"
*/
* sample data with several groups;
data have0;
do group = 'a','c','d','e';
do _n_ = 1 to 100;
date = today() - rand('integer', 365);
satellite_1 + 1;
satellite_2 = 0;
satellite_A = 'foo bar';
output;
end;
end;
format date date9.;
run;
* disorder for good measure;
proc sql;
create table have as select * from have0 order by ranuni(1);
quit;
* split one data set into several data sets named according to variable group;
data _null_;
declare hash h0 ();
h0.defineKey ('group');
h0.defineData ('group', 'h'); * hash of hashes, data element h is a hash object;
h0.defineDone ();
declare hash h;
do _n_ = 1 by 1 until (endOfDataset);
set have end = endOfDataset;
if group ne lag(group) then
if 0 ne h0.find() then do;
* create a hash for the group that will be a sub data set;
h = _new_ hash (ordered:'a');
h.defineKey ('_n_');
h.defineData ('group', 'date', 'satellite_1', 'satellite_2', 'satellite_A');
h.defineDone ();
h0.add();
end;
* add data to the groups hash;
h.replace ();
end;
declare hiter hi ('h0');
* create one table for each group;
do while (hi.next() = 0);
h.output (dataset:'group_'||group);
end;
run;
Other approaches would use a pre-step to generate source code that is a wallpaper of if/then or select/when statements that output to sub tables named according to group value. The pre step would either put the source code in macro variable(s) or write the source code to a file that is %included.
Unless the size of the data set is humongous, keeping it as one is usually a better approach. Then you do the next steps (analysis, reporting, plotting) using a BY statement, or excluding specific categories from the analysis.
The static solution is this:
data
data_main
data_bench
;
set have;
select (type);
when ("main") output data_main;
when ("bench") output data_bench;
otherwise;
end;
run;
This can be made dynamic:
proc sort
data=have (keep=type)
out=control
nodupkey
;
by type;
run;
filename incfile temp;
data _null_;
file incfile;
length line $100;
put "data";
do until eof1;
set control end=eof1;
line = " data_" !! type;
put line;
end:
put ";";
put "select (type);";
do until eof2;
set control end=eof2;
line = "when ('" !! strip(type) !! "') output data_" !! strip(type) !! ";";
put line;
end;
put "end;run;";
run;
%inc incfile;
Untested, posted from my tablet.
proc sql noprint;
select count(distinct(type)) into :c from have;
select distinct type into :typ1 - :typ%trim(%left(&c)) from have;
quit;
%macro create_table;
%do i=1 %to &c;
data &&typ&i;
set have ;
if upcase(type) eq %upcase("&&typ&i");
run;
%end;
%mend;
%create_table;
There is no need to run the query twice just to count how many values it returns. PROC SQL will count them for you.
proc sql noprint;
select distinct type into :typ1 - from have;
%let c=&sqlobs;
quit;
Please note that @singhsahab 's solution will read the.whole "have" dataset for every distinct value of type. With a large dataset and a high cardinality, this can become VERY time consuming.
My solution reads "have" only twice (sort and the dynamically created data step), regardless of the cardinality of type.
If I may, I would do it with one data read with help of the SELECT statement:
data have;
input cusip_id Name $10. date_issue :ddmmyy10. type $10.;
format date_issue ddmmyy10.;
datalines;
10343452 Zingas 01/04/2016 bench
10343489 Stephanie 01/04/2014 main
10343445 Olivia 01/04/2012 main
10343456 Amstel 01/04/2011 bench
10343457 KEO 01/04/2017 main
;
run;
proc sql noprint;
select distinct type
into :typ1 -
from have;
%let c = &sqlobs.;
quit;
%macro create_table_with_1_read();
data
%do i=1 %to &c;
&&typ&i
%end;
;
set have ;
select;
%do i=1 %to &c;
when (upcase(type) eq %upcase("&&typ&i")) output &&typ&i;
%end;
otherwise put "ERROR: unknown type";
end;
run;
%mend create_table_with_1_read;
%create_table_with_1_read()
Bart
Thinking in the large, if you order the types by frequency count descending you will have the most common true whens first, reducing the number of tests that are done in your submitted codegen.
proc sql noprint;
select type
into :typ1-
from
( select type, count(*) as freq
from have
group by type
)
order by freq desc
;
%let c = &sqlobs.;
quit;
%put &=typ1; *main (freq=3);
%put &=typ2; *bench (freq=2);
From the question above, if I want to get the datasets with name 'dataset1, dataset2, dataset3,……,dataset100', rather than naming the datasets by their observation's names. Could you please tell me how should I modify the code?
First question, as always in such cases: why do you think it necessary to split your dataset like this?
Second, what would be the rule for putting particular source observations into a particular target dataset?
1. Since when I try to do rolling regression, there will be a 'sort failed' error, maybe because the amount of data is too large,
2. I want to split the dataset based on the different observations in 'characteristics', but I don't want to name the dataset using the name of the observations, I want to use dataset1, dataset2, dataset3,...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.