Help using Base SAS procedures

To split a dataset by loop so that obs do not end up in another dataset

Regular Contributor
Posts: 229

Hi, I will receive a dataset and I don't know how many observations it will contain. I want to divide it into 5 datasets
and name them d1 to d5 dynamically. How can I do this with a loop so that, based on the id variable, the obs for one id never go into more than one dataset?
Example:
data test;
input id;
cards;
1
1
2
2
2
3
3
4
5
6
6
7
7
8
;
run;
Desired output:
d1
1
1
d2
2
2
2
3
3
d3
4
5
d4
6
6
d5
7
7
8


The dataset should be divided into 5 datasets, and all obs for a given id should be in exactly one of them. For example, d3 holds the obs with id 4, so no obs with id 4 should appear in any other dataset. The five datasets do not need to have the same number of obs; the counts can vary. Can anyone help me with this?
Frequent Contributor
Posts: 94

Re: TO split dataset

Not sure if there's a better way, but as a quick and dirty approach you could try something like this:


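/* Sample data: 10 observations */
data test;
    format i 8.;
    do i = 1 to 10;
        output;
    end;
run;

/* Deal the observations out round-robin by observation number (_n_).
   Note: this balances the sizes but does not look at the id variable. */
data a b c d e error;
    set test;
    select (mod(_n_, 5));
        when (0) output a;
        when (1) output b;
        when (2) output c;
        when (3) output d;
        when (4) output e;
        otherwise output error;
    end;
run;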
Super User
Posts: 10,035

Re: TO split dataset

Hi.
I have to leave now.
Here is a clue: the dictionary table dictionary.tables has a column 'nobs' that holds the table's physical number of observations. You can use it to work out how many obs should go into each of d1 - d5.
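
A minimal sketch of that lookup, assuming the input is WORK.TEST. The macro variable names n_obs and per_ds are made up for illustration, and nobs is missing for views, so this only applies to a physical table:

```sas
/* Sketch only: assumes the input data set is WORK.TEST.         */
/* n_obs and per_ds are hypothetical macro variable names.       */
proc sql noprint;
    select nobs into :n_obs trimmed
    from dictionary.tables
    where libname = 'WORK' and memname = 'TEST';
quit;

/* Rough target size per output data set, rounded up */
%let per_ds = %sysevalf(&n_obs / 5, ceil);
%put NOTE: &n_obs observations, about &per_ds per data set.;
```

Libname and memname are stored in upper case in dictionary.tables, which is why the WHERE clause uses 'WORK' and 'TEST'.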


Ksharp
New Contributor
Posts: 4

Re: TO split dataset

I am not sure why the complete post is not visible. Please edit to view the complete post.
I think this works:

data a b c d e;
    set act.talent nobs=t;
    /* In my case I am interested in 5 data sets */
    t1=ceil(t/5);   /* ceil, not round, so that t1*5 always covers all t observations */
    if _n_<=t1 then output a;
    else if _n_<=t1*2 then output b;
    else if _n_<=t1*3 then output c;
    else if _n_<=t1*4 then output d;
    else if _n_<=t1*5 then output e;
run;

The above code is useful when you want to create only a small number of data sets from the input data set.
For a larger or variable number, it's better to use the macro code below:

%macro numofdatasets(no);
    %local i;
    %do i=1 %to &no;
        data _&i;
            set act.talent nobs=t;
            /* The number of data sets comes from the macro call (5 below) */
            t1=ceil(t/&no);   /* ceil, not round, so that no trailing observations are lost */
            if t1*(&i-1) < _n_ <= t1*&i then output _&i;
        run;
    %end;
%mend;
%numofdatasets(5);

You can easily change the number of data sets you want to create in the macro call. But the last data set will have fewer observations if the number of observations is not a multiple of the number of data sets required.

Message was edited by: SAS333

Regular Contributor
Posts: 165

Re: TO split dataset

The forum doesn't like the symbols for "less than or equal to" etc., so it is best to use SAS's abbreviations (lt, le, gt, ge, eq, ne). Only you can edit your own posts.
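
As a small illustration of those abbreviations (a made-up example, not taken from the posts above), a range check written without any angle brackets might look like this:

```sas
/* Illustration only: x is a hypothetical variable */
data _null_;
    x = 3;
    /* same test as (x > 1) and (x <= 5), written with mnemonic operators */
    if x gt 1 and x le 5 then put 'x is greater than 1 and at most 5';
    if x ne 4 then put 'x is not 4';
run;
```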
Super Contributor
Posts: 365

Re: TO split dataset

Hello,

Use &lt instead of < and &gt instead of > (also add ; after &lt and ; after &gt).

SPR

Message was edited by: SPR

Super User
Posts: 10,035

Re: To split a dataset by loop so that obs do not end up in another dataset

Hmm.
This is rather more complicated.
Suppose your id variable is numeric.


[pre]
/* Sample data */
data test;
    input id;
    cards;
1
1
2
2
2
3
3
4
5
6
6
7
7
8
;
run;

/* Distinct id values go to ID; the number of distinct values goes to LEVEL */
ods output nlevels=level;
proc freq data=test nlevels ;
    table id /out=id(keep=id) nopercent nofreq nocum ;
run;

/* Give each distinct id a group number (flag) from 1 to 5.
   Assumes at least 5 distinct ids, so that div is at least 1. */
data temp(keep=id flag);
    set id;
    if _n_ eq 1 then set level;
    retain flag 0;
    div=floor(nlevels/5);
    if div eq 1 then do;
        if mod(_n_,div) eq 0 and flag le 4 then flag+1;
    end;
    else do;
        if mod(_n_,div) eq 1 and flag le 4 then flag+1;
    end;
run;

/* Build one space-separated list of ids per group */
data op;
    set temp;
    by flag;
    length level $ 200;
    retain level;
    if first.flag then call missing(level);
    level=catx(' ',level,id);
    if last.flag then output;
run;

/* Store each list in a macro variable: d1, d2, ..., d5 */
data _null_;
    set op;
    call symputx(cats('d',flag),level);
run;

/* Route every observation to the data set that owns its id */
data d1 d2 d3 d4 d5;
    set test;
    select;
        when (id in (&d1)) output d1;
        when (id in (&d2)) output d2;
        when (id in (&d3)) output d3;
        when (id in (&d4)) output d4;
        when (id in (&d5)) output d5;
        otherwise;
    end;
run;
[/pre]
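
As an alternative sketch (not the method above), and assuming test is already sorted by id, a single DATA step can move on to the next output data set only at the start of a new id, so an id never straddles two data sets. The names grp and target are made up for illustration.

```sas
/* Sketch only: assumes WORK.TEST is sorted by id.               */
/* grp and target are hypothetical helper variables.             */
data d1 d2 d3 d4 d5;
    set test nobs=total;
    by id;
    retain grp 1;
    /* rough target size per output data set */
    target = ceil(total / 5);
    /* advance only when a new id starts and the data sets so far
       have already received their share of observations */
    if first.id and _n_ gt 1 and (_n_ - 1) ge grp * target and grp lt 5
        then grp + 1;
    select (grp);
        when (1) output d1;
        when (2) output d2;
        when (3) output d3;
        when (4) output d4;
        otherwise output d5;
    end;
    drop grp target;
run;
```

With the 14 sample observations this should put ids 1-2 in d1, 3 in d2, 4-5 in d3, 6-7 in d4 and 8 in d5. The sizes are uneven, which the original question allows, and a data set can end up empty when there are very few ids.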



Ksharp

Message was edited by: Ksharp