Hi,
Could you please help me with the code for the steps below?
data _null_;
/* NOBS= is set at compile time, so the 800+ million records need not be read */
if 0 then set xx.ZBB nobs=nobs;
call symputx('nobs',nobs);
stop;
run;
* note: an iterative %do loop must sit inside a macro definition;
%do i=1 %to 8;
*create sas code to sort nobs/8 at a time;
* run the sas code;
%end;
*create a wait code to ensure 8 datasets are created;
*run the following four datasteps in parallel;
data four1;
merge eight1-eight2;
by key;
run;
data four2;
merge eight3-eight4;
by key;
run;
data four3;
merge eight5-eight6;
by key;
run;
data four4;
merge eight7-eight8;
by key;
run;
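The sorting loop in the outline above could be sketched roughly as follows. This is only a sketch under assumptions: the macro name sort_chunks is hypothetical, the macro variable &nobs is assumed to have been created by the earlier DATA _NULL_ step, and the chunks are sorted one after another here, not in parallel.

```sas
%macro sort_chunks(n=8);
  %local i chunk first last;
  /* size of each chunk, rounded up so that no observation is left out */
  %let chunk = %sysevalf(&nobs / &n, ceil);
  %do i = 1 %to &n;
    %let first = %eval((&i - 1) * &chunk + 1);
    %let last  = %eval(&i * &chunk);
    /* FIRSTOBS= and OBS= restrict the sort to one slice of the input */
    proc sort data=xx.ZBB(firstobs=&first obs=&last) out=eight&i;
      by key;
    run;
  %end;
%mend sort_chunks;

%sort_chunks(n=8)
```

Each output data set eight1 to eight8 is then sorted by key on its own, which is what a later BY-group step over the pieces requires.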
data
eight1
eight2
eight3
eight4
eight5
eight6
eight7
eight8
;
set xx.zbb;
select (mod(_n_,8));
when (1) output eight1;
when (2) output eight2;
when (3) output eight3;
when (4) output eight4;
when (5) output eight5;
when (6) output eight6;
when (7) output eight7;
when (0) output eight8;
end;
run;
This takes care of the splitting in one step, without having to count at all.
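If the number of pieces might change, the same one-pass split could be generated by a macro. A minimal sketch, assuming a hypothetical macro named split with a parameter n; the data set names eight1, eight2, and so on are kept from the step above:

```sas
%macro split(n=8);
  %local i;
  /* list all output data sets on the DATA statement */
  data %do i = 1 %to &n; eight&i %end; ;
    set xx.zbb;
    /* route each observation by its position modulo n */
    select (mod(_n_, &n));
      %do i = 1 %to %eval(&n - 1);
        when (&i) output eight&i;
      %end;
      when (0) output eight&n;
    end;
  run;
%mend split;

%split(n=8)
```

With n=8 this generates the same step as the hand-written version.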
Thanks, @Kurt_Bremser.
There are several items you need to address besides the splitting of the data.
If you plan to put all the data sets back together again as the final step, there is no advantage in creating FOUR1 FOUR2 FOUR3 and FOUR4. You might as well just combine all 8 subsets in one step.
MERGE is the wrong tool for the job. It is slower and does the wrong thing. Using MERGE explains the results you got that dropped a handful of observations. That result means that KEY is not unique. I don't know whether it is supposed to be unique or not, but there are some KEYs with more than 1 observation in your original data set. The right tool for the job would be SET instead of MERGE (but keep the BY statement in the program). If your boss told you to use MERGE, you need someone on the team with more SAS knowledge.
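To illustrate the difference, here is a small sketch with two made-up data sets, a and b (hypothetical names and values), that share one KEY value. MERGE with BY combines the matching observations into one, while SET with BY interleaves them and keeps every observation.

```sas
data a;
  input key val $;
  datalines;
1 a1
2 a2
;

data b;
  input key val $;
  datalines;
2 b2
3 b3
;

/* MERGE: 3 observations, the two key=2 rows are combined into one */
data merged;
  merge a b;
  by key;
run;

/* SET with BY: 4 observations, every input row is kept in key order */
data interleaved;
  set a b;
  by key;
run;
```

Applied to the sorted subsets from this thread, the final step would then be a single DATA step with set eight1-eight8; followed by by key; assuming each subset is already sorted by key.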