Hi,
As asked in my post yesterday, I would like to duplicate each row of a big table A in a table B before reducing B in a table C, and it's technically impossible currently to duplicate, too big volume of data.
But I noticed that if I split A into parts (1000 rows per part), then I apply the duplication on a part, and then reduce the part before duplicating the next part, it can work.
What I want to do:
1/ Splitting A into A1, A2,...An
2/ Duplicate A1 into B1. Reduce B1 into C1. Delete B1
3/ Duplicate A2 into B2. Reduce B2 into C2. Delete B2
...
n+1/ Duplicate An into Bn. Reduce Bn into Cn. Delete Bn
n+2/ Make an union of all Ci tables into C
But technically I don't arrive to split into 1000 rows tables and applying my duplication macro, and my reduction macro.
Let's say that:
- my duplication macro is called "duplication(i)" use Ai and output my table Bi,
- my reducing macro is called "reduce(i)", use Bi and output Ci
%macro getpartextraction(part= 0); PROC SQL; CREATE TABLE A&part AS SELECT * FROM A WHERE ID_ROW >= (&part-1)*1000+1 AND ID_ROW <= MIN(MAX(ID_ROW), &part*1000) ORDER BY ID_ROW; %mend; data _null_; set A; do k = 1 to (MAX(ID_ROW)/1000); call execute('%getpartextraction('||k||')');
call execute('%duplication('||k||')');
call execute('%reduce('||k||')'); end; run;
For the moment I try to split A to Ai but there is an error
"an error occured executing the workspace "Duplication process". The server is disconnected. "
And "Duplication process" is the name of my programm. I need to rerun everything after this error.
Do you know where is the problem?
Thanks
Edit: ID_ROW is the number of the row in the big table A
Why are you using PROC SQL? Is your data not in SAS datasets?
%macro getpartextraction(dsn=A, part= 0,size=1000);
data &dsn.&part ;
set &dsn (firstobs=%eval((&size * &part) +1) obs=%eval(&size*(&part+1)) );
run;
%mend;
Now generate a call each ;
filename code temp;
data _null_;
set A nobs=nobs ;
file code ;
do part=0 to int(nobs/1000);
put '%getpartextraction(dsn=A,size=1000,' part = ')' ;
end;
stop;
run;
%include code / source2 ;
Are you macros functioning individually?
Yes they are working well.
Iam trying this:
macro getpartextraction(part =); PROC SQL; CREATE TABLE A&part AS SELECT * FROM A WHERE ID_ROW>= (&part-1)*1000+1 AND ID_ROW<= &part*1000 ORDER BY ID_ROW; %mend; data _null_; set A; call execute('%getpartextraction(part='||1||')'); run;
And even this is not working...
It is running since ages without answering my query...
Whereas it is the same code than this:
PROC SQL; CREATE TABLE A1 AS SELECT * FROM A WHERE ID_ROW>= 1 AND ID_ROW<= 1000 ORDER BY ID_ROW;
That's a nonsense...
Edit: I finally had the good answer in 30 minutes...lol
When you get unusual results form macros it is a good idea to attempt the code with some of the macro debug options turned on.
Options Mprint symbolgen;
will add generated code to the log and descriptions of resolving macro variables which may point to something.
If I were working in a server environment a message like that would make think of a network bottleneck or possibly exceeding the amount of work space you have available on the server.
It looks as if you are calling those 3 macros mulitple times for each record in a with the exact same parameter value of 'k'. So you may have stacked up a bunch of identical calls and just plain overloaded everything. Note that your code max(Id_row) in the data _null_ step is exactly the same as Id_row as the variable Id_row is only one variable.
Without code for the macros your are calling it is hard to make suggestions as to where to look to likely causes though I would recommend running the macro duplication with manual parameters a time or two.
Thanks for your answer.
But I will look at my macros after.
There is obviously a problem with my split (see my previous post).
Do you see why it was so long to run it?
@Planck wrote:
Thanks for your answer.
But I will look at my macros after.
There is obviously a problem with my split (see my previous post).
Do you see why it was so long to run it?
If you make the same call to the three macros multiple times with the same K then you are just doing the same step over and over with the exact same values.
Again you have not shared any actual data. HOW many records are in your dataset A? Your data _null_ step attempts to call those macros for every single record in that data set. Since you apparently have several thousands of records in A then you are calling the do loop 1000s of times.
HOWEVER sind you have MAX(Id_Row) the the data _null_step does not even execute because you only have one argument and MAX requires at least 2 in a data step. So I don't really see how you are getting the error at all.
So some code must be missing.
And as we say: showing a log really helps.
Why are you using PROC SQL? Is your data not in SAS datasets?
%macro getpartextraction(dsn=A, part= 0,size=1000);
data &dsn.&part ;
set &dsn (firstobs=%eval((&size * &part) +1) obs=%eval(&size*(&part+1)) );
run;
%mend;
Now generate a call each ;
filename code temp;
data _null_;
set A nobs=nobs ;
file code ;
do part=0 to int(nobs/1000);
put '%getpartextraction(dsn=A,size=1000,' part = ')' ;
end;
stop;
run;
%include code / source2 ;
Thanks it was exactly what I needed
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.