BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Planck
Obsidian | Level 7

Hi,

 

As asked in my post yesterday, I would like to duplicate each row of a big table A in a table B before reducing B in a table C, and it's technically impossible currently to duplicate, too big volume of data.

 

But I noticed that if I split A into parts (1000 rows per part), then I apply the duplication on a part, and then reduce the part before duplicating the next part, it can work. 

 

What I want to do:

1/ Splitting A into A1, A2,...An

2/ Duplicate A1 into B1. Reduce B1 into C1. Delete B1

3/ Duplicate A2 into B2. Reduce B2 into C2. Delete B2

...

n+1/ Duplicate An into Bn. Reduce Bn into Cn. Delete Bn

n+2/ Make an union of all Ci tables into C

 

But technically I don't arrive to split into 1000 rows tables and applying my duplication macro, and my reduction macro.

 

Let's say that:

- my duplication macro is called "duplication(i)" use Ai and output my table Bi,

- my reducing macro is called "reduce(i)", use Bi and output Ci

 

 

%macro getpartextraction(part= 0);
PROC SQL;
CREATE TABLE A&part AS
SELECT * FROM A
WHERE ID_ROW >= (&part-1)*1000+1 AND ID_ROW <= MIN(MAX(ID_ROW), &part*1000)
ORDER BY ID_ROW;
%mend;

data _null_;
set A;

do k = 1 to (MAX(ID_ROW)/1000);
		call execute('%getpartextraction('||k||')');
call execute('%duplication('||k||')');
call execute('%reduce('||k||')'); end; run;

 

For the moment I try to split A to Ai but there is an error

"an error occured executing the workspace "Duplication process". The server is disconnected. "

And "Duplication process" is the name of my programm. I need to rerun everything after this error.

 

Do you know where is the problem?

 

Thanks

 

Edit: ID_ROW is the number of the row in the big table A

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Why are you using PROC SQL? Is your data not in SAS datasets?

%macro getpartextraction(dsn=A, part= 0,size=1000);
data &dsn.&part ;
  set &dsn (firstobs=%eval((&size * &part) +1) obs=%eval(&size*(&part+1)) );
run;
%mend;

Now generate a call each ;

filename code temp;
data _null_;
  set A nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%getpartextraction(dsn=A,size=1000,' part = ')' ;
  end;
  stop;
run;
%include code / source2 ;

View solution in original post

7 REPLIES 7
Reeza
Super User

Are you macros functioning individually?

 

Planck
Obsidian | Level 7

Yes they are working well.

 

Iam trying this:

macro getpartextraction(part =);
PROC SQL;
CREATE TABLE A&part AS
SELECT * FROM A
WHERE ID_ROW>= (&part-1)*1000+1 AND ID_ROW<= &part*1000
ORDER BY ID_ROW;
%mend;

data _null_;
set A;
call execute('%getpartextraction(part='||1||')');
run;

And even this is not working...

It is running since ages without answering my query...

Whereas it is the same code than this:

PROC SQL;
CREATE TABLE A1 AS
SELECT * FROM A
WHERE ID_ROW>= 1 AND ID_ROW<= 1000
ORDER BY ID_ROW;

That's a nonsense...

 

Edit: I finally had the good answer in 30 minutes...lol

ballardw
Super User

When you get unusual results form macros it is a good idea to attempt the code with some of the macro debug options turned on.

 

Options Mprint symbolgen;

will add generated code to the log and descriptions of resolving macro variables which may point to something.

 

If I were working in a server environment a message like that would make think of a network bottleneck or possibly exceeding the amount of work space you have available on the server.

 

It looks as if you are calling those 3 macros mulitple times for each record in a with the exact same parameter value of 'k'. So you may have stacked up a bunch of identical calls and just plain overloaded everything. Note that  your code max(Id_row) in the data _null_ step is exactly the same as Id_row as the variable Id_row is only one variable.

 

Without code for the macros your are calling it is hard to make suggestions as to where to look to likely causes though I would recommend running the macro duplication with manual parameters a time or two.

Planck
Obsidian | Level 7

Thanks for your answer.

 But I will look at my macros after.

There is obviously a problem with my split (see my previous post).

Do you see why it was so long to run it?

ballardw
Super User

@Planck wrote:

Thanks for your answer.

 But I will look at my macros after.

There is obviously a problem with my split (see my previous post).

Do you see why it was so long to run it?


If you make the same call to the three macros multiple times with the same K then you are just doing the same step over and over with the exact same values.

 

Again you have not shared any actual data. HOW many records are in your dataset A? Your data _null_ step attempts to call those macros for every single record in that data set. Since you apparently have several thousands of records in A then you are calling the do loop 1000s of times.

 

HOWEVER sind you have MAX(Id_Row) the the data _null_step does not even execute because you only have one argument and MAX requires at least 2 in a data step. So I don't really see how you are getting the error at all.

So some code must be missing.

 

And as we say: showing a log really helps.

 

 

 

Tom
Super User Tom
Super User

Why are you using PROC SQL? Is your data not in SAS datasets?

%macro getpartextraction(dsn=A, part= 0,size=1000);
data &dsn.&part ;
  set &dsn (firstobs=%eval((&size * &part) +1) obs=%eval(&size*(&part+1)) );
run;
%mend;

Now generate a call each ;

filename code temp;
data _null_;
  set A nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%getpartextraction(dsn=A,size=1000,' part = ')' ;
  end;
  stop;
run;
%include code / source2 ;
Planck
Obsidian | Level 7

Thanks it was exactly what I needed

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1206 views
  • 1 like
  • 4 in conversation