DATA Step, Macro, Functions and more

How to generate multiple rows after by variables id a data step?

Reply
Occasional Learner
Posts: 1

How to generate multiple rows after by variables id a data step?

Hi,

 

I have some issues with the following code. The code works well for small datasets (ie. for 100 observations like in the sample code),but for some reason when I try to run it with a grater number of observations the code won't finish and the Work-library fills up (288Gb). I'm running the code on SAS EG version 5.1 if that has something to do with it.

data work.want;
	set work.give (obs=100);
	by ID1 ID2;
	
	difference=0;
	
	if first.ID1 then row=0;
	if first.ID2 then row=0;
	row +1;
	output;

	do while (difference < 100);
		if last.ID2 then do;
				_n_ = row;
				call missing(ABC);
				days = days + plus;
				difference = abs(today() - days);
				row = _n_+1;
				output;
		end;
	end;
		
run;

 

Here's a quick scetch of what I would like to have:

 

I got:

 

 

ABC

ID1

ID2

Days

Plus

123

1A

ac

1

50

453

1A

bd

1

100

654

2A

ac

1

50

456

2A

jk

1

25

234

3A

ac

1

50

 

 

 

  

 

What I want:

 

 

ABC

ID1

ID2

Days

Plus

123

1A

ac

1

50

 

1A

ac

51

 50

 

1A

ac

101

 50

453

1A

bd

1

100

 

1A

bd

101

 100

654

2A

ac

1

50

 

2A

ac

51

 50

 

2A

ac

101

 50

456

2A

jk

1

25

 

2A

jk

26

 25

 

2A

jk

51

 25

 

2A

jk

76

 25

 

2A

jk

101

 25

234

3A

ac

1

50

 

3A

ac

51

 50

 

3A

ac

101

 50

 

The dataset that I'm working with has around 5000 observations (=unique ID1 - ID2 combinations). The variable in the do-while loop should be somewhere around 1095. Variable "plus" varies between 5 and 3650. The current SAS setup that I'm working on is easily capeable of handling these kinds of datasets (I think this calculation would yield around 1M rows, that should be manageable).

 

What am I doing wrong? Are there any more efficient ways to program this?

 

Thanks a bunch in advance.

Regerds,

Tapir

 

Super User
Posts: 11,343

Re: How to generate multiple rows after by variables id a data step?

When I get LOTS of possibly unexpected output look at where OUTPUT statements are located.

I would examine all of the variables involved with this block of code:

do while (difference < 100);
		if last.ID2 then do;
				_n_ = row;
				call missing(ABC);
				days = days + plus;
				difference = abs(today() - days);
				row = _n_+1;
				output;
		end;
	end;

If your PLUS variable ever is negative your value of difference has the potential to cause lots of issues with values of -1, -2, -3, -1000000.

 

I might suggest should the do while value be (0 <= difference <100) or similar

Ask a Question
Discussion stats
  • 1 reply
  • 137 views
  • 0 likes
  • 2 in conversation