DATA Step, Macro, Functions and more

How to duplicate rows

Accepted Solution Solved
Reply
Contributor
Posts: 30
Accepted Solution

How to duplicate rows

Hi,  I have a list of observations I would like to dupicate based on the variable Twin.  A row with Twin=1 contains informations about 2 individuals. After duplicating the row, i want to replace information in Age1 and Age2 by Sex1 and Sex2, all variable remains the same.

 

 

HAVE:

Id   Twin   Age1   Age2    Sex1    Sex2

1       1      45        35       M           F     

2       0      32         .         F                  

3       0      26        .          M                

4       1      20        21       M           F  

 

My dataset contains about 300.000 rows.

 

WANT: 

Id   Twin   Age1   Age2    Sex1    Sex2

1       1      45        35       M           F  

1       1      35        35       F            F      

2       0      32         .         F                  

3       0      26        .          M                    

4       1      20        21       M           F 

4       1      21        21       F           F

 

Thanks for your help.


Accepted Solutions
Solution
‎05-03-2017 08:50 AM
Super Contributor
Posts: 408

Re: How to duplicate rows

If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.

 

 

data want;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

 

 

 

One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.

 

Regards,

- Jan.

View solution in original post


All Replies
PROC Star
Posts: 551

Re: How to duplicate rows

Like this?

 

data have;
input Id Twin Age1 Age2 Sex1 $ Sex2 $;
infile datalines missover;
datalines;
1 1 45 35 M F
2 0 32 .  F  
3 0 26 .  M  
4 1 20 21 M F
;

data want;
	set have;
	output;
	if twin = 1 then do;
		age1 = age2;
		sex1 = sex2;
		output;
	end;
run;
Contributor
Posts: 30

Re: How to duplicate rows

Exactly what I want thank very much. Seems quite easy to you Smiley Happy.

oh i just forget to increment an new variable let's say new_id with distinct id number.

 

again thank u very much. I'm quite new in programming. Smiley Happy

PROC Star
Posts: 551

Re: How to duplicate rows

No problem, posting your problem like this with a clear description of your desired result makes it easy to help.

 

I agree with @KurtBremser, you should drop those variables, they are redundant.

 

Now about incrementing ID, how do you want this done? Eg your first set of twins have id=1, so incrementing this would give you id=1 and id=2 in the first two rows and then your third row would also have id=2 from your original data?

Contributor
Posts: 30

Re: How to duplicate rows

I express myself poorly sorry, easier to understand with a table. new_id in order to get unique value per row and new_twin to distinct original row from the duplicate one.

 

I hope it's clear enough for you.

Yep definitely i'll drop redundant variables.

 

Id   Twin   Age1   Age2    Sex1    Sex2  new_Id new_twin

1       1      45        35       M           F         1             1

1       1      35        35       F            F         2             2

2       0      32         .         F                       3             0

3       0      26        .          M                      4             0

4       1      20        21       M           F         5             1

4       1      21        21       F            F         6             2

 

 

thank u very much again Smiley Happy

Solution
‎05-03-2017 08:50 AM
Super Contributor
Posts: 408

Re: How to duplicate rows

If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.

 

 

data want;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

 

 

 

One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.

 

Regards,

- Jan.

Contributor
Posts: 30

Re: How to duplicate rows

Again thank you for your responsiveness. I think i have all elements i need to perform the code.

 

thx Smiley Wink

Contributor
Posts: 30

Re: How to duplicate rows

I get some issues with new_id incrementation. Since it independant, i'm using new_id=_n_ at the end of the code but the vector is null. With your solution, the incrementation starts over at each duplicates :s.

.data want2;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			*new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
new_id=_n_;		
run;

 

Super Contributor
Posts: 408

Re: How to duplicate rows

One way to make certain your new incremented id does not clash with an existing one is to take the current max and use that to start incrementing. Note that for large datasets this may have a performance impact.

 

proc sql noprint;
	select max(id) into :maxid
		from have;
quit;

data want;
	set have;
	output;
	retain newid &maxid;
	drop newid;

	if twin = 1 then
		do;
			newid+1;
			id=newid;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

Hope this helps,

- Jan

Super User
Posts: 6,938

Re: How to duplicate rows

Since the information in age2 and sex2 becomes redundant, I'd drop the variables during the transformation. Otherwise @draycut's solution is the way to go.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 340 views
  • 4 likes
  • 4 in conversation