BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
adil256
Quartz | Level 8

Hi,  I have a list of observations I would like to dupicate based on the variable Twin.  A row with Twin=1 contains informations about 2 individuals. After duplicating the row, i want to replace information in Age1 and Age2 by Sex1 and Sex2, all variable remains the same.

 

 

HAVE:

Id   Twin   Age1   Age2    Sex1    Sex2

1       1      45        35       M           F     

2       0      32         .         F                  

3       0      26        .          M                

4       1      20        21       M           F  

 

My dataset contains about 300.000 rows.

 

WANT: 

Id   Twin   Age1   Age2    Sex1    Sex2

1       1      45        35       M           F  

1       1      35        35       F            F      

2       0      32         .         F                  

3       0      26        .          M                    

4       1      20        21       M           F 

4       1      21        21       F           F

 

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
jklaverstijn
Rhodochrosite | Level 12

If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.

 

 

data want;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

 

 

 

One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.

 

Regards,

- Jan.

View solution in original post

9 REPLIES 9
PeterClemmensen
Tourmaline | Level 20

Like this?

 

data have;
input Id Twin Age1 Age2 Sex1 $ Sex2 $;
infile datalines missover;
datalines;
1 1 45 35 M F
2 0 32 .  F  
3 0 26 .  M  
4 1 20 21 M F
;

data want;
	set have;
	output;
	if twin = 1 then do;
		age1 = age2;
		sex1 = sex2;
		output;
	end;
run;
adil256
Quartz | Level 8

Exactly what I want thank very much. Seems quite easy to you :).

oh i just forget to increment an new variable let's say new_id with distinct id number.

 

again thank u very much. I'm quite new in programming. 🙂

PeterClemmensen
Tourmaline | Level 20

No problem, posting your problem like this with a clear description of your desired result makes it easy to help.

 

I agree with @Kurt_Bremser, you should drop those variables, they are redundant.

 

Now about incrementing ID, how do you want this done? Eg your first set of twins have id=1, so incrementing this would give you id=1 and id=2 in the first two rows and then your third row would also have id=2 from your original data?

adil256
Quartz | Level 8

I express myself poorly sorry, easier to understand with a table. new_id in order to get unique value per row and new_twin to distinct original row from the duplicate one.

 

I hope it's clear enough for you.

Yep definitely i'll drop redundant variables.

 

Id   Twin   Age1   Age2    Sex1    Sex2  new_Id new_twin

1       1      45        35       M           F         1             1

1       1      35        35       F            F         2             2

2       0      32         .         F                       3             0

3       0      26        .          M                      4             0

4       1      20        21       M           F         5             1

4       1      21        21       F            F         6             2

 

 

thank u very much again 🙂

jklaverstijn
Rhodochrosite | Level 12

If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.

 

 

data want;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

 

 

 

One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.

 

Regards,

- Jan.

adil256
Quartz | Level 8

Again thank you for your responsiveness. I think i have all elements i need to perform the code.

 

thx 😉

adil256
Quartz | Level 8

I get some issues with new_id incrementation. Since it independant, i'm using new_id=_n_ at the end of the code but the vector is null. With your solution, the incrementation starts over at each duplicates :s.

.data want2;
	set have;

	if twin = 0 then
		do;
			new_twin = 0;
			output;
		end;
	
	else if twin = 1 then
		do;
			new_twin=1;
			output;
			*new_id+1;
			new_twin=2;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
new_id=_n_;		
run;

 

jklaverstijn
Rhodochrosite | Level 12

One way to make certain your new incremented id does not clash with an existing one is to take the current max and use that to start incrementing. Note that for large datasets this may have a performance impact.

 

proc sql noprint;
	select max(id) into :maxid
		from have;
quit;

data want;
	set have;
	output;
	retain newid &maxid;
	drop newid;

	if twin = 1 then
		do;
			newid+1;
			id=newid;
			age1 = age2;
			sex1 = sex2;
			output;
		end;
run;

Hope this helps,

- Jan

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 38844 views
  • 5 likes
  • 4 in conversation