Hi, I have a list of observations I would like to dupicate based on the variable Twin. A row with Twin=1 contains informations about 2 individuals. After duplicating the row, i want to replace information in Age1 and Age2 by Sex1 and Sex2, all variable remains the same.
HAVE:
Id Twin Age1 Age2 Sex1 Sex2
1 1 45 35 M F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
My dataset contains about 300.000 rows.
WANT:
Id Twin Age1 Age2 Sex1 Sex2
1 1 45 35 M F
1 1 35 35 F F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
4 1 21 21 F F
Thanks for your help.
If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.
data want;
set have;
if twin = 0 then
do;
new_twin = 0;
output;
end;
else if twin = 1 then
do;
new_twin=1;
output;
new_id+1;
new_twin=2;
age1 = age2;
sex1 = sex2;
output;
end;
run;
One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.
Regards,
- Jan.
Like this?
data have;
input Id Twin Age1 Age2 Sex1 $ Sex2 $;
infile datalines missover;
datalines;
1 1 45 35 M F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
;
data want;
set have;
output;
if twin = 1 then do;
age1 = age2;
sex1 = sex2;
output;
end;
run;
Exactly what I want thank very much. Seems quite easy to you :).
oh i just forget to increment an new variable let's say new_id with distinct id number.
again thank u very much. I'm quite new in programming. 🙂
No problem, posting your problem like this with a clear description of your desired result makes it easy to help.
I agree with @Kurt_Bremser, you should drop those variables, they are redundant.
Now about incrementing ID, how do you want this done? Eg your first set of twins have id=1, so incrementing this would give you id=1 and id=2 in the first two rows and then your third row would also have id=2 from your original data?
I express myself poorly sorry, easier to understand with a table. new_id in order to get unique value per row and new_twin to distinct original row from the duplicate one.
I hope it's clear enough for you.
Yep definitely i'll drop redundant variables.
Id Twin Age1 Age2 Sex1 Sex2 new_Id new_twin
1 1 45 35 M F 1 1
1 1 35 35 F F 2 2
2 0 32 . F 3 0
3 0 26 . M 4 0
4 1 20 21 M F 5 1
4 1 21 21 F F 6 2
thank u very much again 🙂
If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.
data want;
set have;
if twin = 0 then
do;
new_twin = 0;
output;
end;
else if twin = 1 then
do;
new_twin=1;
output;
new_id+1;
new_twin=2;
age1 = age2;
sex1 = sex2;
output;
end;
run;
One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.
Regards,
- Jan.
Again thank you for your responsiveness. I think i have all elements i need to perform the code.
thx 😉
I get some issues with new_id incrementation. Since it independant, i'm using new_id=_n_ at the end of the code but the vector is null. With your solution, the incrementation starts over at each duplicates :s.
.data want2; set have; if twin = 0 then do; new_twin = 0; output; end; else if twin = 1 then do; new_twin=1; output; *new_id+1; new_twin=2; age1 = age2; sex1 = sex2; output; end; new_id=_n_; run;
One way to make certain your new incremented id does not clash with an existing one is to take the current max and use that to start incrementing. Note that for large datasets this may have a performance impact.
proc sql noprint;
select max(id) into :maxid
from have;
quit;
data want;
set have;
output;
retain newid &maxid;
drop newid;
if twin = 1 then
do;
newid+1;
id=newid;
age1 = age2;
sex1 = sex2;
output;
end;
run;
Hope this helps,
- Jan
Since the information in age2 and sex2 becomes redundant, I'd drop the variables during the transformation. Otherwise @PeterClemmensen's solution is the way to go.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.