Hi, I have a list of observations I would like to dupicate based on the variable Twin. A row with Twin=1 contains informations about 2 individuals. After duplicating the row, i want to replace information in Age1 and Age2 by Sex1 and Sex2, all variable remains the same.
HAVE:
Id Twin Age1 Age2 Sex1 Sex2
1 1 45 35 M F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
My dataset contains about 300.000 rows.
WANT:
Id Twin Age1 Age2 Sex1 Sex2
1 1 45 35 M F
1 1 35 35 F F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
4 1 21 21 F F
Thanks for your help.
If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.
data want;
set have;
if twin = 0 then
do;
new_twin = 0;
output;
end;
else if twin = 1 then
do;
new_twin=1;
output;
new_id+1;
new_twin=2;
age1 = age2;
sex1 = sex2;
output;
end;
run;
One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.
Regards,
- Jan.
Like this?
data have;
input Id Twin Age1 Age2 Sex1 $ Sex2 $;
infile datalines missover;
datalines;
1 1 45 35 M F
2 0 32 . F
3 0 26 . M
4 1 20 21 M F
;
data want;
set have;
output;
if twin = 1 then do;
age1 = age2;
sex1 = sex2;
output;
end;
run;
Exactly what I want thank very much. Seems quite easy to you :).
oh i just forget to increment an new variable let's say new_id with distinct id number.
again thank u very much. I'm quite new in programming. 🙂
No problem, posting your problem like this with a clear description of your desired result makes it easy to help.
I agree with @Kurt_Bremser, you should drop those variables, they are redundant.
Now about incrementing ID, how do you want this done? Eg your first set of twins have id=1, so incrementing this would give you id=1 and id=2 in the first two rows and then your third row would also have id=2 from your original data?
I express myself poorly sorry, easier to understand with a table. new_id in order to get unique value per row and new_twin to distinct original row from the duplicate one.
I hope it's clear enough for you.
Yep definitely i'll drop redundant variables.
Id Twin Age1 Age2 Sex1 Sex2 new_Id new_twin
1 1 45 35 M F 1 1
1 1 35 35 F F 2 2
2 0 32 . F 3 0
3 0 26 . M 4 0
4 1 20 21 M F 5 1
4 1 21 21 F F 6 2
thank u very much again 🙂
If, going by youre example, the new_id can be regenerated independent from the id, there is no need to get the max beforehand. We can just use a counter for new_id. Mark the special meaning of "new_id+1;" which combines the increment and an implicit "RETAIN new_id 0;" statement.
data want;
set have;
if twin = 0 then
do;
new_twin = 0;
output;
end;
else if twin = 1 then
do;
new_twin=1;
output;
new_id+1;
new_twin=2;
age1 = age2;
sex1 = sex2;
output;
end;
run;
One important weakness in this code is the ommision to deal with cases where twin is neither 0 nor 1. I'll leave that as homework for you.
Regards,
- Jan.
Again thank you for your responsiveness. I think i have all elements i need to perform the code.
thx 😉
I get some issues with new_id incrementation. Since it independant, i'm using new_id=_n_ at the end of the code but the vector is null. With your solution, the incrementation starts over at each duplicates :s.
.data want2; set have; if twin = 0 then do; new_twin = 0; output; end; else if twin = 1 then do; new_twin=1; output; *new_id+1; new_twin=2; age1 = age2; sex1 = sex2; output; end; new_id=_n_; run;
One way to make certain your new incremented id does not clash with an existing one is to take the current max and use that to start incrementing. Note that for large datasets this may have a performance impact.
proc sql noprint;
select max(id) into :maxid
from have;
quit;
data want;
set have;
output;
retain newid &maxid;
drop newid;
if twin = 1 then
do;
newid+1;
id=newid;
age1 = age2;
sex1 = sex2;
output;
end;
run;
Hope this helps,
- Jan
Since the information in age2 and sex2 becomes redundant, I'd drop the variables during the transformation. Otherwise @PeterClemmensen's solution is the way to go.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.