Hi,
I have a first dataset called "survieH" that looks like this:
p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 |
0,99805395 | 0,99972943 | 0,99978889 | 0,99983248 | 0,99986097 | 0,99988087 | 0,99989295 | 0,99989952 |
I have a second dataset called "survieF" that is similar to the first one:
p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 |
0,99829433 | 0,99976891 | 0,99982696 | 0,99986511 | 0,99989035 | 0,9999069 | 0,99991557 | 0,99992048 |
Finaly, I have the main database ("bd2006") that contains many observations (30000) and several variables, which "sex" (0-1).
I want to create a new dataset that merges them under the condition that if sex=1 then variable I pick the p1-p8 variables in the survieF dataset and if sex=0, I pick the other one.
I tried this, without succes:
data work.bd2007;
set work.bd2006;
if _N_ EQ 1 and sex=0 then set survieH ;
if _N_ EQ 1 and sex=1 then set survieF ;
I know I can do it by doing this, but I want to reduce the size of the dataset, so I don't think it's the best way.
data survieH (keep ph1-ph8);
set survieH;
p1=ph1;
...
p8=ph8;
run;
data work.bd2007;
set work.bd2006;
if _N_ EQ 1 then do set survieH; set survieF ; end;
array ph ph1-ph8;
array p p1-p8;
do i=1 to 8;
if sex=0 then p(i)=ph(i);
end;
drop ph1-ph8;
run;
Here is one way using the POINT= option to prevent SAS from stopping because you read all of the P value records.
Added logic to force P1-P8 to be missing if SEX is neither 0 or 1. SAS will automatically drop the variables named in the POINT= options.
data work.bd2007;
set work.bd2006;
pointer1=1;
pointer2=1;
if sex=0 then set survieH point=pointer1;
else if sex=1 then set survieF point=pointer2;
else call missing(of p1-p8);
run;
Here is one way using the POINT= option to prevent SAS from stopping because you read all of the P value records.
Added logic to force P1-P8 to be missing if SEX is neither 0 or 1. SAS will automatically drop the variables named in the POINT= options.
data work.bd2007;
set work.bd2006;
pointer1=1;
pointer2=1;
if sex=0 then set survieH point=pointer1;
else if sex=1 then set survieF point=pointer2;
else call missing(of p1-p8);
run;
Thank you, it works.
But I don't understand what is the purpose of pointer1 and pointer2.
The POINT= option of the SET statement uses a data step variable to allow you to dynamically directly access observations in a dataset. That is what the two pointer variables are used for. Because we have two SET statements we need two different variables.
In this case we are always accessing the same observation because the pointer variables are always set to one. Why it is useful for this application is that it lets you use the extra two SET statements without worrying that SAS will move past the end of one of them and cause your data step to end prematurely. Also by re-accessing the same observations from the two P value tables over and over we can reload the P1-P8 variables without having to rename them or create arrays or code DO loops to copy from the renamed variables to the desired names.
So if I have 15 SET statements, I will need 15 variables?
If you want to use POINT= option on each of them. But if you have 15 then you probably do not want to code it this way, or you want to use a macro to generate the code for you.
You could do your combination using PROC SQL by adding the SEX variable to the data with the P values. Either keep the SEX variable when you are generating the P values or add it back in.
proc sql ;
create table want as
select a.*,b.*
from have a
left join
(select 1 as SEX,f.* from survieF f
union
select 0 as SEX,h.* from survieH h) b
on a.sex = b.sex
;
quit;
Tom,
Do you really need 15? Wouldn't a single variable work just as well across all SET statements?
Actually I tested and it does not require different variables. So a single pointer variable could work. I am just used to using different variables because of the confusion that would occur if you used the same variable for options that the SET statement modifies the values such as END= or NOBS=.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.