DATA Step, Macro, Functions and more

Merge dataset under conditions

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 113
Accepted Solution

Merge dataset under conditions

Hi,

I have a first dataset called "survieH" that looks like this:

p1p2p3p4p5p6p7p8
0,998053950,999729430,999788890,999832480,999860970,999880870,999892950,99989952

I have a second dataset called "survieF" that is similar to the first one:

p1p2p3p4p5p6p7p8
0,998294330,999768910,999826960,999865110,999890350,99990690,999915570,99992048

Finaly, I have the main database ("bd2006") that contains many observations (30000) and several variables, which "sex" (0-1).

I want to create a new dataset that merges them under the condition that if sex=1 then variable I pick the p1-p8 variables in the survieF dataset and if sex=0, I pick the other one.

I tried this, without succes:

data work.bd2007;

set work.bd2006;

if _N_ EQ 1 and sex=0 then set survieH ; 

if _N_ EQ 1 and sex=1 then set survieF ;

I know I can do it by doing this, but I want to reduce the size of the dataset, so I don't think it's the best way.

data survieH (keep ph1-ph8);

set survieH;

p1=ph1;

...

p8=ph8;

run;

data work.bd2007;

set work.bd2006;

if _N_ EQ 1 then do set survieH; set survieF ; end;

array ph ph1-ph8;

array p p1-p8;

do i=1 to 8;

if sex=0 then p(i)=ph(i);

end;

drop ph1-ph8;

run;


Accepted Solutions
Solution
‎01-03-2013 02:19 PM
Super User
Super User
Posts: 7,083

Re: Merge dataset under conditions

Posted in reply to Demographer

Here is one way using the POINT= option to prevent SAS from stopping because you read all of the P value records.

Added logic to force P1-P8 to be missing if SEX is neither 0 or 1.  SAS will automatically drop the variables named in the POINT= options.

data work.bd2007;

  set work.bd2006;

  pointer1=1;

  pointer2=1;

  if sex=0 then set survieH point=pointer1;

  else if sex=1 then set survieF point=pointer2;

  else call missing(of p1-p8);

run;

View solution in original post


All Replies
Solution
‎01-03-2013 02:19 PM
Super User
Super User
Posts: 7,083

Re: Merge dataset under conditions

Posted in reply to Demographer

Here is one way using the POINT= option to prevent SAS from stopping because you read all of the P value records.

Added logic to force P1-P8 to be missing if SEX is neither 0 or 1.  SAS will automatically drop the variables named in the POINT= options.

data work.bd2007;

  set work.bd2006;

  pointer1=1;

  pointer2=1;

  if sex=0 then set survieH point=pointer1;

  else if sex=1 then set survieF point=pointer2;

  else call missing(of p1-p8);

run;

Frequent Contributor
Posts: 113

Re: Merge dataset under conditions

Thank you, it works.

But I don't understand what is the purpose of pointer1 and pointer2.

Super User
Super User
Posts: 7,083

Re: Merge dataset under conditions

Posted in reply to Demographer

The POINT= option of the SET statement uses a data step variable to allow you to dynamically directly access observations in a dataset. That is what the two pointer variables are used for.  Because we have two SET statements we need two different variables.

In this case we are always accessing the same observation because the pointer variables are always set to one.  Why it is useful for this application is that it lets you use the extra two SET statements without worrying that SAS will move past the end of one of them and cause your data step to end prematurely. Also by re-accessing the same observations from the two P value tables over and over we can reload the P1-P8 variables without having to rename them or create arrays or code DO loops to copy from the renamed variables to the desired names.

Frequent Contributor
Posts: 113

Re: Merge dataset under conditions

So if I have 15 SET statements, I will need 15 variables?

Super User
Super User
Posts: 7,083

Re: Merge dataset under conditions

Posted in reply to Demographer

If you want to use POINT= option on each of them.  But if you have 15 then you probably do not want to code it this way, or you want to use a macro to generate the code for you.

You could do your combination using PROC SQL by adding the SEX variable to the data with the P values.  Either keep the SEX variable when you are generating the P values or add it back in.

proc sql ;

  create table want as

    select a.*,b.*

    from have a

       left join

        (select 1 as SEX,f.* from survieF f

         union

         select 0 as SEX,h.* from survieH h) b

    on a.sex = b.sex

  ;

quit;


Super User
Posts: 5,518

Re: Merge dataset under conditions

Tom,

Do you really need 15?  Wouldn't a single variable work just as well across all SET statements?

Super User
Super User
Posts: 7,083

Re: Merge dataset under conditions

Posted in reply to Astounding

Actually I tested and it does not require different variables.  So a single pointer variable could work.  I am just used to using different variables because of the confusion that would occur if you used the same variable for options that the SET statement modifies the values such as END= or NOBS=.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 246 views
  • 0 likes
  • 3 in conversation