Duplicate a line following conditions

Frequent Contributor
Posts: 110

Hi,

In a DATA step, I want to duplicate an observation when a condition is met, and then change some variables on the newly created observation. I did this:

output;

if  rand('uniform')<prob_nais then do;

    nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

    if rand('uniform')>105/205 then do; SEXE=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

    output;

end;

It works, but the problem is that any code I add after this does not take effect. For example, I tried this:

test=1;

output;

if  rand('uniform')<prob_nais then do;

    nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

    if rand('uniform')>105/205 then do; SEXE=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

    output;

end;

test2=1;

The variable test is created properly (all set to 1), but the variable test2 is set to missing for each observation, so I guess there is a problem with the code for the duplication.

Super User
Posts: 6,500

Re: Duplicate a line following conditions

It is because of where you have placed the OUTPUT statements.  Since they come before the assignment of TEST2, the value is still missing when the record is written.

You can think of the OUTPUT statement as doing literally what its name implies.  It writes the record to the dataset with the current values of all of the variables.

If there is a value that you want to calculate based on the current observation and have that value carried forward onto the next one, then use the RETAIN statement.  If you added a RETAIN TEST2 statement to your data step, then in this case only the first observation would have a missing value for TEST2.
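
A minimal sketch of both behaviours, assuming a hypothetical three-row input dataset named HAVE (it is not part of the original question). In WANT_PLAIN, TEST2 is assigned after OUTPUT and is reset to missing at the start of every iteration, so it is missing on every written row. In WANT_RETAIN, the RETAIN statement keeps the value across iterations, so only the first row is missing.

```sas
/* Hypothetical input: three observations, for illustration only. */
data have;
    do id = 1 to 3;
        output;
    end;
run;

/* TEST2 is assigned after OUTPUT, so it never reaches the written record. */
data want_plain;
    set have;
    test = 1;      /* assigned before OUTPUT: written on every row   */
    output;
    test2 = 1;     /* assigned after OUTPUT: too late for this row   */
run;

/* With RETAIN, the value set in one iteration is still there */
/* when the next record is written.                           */
data want_retain;
    set have;
    retain test2;  /* no initial value given, so it starts missing   */
    test = 1;
    output;
    test2 = 1;     /* carried forward into the next iteration        */
run;
```

Note that RETAIN only carries the previous iteration's value forward; it does not make a statement placed after OUTPUT apply to the record that was just written.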

Frequent Contributor
Posts: 110

Re: Duplicate a line following conditions

Is there another way to duplicate a line without using the OUTPUT statements?

PROC Star
Posts: 7,363

Re: Duplicate a line following conditions

There is nothing wrong with OUTPUT statements per se.  Tom added some extra thoughts, but his initial advice will solve your current problem.  That is, just change your code to:

  test=1;

  output;

  if  rand('uniform')<prob_nais then do;

    nais=1;

    age=0;

    prob_nais=0;

    annee_immig=.;

    immig=0;

    duree_imm=.;

    if rand('uniform')>105/205 then do;

      SEXE=1;

      prob_survie=qf;

    end;

    else do;

      sexe=0;

      prob_survie=qm;

    end;

    test2=1;

    output;

  end;

Frequent Contributor
Posts: 110

Re: Duplicate a line following conditions

That doesn't work. With your code, the variable test2 is only set for the observations created by the duplication. I wrote test2=1 as an example, but I have much more code that should follow the duplication, and that code must apply to all observations. Maybe I could just close the data step and start another one after the duplication.

Super User
Posts: 6,500

Re: Duplicate a line following conditions

Not sure why it would need to follow the insertion of the extra record, but if your dataset is not extremely large then there is not much harm in splitting the processing into two steps.
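
A rough sketch of the two-step split, under stated assumptions: the dataset POP, its values, the seeds, the NAIS=0 default, and the VIE=1 default are all made up for illustration, and the immigration variables from the question are left out for brevity. The first step only duplicates records; the second step then runs on every record, original or new.

```sas
/* Hypothetical starting population; all values are invented. */
data pop;
    input id age sexe prob_nais prob_survie qf qm;
    datalines;
1 30 1 0.50 0.99 0.995 0.994
2 45 1 0.10 0.98 0.995 0.994
3 70 0 0.00 0.90 0.995 0.994
;
run;

/* Step 1: write each person, then possibly a newborn copy. */
data pop_births;
    set pop;
    if _n_ = 1 then call streaminit(12345);  /* arbitrary seed */
    nais = 0;                                /* assumption: 0 marks an existing person */
    output;
    if rand('uniform') < prob_nais then do;
        nais = 1; age = 0; prob_nais = 0;
        if rand('uniform') > 105/205 then do; sexe = 1; prob_survie = qf; end;
        else do; sexe = 0; prob_survie = qm; end;
        output;
    end;
run;

/* Step 2: code that must apply to all observations.          */
/* There is no explicit OUTPUT here, so each record is        */
/* written automatically at the end of the step iteration.    */
data pop_mortality;
    set pop_births;
    if _n_ = 1 then call streaminit(67890);  /* arbitrary seed */
    vie = 1;                                 /* assumption: 1 means alive */
    if rand('uniform') > prob_survie then vie = 0;
run;
```

The cost is one extra pass over the data, which is why this is mainly attractive when the dataset is not very large.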

PROC Star
Posts: 7,363

Re: Duplicate a line following conditions

Then I obviously don't understand what you are trying to do.  Can you post two small example datasets, namely one that simulates what you have, and the other showing what you want the resulting dataset to look like?

Frequent Contributor
Posts: 110

Re: Duplicate a line following conditions

It's a demographic projection by microsimulation. Prob_nais is the probability of giving birth. Prob_survie is the probability of survival.

Each observation already has a value for prob_survie. When there is a duplication (i.e. a new birth), prob_survie changes for the new observation, since it is a newborn with its own probability of survival.

Once each old and new line has its own prob_survie, I simulate mortality:

if rand('uniform') > prob_survie THEN vie=0;
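
One possible way to meet this requirement in a single step is to put the shared code in a LINK routine that is called once for the existing person and once for the newborn, so the mortality draw happens before each OUTPUT. This is only a sketch: the dataset POP, its values, the seed, the label name FINISH, and the VIE=1 default are assumptions, and the immigration variables are left out for brevity.

```sas
/* Hypothetical starting population; all values are invented. */
data pop;
    input id age sexe prob_nais prob_survie qf qm;
    datalines;
1 30 1 0.50 0.99 0.995 0.994
2 45 1 0.10 0.98 0.995 0.994
3 70 0 0.00 0.90 0.995 0.994
;
run;

data pop_out;
    set pop;
    if _n_ = 1 then call streaminit(2024);   /* arbitrary seed */

    link finish;                             /* existing person */

    if rand('uniform') < prob_nais then do;
        nais = 1; age = 0; prob_nais = 0;
        if rand('uniform') > 105/205 then do; sexe = 1; prob_survie = qf; end;
        else do; sexe = 0; prob_survie = qm; end;
        link finish;                         /* newborn */
    end;
    return;                                  /* end of this iteration */

finish:
    /* Shared code: runs for every record just before it is written. */
    vie = 1;                                 /* assumption: 1 means alive */
    if rand('uniform') > prob_survie then vie = 0;
    output;
    return;                                  /* back to the statement after LINK */
run;
```

Because VIE is recomputed inside the routine, the newborn gets its own draw against its own prob_survie rather than inheriting the parent's result.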

Respected Advisor
Posts: 4,649

Re: Duplicate a line following conditions

You must be careful about the order of statements: anything assigned after an OUTPUT statement is not written to that record. I think this would be better:

test=1;

test2=0;

output;

if  rand('uniform')<prob_nais then do;

     nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

     if rand('uniform')>105/205 then do;

          sexe=1; prob_survie=qf; end;

     else do;

          sexe=0; prob_survie=qm; end;

     test2=1;

     output;

end;

PG
