Demographer
Pyrite | Level 9

Hi,

In a DATA step, I want to duplicate an observation when a condition is met, and then change some variables on the new copy. I did this:

output;

if  rand('uniform')<prob_nais then do;

    nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

    if rand('uniform')>105/205 then do; sexe=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

    output;

end;

It works, but if I add more code after this, it does not seem to be applied properly. For example, I tried this:

test=1;

output;

if  rand('uniform')<prob_nais then do;

    nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

    if rand('uniform')>105/205 then do; sexe=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

    output;

end;

test2=1;

The variable test is created properly (all set to 1), but the variable test2 is set to missing for each observation, so I guess there is a problem with the code for the duplication.

8 REPLIES
Tom
Super User

It is because of where you have placed the OUTPUT statements.  Since they are before the assignment of TEST2 the value is missing.

You can think of the OUTPUT statement as doing literally what its name implies.  It writes the record to the dataset with the current values of all of the variables.

If there is a value that you want to calculate based on the current observation and have carried forward onto the next one, then use the RETAIN statement.  If you added a RETAIN TEST2 statement to your data step, then in this case only the first observation (and its duplicate, if one is written) would have a missing value for TEST2.
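To make the timing concrete, here is a minimal sketch, assuming a one-row input dataset called HAVE (a made-up name, not from the original post). Because the step contains an explicit OUTPUT, SAS does not write a second record automatically at the end of the step, so the assignment to TEST2 never reaches the output dataset.

```sas
/* Hypothetical one-row input, for illustration only */
data have;
    x = 1;
run;

data want;
    set have;
    test = 1;
    output;      /* record is written here: test=1, test2 still missing */
    test2 = 1;   /* assigned after the write, so it is never stored     */
run;

proc print data=want;
run;
```

Moving `test2 = 1;` above the OUTPUT statement would store the value in the record.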

Demographer
Pyrite | Level 9

Is there another way to duplicate a line without using the OUTPUT statements?

art297
Opal | Level 21

Nothing wrong with output statements per se.  Tom added some extra thoughts, but his initial advice will solve your current problem, i.e., just change your code to:

  test=1;

  output;

  if  rand('uniform')<prob_nais then do;

    nais=1;

    age=0;

    prob_nais=0;

    annee_immig=.;

    immig=0;

    duree_imm=.;

    if rand('uniform')>105/205 then do;

      SEXE=1;

      prob_survie=qf;

    end;

    else do;

      sexe=0;

      prob_survie=qm;

    end;

    test2=1;

    output;

  end;

Demographer
Pyrite | Level 9

That doesn't work. With your code, the variable test2 is only set for observations that have been duplicated. I wrote test2=1 as an example, but I have much more code that should follow the duplication, and that code must apply to all observations. Maybe I could just close the data step and start another one after the duplication.

Tom
Super User

Not sure why it would need to follow the insertion of the extra record, but if your dataset is not extremely large then there is not much harm in splitting the processing into two steps.
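A rough sketch of the two-step split could look like the following. The dataset names POPULATION, WITH_BIRTHS and PROJECTED are placeholders assumed for illustration; the variable names come from the original post.

```sas
/* Step 1: write each person, plus a newborn copy when a birth is drawn */
data with_births;
    set population;   /* hypothetical input dataset name */
    output;           /* the existing person, unchanged  */
    if rand('uniform') < prob_nais then do;
        nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;
        if rand('uniform') > 105/205 then do; sexe=1; prob_survie=qf; end;
        else do; sexe=0; prob_survie=qm; end;
        output;       /* the newborn */
    end;
run;

/* Step 2: code that must apply to every record, old and new */
data projected;
    set with_births;
    test2 = 1;
run;
```

The second step reads the duplicated rows as ordinary observations, so anything placed there applies to parents and newborns alike, at the cost of one extra pass over the data.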

art297
Opal | Level 21

Then I obviously don't understand what you are trying to do.  Can you post two small example datasets, namely one that simulates what you have, and the other showing what you want the resulting dataset to look like?

Demographer
Pyrite | Level 9

It's a demographic projection by microsimulation. Prob_nais is the probability of giving birth. Prob_survie is the probability of survival.

Each observation already has a value for prob_survie. When there is a duplication (i.e. a new birth), prob_survie changes for the new observation, since it is a newborn with its own probability of survival.

Once each old and new line has its own prob_survie, I simulate mortality:

if rand('uniform') > prob_survie THEN vie=0;
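If a single DATA step is preferred, one possible sketch is to put the code shared by all records in a LINK routine that ends with OUTPUT, and call it once for the existing person and once for the newborn. The dataset names and the label FINISH are invented for illustration, and resetting vie to 1 for the newborn is an assumption (that 1 means alive), since the thread does not say how vie is initialised.

```sas
data projected;
    set population;          /* hypothetical input dataset name */
    link finish;             /* mortality + output for the existing person */

    if rand('uniform') < prob_nais then do;
        nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;
        if rand('uniform') > 105/205 then do; sexe=1; prob_survie=qf; end;
        else do; sexe=0; prob_survie=qm; end;
        vie = 1;             /* assumption: newborn starts alive, not with the parent's vie */
        link finish;         /* mortality + output for the newborn */
    end;
    return;                  /* end of iteration; stops execution falling into the routine */

finish:
    /* code that must run for every record, right before it is written */
    if rand('uniform') > prob_survie then vie = 0;
    output;
    return;
run;
```

Here the parent's mortality is drawn before the birth draw rather than after it. That changes the order in which random numbers are consumed, but the birth draw does not depend on vie.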

PGStats
Opal | Level 21

You must be careful about the order of statements. I think this would be better:

test=1;

test2=0;

output;

if  rand('uniform')<prob_nais then do;

     nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

     if rand('uniform')>105/205 then do;

          sexe=1; prob_survie=qf; end;

     else do;

          sexe=0; prob_survie=qm; end;

     test2=1;

     output;

end;

PG
