Duplicate a line following conditions

01-05-2013 06:34 PM

Hi,

In a DATA step, I want to duplicate an observation when a condition is met, and then change some variables on the new copy. I did this:

output;

if rand('uniform')<prob_nais then do;

nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

if rand('uniform')>105/205 then do; sexe=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

output;

end;

It works, but if I add more code after this, it doesn't behave as I expect. For example, I tried this:

test=1;

output;

if rand('uniform')<prob_nais then do;

nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

if rand('uniform')>105/205 then do; sexe=1; prob_survie=qf; end; else do; sexe=0; prob_survie=qm; end;

output;

end;

test2=1;

The variable test is created properly (set to 1 on every observation), but test2 is missing on every observation, so I guess there is a problem with the duplication code.

Posted in reply to Demographer

01-05-2013 08:01 PM

It is because of where you have placed the OUTPUT statements. Since they come before the assignment of TEST2, the value is missing when the observation is written.

You can think of the OUTPUT statement as doing literally what its name implies. It writes the record to the dataset with the current values of all of the variables.

If there is a value that you want to calculate based on the current observation and carry forward onto the next one, use the RETAIN statement. If you added a RETAIN TEST2 statement to your data step, then in this case only the first observation (and its duplicate, if one is created) would have a missing value for TEST2.
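To make the timing concrete, here is a minimal sketch. The three-row dataset `have` and the output dataset names are made up for illustration; only `test`, `test2`, OUTPUT, and RETAIN come from the thread.

```sas
/* Hypothetical three-row input, for illustration only */
data have;
  do id = 1 to 3;
    output;
  end;
run;

/* Without RETAIN: test2 is assigned after OUTPUT, so the value is never written. */
/* It is reset to missing at the start of the next iteration.                     */
data no_retain;
  set have;
  test = 1;
  output;
  test2 = 1;
run;

/* With RETAIN: the value assigned at the end of one iteration */
/* is still there when the next observation is written.        */
data with_retain;
  set have;
  retain test2;
  test = 1;
  output;
  test2 = 1;
run;
```

In `no_retain`, test2 is missing on all three rows. In `with_retain`, it is missing on the first row and 1 on the other two.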

01-06-2013 03:17 PM

Is there another way to duplicate a line without using the OUTPUT statement?

Posted in reply to Demographer

01-06-2013 03:31 PM

There is nothing wrong with OUTPUT statements per se. Tom added some extra thoughts, but his initial advice will solve your current problem, i.e., just change your code to:

test=1;

output;

if rand('uniform')<prob_nais then do;

nais=1;

age=0;

prob_nais=0;

annee_immig=.;

immig=0;

duree_imm=.;

if rand('uniform')>105/205 then do;

SEXE=1;

prob_survie=qf;

end;

else do;

sexe=0;

prob_survie=qm;

end;

test2=1;

output;

end;

Posted in reply to art297

01-06-2013 03:47 PM

That doesn't work. With your code, the variable test2 is only set for observations that have been duplicated. I wrote test2=1 as an example, but I have a lot more code that should follow the duplication, and that code must apply to all observations. Maybe I could just close the data step and start another one after the duplication.

Posted in reply to Demographer

01-06-2013 04:32 PM

Not sure why it would need to follow the insertion of the extra record, but if your dataset is not extremely large then there is not much harm in splitting the processing into two steps.
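A rough sketch of the two-step split, assuming a small made-up input dataset `population` (the dataset names and all the values in it are hypothetical; the variable names are the ones used in the thread):

```sas
/* Hypothetical input, values chosen only for illustration */
data population;
  input id age sexe prob_nais prob_survie qf qm;
  datalines;
1 30 1 0.10 0.990 0.995 0.994
2 45 0 0.00 0.980 0.995 0.994
3 27 1 0.12 0.992 0.995 0.994
;
run;

/* Step 1: duplication only. Each OUTPUT writes the current values. */
data with_births;
  set population;
  output;                                   /* original observation */
  if rand('uniform') < prob_nais then do;
    nais = 1; age = 0; prob_nais = 0; annee_immig = .; immig = 0; duree_imm = .;
    if rand('uniform') > 105/205 then do; sexe = 1; prob_survie = qf; end;
    else do; sexe = 0; prob_survie = qm; end;
    output;                                 /* duplicated observation */
  end;
run;

/* Step 2: code that must apply to every observation, old and new. */
/* There is no explicit OUTPUT here, so each row is written        */
/* automatically at the end of the iteration.                      */
data projection;
  set with_births;
  test2 = 1;
run;
```

The cost is one extra pass over the data, which is why this is mainly a concern for very large datasets.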

Posted in reply to Demographer

01-06-2013 04:38 PM

Then I obviously don't understand what you are trying to do. Can you post two small example datasets, namely one that simulates what you have, and the other showing what you want the resulting dataset to look like?

Posted in reply to art297

01-06-2013 05:31 PM

It's a demographic projection by microsimulation. Prob_nais is the probability of giving birth. Prob_survie is the probability of survival.

Each observation already has a value for prob_survie. When there is a duplication (i.e. a new birth), prob_survie changes for the new observation, since it is a newborn with its own probability of survival.

Once each old and new line has its own prob_survie, I simulate mortality:

if rand('uniform') > prob_survie THEN vie=0;
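One possible way to keep this in a single data step is to put the code shared by every line into a LINK routine and call it right before each line is written. This is only a sketch under assumptions: the dataset `population` and its values, the label `mortality`, and the initialisation `vie = 1` are made up for illustration and are not from the thread.

```sas
/* Hypothetical input, values chosen only for illustration */
data population;
  input id age sexe prob_nais prob_survie qf qm;
  datalines;
1 30 1 0.10 0.990 0.995 0.994
2 45 0 0.00 0.980 0.995 0.994
3 27 1 0.12 0.992 0.995 0.994
;
run;

data projection;
  set population;
  link mortality;                           /* original line */
  if rand('uniform') < prob_nais then do;
    nais = 1; age = 0; prob_nais = 0;
    if rand('uniform') > 105/205 then do; sexe = 1; prob_survie = qf; end;
    else do; sexe = 0; prob_survie = qm; end;
    link mortality;                         /* newborn line */
  end;
  return;                                   /* end of the main flow for this iteration */

mortality:
  vie = 1;                                  /* assumption: everyone starts the period alive */
  if rand('uniform') > prob_survie then vie = 0;
  output;                                   /* write the line with its own vie */
  return;                                   /* back to the statement after LINK */
run;
```

Because the routine is called once per written line, the original and the newborn each get an independent mortality draw against their own prob_survie.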

Posted in reply to Demographer

01-06-2013 04:40 PM

You must be careful about the order of statements. I think this would be better:

test=1;

test2=0;

output;

if rand('uniform')<prob_nais then do;

  nais=1; age=0; prob_nais=0; annee_immig=.; immig=0; duree_imm=.;

  if rand('uniform')>105/205 then do;

    sexe=1; prob_survie=qf; end;

  else do;

    sexe=0; prob_survie=qm; end;

  test2=1;

  output;

end;

PG
