Hello,
I am trying to build a microsimulation with SAS, meaning that a starting database is regenerated under certain conditions and assumptions. Some observations are also added along the way. In the starting database, a variable NOINDIV identifies each case. My problem is the following: I want to assign a unique NOINDIV to each added observation. I cannot use a random function (rand) to assign this NOINDIV, because the rounded number it generates might not be unique. Does anyone know what to do?
Thanks
Assuming you have a starting population (StudyPop) and a series of separate datasets of immigrants (NewPop1980 etc.), you can assign a sequential NOINDIV to the starting population, with a first id of e.g. 123457 (one more than the value in the Retain statement):
Data StudyPop ;
Retain NOINDIV 123456 ;
Set StudyPop ;
NOINDIV = NOINDIV + 1 ;
Run ;
The following steps can be implemented as a macro; run it once for each new population to be added.
To continue the sequential series, first find the maximum NOINDIV in the study population and store it in a macro variable; then use that maximum as the starting point for the next series.
%Macro AddNewPop (NewPop) ;
   /* Find the highest NOINDIV already used in the study population */
   Proc SQL NoPrint ;
      Select Max (NOINDIV)
      Into :MaxNOINDIV
      From StudyPop
      ;
   Quit ;
   /* Add the NOINDIV sequential value to the new population */
   /* (&NewPop resolves to the dataset name passed to the macro) */
   Data &NewPop ;
      Retain NOINDIV &MaxNOINDIV ;
      Set &NewPop ;
      NOINDIV = NOINDIV + 1 ;
   Run ;
   /* Append the new population to the study population */
   Proc Append
      Base = StudyPop
      Data = &NewPop
      ;
   Run ;
%Mend ;
Now use the macro thus :
%AddNewPop (NewPop1980) ;
%AddNewPop (NewPop1981) ;
%AddNewPop (NewPop1982) ;
etc.
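Since the whole point is that NOINDIV stays unique, it may be worth checking the result after the appends. A minimal sketch, assuming the combined data is still called StudyPop; the column names NbRows and NbIds are made up for the example:

```sas
/* Sketch only: NbRows and NbIds are hypothetical column names */
/* If every NOINDIV is unique, the two counts should be equal  */
Proc SQL ;
   Select Count (*)                As NbRows,
          Count (Distinct NOINDIV) As NbIds
   From StudyPop
   ;
Quit ;
```

If the two counts differ, some id has been assigned more than once, or is missing, and the numbering steps should be reviewed before the next regeneration.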
Is it really necessary for the number to be random?
A sequence number will guarantee the uniqueness of the value.
Cheers from Portugal!
Daniel Santos @ www.cgd.pt
I agree with Daniel. And if you want your ID number to "look" random, generate a SAS dataset with two variables: a sequential number and a random number. Then sort it by the random number, and assign your sequential numbers in the resulting order.
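One possible way to code this idea, as a rough sketch rather than a definitive implementation: the random number is attached directly to the rows and used only as a sort key, so the ids themselves stay sequential and therefore unique. The names ShuffledPop and RandKey, the seed, and the base value 123456 are assumptions made for the example.

```sas
/* Sketch only: ShuffledPop and RandKey are hypothetical names, the seed is arbitrary */
Data ShuffledPop ;
   Set StudyPop ;
   If _N_ = 1 Then Call Streaminit (20250101) ;   /* fixed seed, so the shuffle is reproducible */
   RandKey = Rand ('Uniform') ;                   /* random sort key, never used as the id itself */
Run ;

/* Put the rows in random order */
Proc Sort Data = ShuffledPop ;
   By RandKey ;
Run ;

/* Number the rows in their shuffled order: unique ids with no visible pattern */
Data StudyPop ;
   Set ShuffledPop ;
   NOINDIV = 123456 + _N_ ;
   Drop RandKey ;
Run ;
```

Note that this leaves StudyPop in shuffled row order; if the original order matters, keep a copy of the original row number and sort by it afterwards.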
Tom
Thanks for your comment. The randomness is not necessary, but I'm not sure how to do what you suggest. To clarify, it's a population projection over 20 years, so there are 20 regenerations. The cases added during each regeneration are immigrants. Once an ID number is assigned, it must be kept unchanged through every later regeneration.