Hello. I have a variable in my main dataset that looks like this:
Number of previous live births (PLBs)
0 = 0 PLBs
1 = 1 PLB
2 = 2 PLBs
3 = 3-5 PLBs
4 = 6+ PLBs
I used another dataset that has a fully continuous PLB variable to figure out what percentage of the observations with 3-5 PLBs have exactly 3, 4, and 5 PLBs. Now I would like to use those percentages to randomly pull 60%, 25%, and 15% of the observations in category 3 and reassign them as 3, 4, and 5 PLBs (respectively), so that I have a continuous variable in my main dataset. I've been playing around with PROC SURVEYSELECT, but I can't figure out how to make this work. Let me know if I can clarify anything. Thanks!
EDIT: There seems to be some confusion (although I can't understand why anyone familiar with data would be confused, but I digress). My data looks something like this, and the WANT variable is what I want: 60% of the '3' category is assigned 3, 25% is assigned 4, and 15% is assigned 5. Thanks!
ID PLB WANT
1 3 4
2 1 1
3 2 2
4 3 3
5 4 4
6 3 5
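For reference, here is one way to write the example table above as a DATA step with DATALINES, which is the form requested later in the thread. The values are copied from the table; the data set name HAVE is only the usual forum convention and an assumption here.

```sas
/* The example table above as SAS data (values copied from the post) */
data have;
  input id plb want;
  datalines;
1 3 4
2 1 1
3 2 2
4 3 3
5 4 4
6 3 5
;
run;
```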
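Do you just want to assign 60% of the '3's to 3, 25% of the '3's to 4, and 15% of the '3's to 5?
If so, that is very easy to implement with the RAND() function and the TABLE distribution. RAND('TABLE', p1, p2) returns 1 with probability p1 and 2 with probability p2; when the listed probabilities sum to less than 1, it returns 3 with the leftover probability (here 1 - 0.6 - 0.25 = 0.15). Adding 2 therefore maps these to 3, 4, and 5.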
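/* Simulated test data: 1000 IDs with PLB codes 1-4 */
data have;
  call streaminit(123);
  do id=1 to 1000;
    /* 1, 2, 3 with probability 0.25, 0.25, 0.40; 4 gets the remaining 0.10 */
    plb=rand('table',0.25,0.25,0.4);
    output;
  end;
  stop;
run;

/* Split category 3 into 3, 4, 5 with probability 0.60, 0.25, 0.15; keep all other codes */
data want;
  set have;
  call streaminit(123);
  if plb=3 then want=2+rand('table',0.6,0.25);
  else want=plb;
run;

/* Check the result */
proc freq data=want(where=(plb=3));
  table want;
run;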
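Note that RAND('TABLE') draws each observation independently, so the realized split within category 3 is only approximately 60/25/15 (the PROC FREQ check shows how close it lands). If the split should be exact up to rounding, one option is to put the category-3 rows in random order and assign values by position. This is a minimal sketch, not the method from the reply above: it assumes a data set HAVE with variables ID and PLB, and the data set names PLB3, OTHERS, WANT_EXACT, the sort key U, and the seed 456 are all hypothetical.

```sas
/* Test data (same simulation as above); replace with your own HAVE */
data have;
  call streaminit(123);
  do id = 1 to 1000;
    plb = rand('table', 0.25, 0.25, 0.4);
    output;
  end;
run;

/* Hypothetical sketch: give every row a random sort key, split off PLB=3 */
data plb3 others;
  set have;
  if _n_ = 1 then call streaminit(456);  /* seed once per DATA step */
  u = rand('uniform');
  if plb = 3 then output plb3;
  else output others;
run;

/* Random order within category 3 */
proc sort data=plb3;
  by u;
run;

/* By position: first 60% -> 3, next 25% -> 4, the rest (about 15%) -> 5 */
data plb3;
  set plb3 nobs=n;
  if _n_ <= round(0.60 * n) then want = 3;
  else if _n_ <= round(0.85 * n) then want = 4;
  else want = 5;
run;

/* Recombine; codes other than 3 keep their original value */
data want_exact;
  set others(in=in_other) plb3;
  if in_other then want = plb;
  drop u;
run;

proc sort data=want_exact;
  by id;
run;
```

Because the cutoffs use ROUND, the counts can differ from the exact percentages by one row when the category size is not a multiple of 20.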
@joachimg wrote:
Hello. I have a variable in my main dataset that looks like this:
Number of previous live births
0 = 0
1 = 1
2 = 2
3 = 3-5
4 = 6+
Data sets don't look like this. If we are going to write sample code for you, we need to see (a portion of) the actual data set (or the actual data set with fake numbers). Please help us help you by providing the data set as working SAS data step code (examples and instructions), this is the only acceptable way to show us a data set; do not provide the data as Excel files or text files, do not provide the data as copy/paste from Excel, etc. By the way, we have asked you to provide data in the proper form previously, please don't make us repeatedly request that you use the proper form.
Hello, I specified in my original post that this is a VARIABLE in a dataset, not a dataset. This is how you would see the variable described in a data dictionary or codebook.
SAS doesn't work on data dictionaries or codebooks, it works on variables in data sets. As such, we need to see (a portion of) your data set in a usable form, which is defined at the link I gave.
I suppose I assumed the people on here were familiar enough with how data works that my original post would be clear. I've updated the post now.