I want to create varX = (w, b, h, s) [4 cardinal variables] and varY = (yes, no) and then sample with replacement, with probability. ProbX = (0.6, 0.2, 0.15, 0.05) and ProbY = (0.8, 0.2). The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000) and sample size for Y is each type of X for any given X_total and I have to do this by bootstrap and then compare them against the theoretical results of a 2D chi-square test. I am stuck at defining the variables X and Y. Can someone please help me with the syntax or point me to a tutorial on how to define variables in SAS? I'm familiar with R and have no clue on this one.
Thanks
I don't understand part of your question because, I think, you are mixing up "variable" and "value".
Without any other values involved one basic approach to making a data set with a fixed number of observations would look like this:
data yournamegoeshere; do i = 1 to 100; output; end; run;
That will create a sequence of records with the variable i taking the values 1, 2, 3 ... 100, which are the limits of the do loop: it starts at 1 and ends at 100.
The Rand('table') function is the way to select values with given probabilities. I'll show your second variable (Y) first, as it illustrates a useful bit of SAS behavior:
data example; do i= 1 to 10; y= rand('table',.2,.8)-1; output; end; run;
The table function will assign the value 1 with the first probability, 2 with the second, 3 with the third, and so on. If the probabilities you provide total less than 1, then the remaining chance goes to N+1. If the total of the probabilities exceeds 1 this gets a bit harder to explain, so it is best to have the parameters total 1.
The above is my suggestion for creating a variable that is coded 0/1, with 1 standing for Yes (or True). SAS treats the value 1 as True and 0 as False, so you can simplify some code involving IF constructs to "If Y then <do something>". If you use Y/N or 'Yes/No' or similar then you have to explicitly state the value in comparisons. Additionally, summary counts for a 1/0 coded variable can be obtained with the SUM statistic: the sum is the number of 1's and the Mean is the proportion of 1's as a decimal value.
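As a small sketch of the point about sums and means of a 0/1 variable (the seed value and the choice of Proc Means are my assumptions for illustration, not part of the question):

```sas
data example;
   call streaminit(12345);              /* arbitrary seed, assumed here for reproducibility */
   do i = 1 to 100;
      y = rand('table', .2, .8) - 1;    /* 0 with probability .2, 1 with probability .8 */
      output;
   end;
run;

proc means data=example n sum mean;
   var y;                               /* SUM = number of 1's, MEAN = proportion of 1's */
run;
```

With 100 records the mean should land somewhere near 0.8, though any single run will vary.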
The 4 level would look like:
data example; do i= 1 to 100; y= rand('table',.2,.8)-1; x= rand('table',0.6, 0.2, 0.15, 0.05) ; output; end; run;
Which will generate X of 1,2,3 or 4. If you want to SEE something like w, b, h, s then create a custom format and use that when you need to see those values.
Proc format; value myx 1='w' 2='b' 3='h' 4='s' ; run;
proc freq data=example;
format x myx. ;
run;
Formats are the basic way of controlling displayed values in SAS. They have an additional benefit: the groups defined by a format are honored by reporting, analysis and most graphing procedures. So in your example if we created a format like
Proc format; value myxbh 1='w' 2,3='bh' 4='s' ; run;
We could then run
Proc freq data=example; format x myxbh. ; run;
to see the effect of having 'bh' appear with roughly .35 probability without having to re-create the data set.
Note: do not end the name of a format with a digit. Digits at the end of a format reference control the display width, so myx3. would be read as the format myx occupying 3 output positions (padded, even when only 1 position is needed).
Part of what I don't follow is this: "The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000)". "Is" implies a single value to me. My guess is that in SAS you would either create multiple data sets by changing the upper limit of i in the examples, OR create one data set with 1000 records and then use a Where statement to restrict which records are used for a given purpose. A third approach could be to use a format on the i variable.
data example;
   do i= 1 to 1000;
      y= rand('table',.2,.8)-1;
      x= rand('table',0.6, 0.2, 0.15, 0.05) ;
      output;
   end;
run;
Proc format;
   value myx 1='w' 2,3='bh' 4='s' ;
   value i_20_ 1 - 20 = 'first 20' other='More than 20' ;
   value yn 0='N' 1='Y' ;
run;
/* the format on i defines the BY groups: 'first 20' and 'More than 20' */
proc freq data=example;
   by i;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
/* or */
Proc freq data=example;
   where i le 20;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
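If the intent is a separate sample for each of the six sample sizes, another option is to stack them in one data set and let a BY statement run the test once per size. This is only a sketch under that reading of the question; the names sim and n_total and the seed are made up for illustration.

```sas
data sim;
   call streaminit(2024);                          /* arbitrary seed (assumption) */
   do n_total = 20, 50, 100, 200, 500, 1000;       /* one block of records per sample size */
      do i = 1 to n_total;
         x = rand('table', 0.6, 0.2, 0.15, 0.05);  /* 1=w, 2=b, 3=h, 4=s */
         y = rand('table', .2, .8) - 1;            /* 1=yes with probability .8, 0=no */
         output;
      end;
   end;
run;

proc freq data=sim;
   by n_total;               /* data are already in n_total order, so no sort is needed */
   tables x*y / chisq;       /* chi-square test of independence per sample size */
run;
```

For the smaller sample sizes expect SAS to warn that some cells have low expected counts, since the 's' level has probability 0.05.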
If you want to simulate two independent discrete variables (uncorrelated), you can use the RAND function and the "Table" distribution. See
https://blogs.sas.com/content/iml/2011/07/13/simulate-categorical-data-in-sas.html
That article has a macro variable
%let NSim = 10000; /* sample size. 20, 50, 100, 200, 500 and 1000 */
that determines the sample size.
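For reference, a minimal sketch of how such a macro variable can drive the simulation. This is not the article's code; the data set name, the seed, and the value 100 are assumptions chosen for illustration.

```sas
%let NSim = 100;    /* sample size: set to 20, 50, 100, 200, 500 or 1000 */

data sample;
   call streaminit(1);                         /* arbitrary seed (assumption) */
   do i = 1 to &NSim;
      x = rand('table', 0.6, 0.2, 0.15, 0.05); /* four-level variable coded 1 to 4 */
      y = rand('table', .2, .8) - 1;           /* 0/1 variable, 1 with probability .8 */
      output;
   end;
run;
```

Changing the %let line and rerunning produces a sample of a different size without touching the data step.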
I don't understand the rest of your questions. Since you are "familiar with R," you can post your R program and maybe that will help us understand what analysis you are attempting.