SAS Programming

KMaxwell · Posted 10-10-2022 10:14 PM

I want to create varX = (w, b, h, s) [4 cardinal variables] and varY = (yes, no) and then sample with replacement, with probability. ProbX = (0.6, 0.2, 0.15, 0.05) and ProbY = (0.8, 0.2). The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000) and sample size for Y is each type of X for any given X_total and I have to do this by bootstrap and then compare them against the theoretical results of a 2D chi-square test. I am stuck at defining the variables X and Y. Can someone please help me with the syntax or point me to a tutorial on how to define variables in SAS? I'm familiar with R and have no clue on this one.

Thanks

ballardw · Posted 10-11-2022 05:31 PM

I don't understand some of your question because I think you are mixing "variable" with "value".

Without any other values involved one basic approach to making a data set with a fixed number of observations would look like this:

data yournamegoeshere;
   do i = 1 to 100;
       output;
   end;
run;

That will create a sequence of records with the value of the variable i = 1, 2, 3 ... 100, which you may see are the limits of the do loop, starting at 1 and ending at 100.

The Rand('table') function is the way to select values with given probabilities. I'll show the second variable (Y) first, as it has a useful bit of SAS information:

data example;
  do i= 1 to 10;
     y= rand('table',.2,.8)-1;
     output;
  end;
run;

The table function will assign the value 1 with the first probability, 2 with the second, 3 with the third, etc. If the probabilities provided total less than 1, then the remaining chance will go to N+1. If the total of the probabilities exceeds 1 this gets a bit harder to explain, so it is best to have the parameters total to 1.
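As a small illustration of the N+1 behavior described above, here is a minimal sketch in which only two probabilities are supplied and they total 0.8. The data set name table_demo, the variable z and the seed are made up for this example.

```sas
data table_demo;
  call streaminit(12345);          /* arbitrary seed, only for reproducibility */
  do i = 1 to 1000;
     /* two probabilities totalling 0.8: the remaining 0.2 goes to the value 3 */
     z = rand('table', 0.5, 0.3);
     output;
  end;
run;

/* one-way table of z: expect values 1, 2 and 3 at roughly 50%, 30% and 20% */
proc freq data=table_demo;
   tables z;
run;
```

The observed percentages will vary from run to run if the seed is changed.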

The above is my suggestion to create a variable that is coded 0/1 with 1 standing for Yes (or True). SAS will treat the value 1 as True and 0 as False, so you can simplify some code involving IF constructs to "If Y then <do something>". If you use Y/N or 'Yes/No' or similar then you have to explicitly state the value in comparisons. Additionally, summary counts for a 1/0 coded variable can be accomplished with the SUM statistic: the sum is the number of 1's and the mean is the proportion of 1's as a decimal value.
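A short sketch of both points, using a numeric 0/1 value directly as a condition and summarizing it with the sum and mean. The data set name yn_demo, the variable answer and the seed are invented for this example.

```sas
data yn_demo;
  call streaminit(2022);               /* arbitrary seed */
  do i = 1 to 200;
     y = rand('table', 0.2, 0.8) - 1;  /* 0 = No (prob 0.2), 1 = Yes (prob 0.8) */
     /* a 0/1 numeric variable can be used directly as a condition */
     if y then answer = 'Yes';
     else answer = 'No';
     output;
  end;
run;

/* SUM is the count of 1's, MEAN is the proportion of 1's */
proc means data=yn_demo n sum mean;
   var y;
run;
```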

The 4 level would look like:

data example;
  do i= 1 to 100;
     y= rand('table',.2,.8)-1;
     x= rand('table',0.6, 0.2, 0.15, 0.05) ;
     output;
  end;
run;

Which will generate X of 1,2,3 or 4. If you want to SEE something like w, b, h, s then create a custom format and use that when you need to see those values.

Proc format;
value myx
1='w'
2='b'
3='h'
4='s'
;
run;

proc freq data=example;
   tables x;
   format x myx. ;
run;

Formats are the basic way of controlling displayed values in SAS. They have an additional use: groups defined by a format will be honored by reporting, analysis and most graphing tasks. So in your example if we created a format like

Proc format;
value myxbh
1='w'
2,3='bh'
4='s'
;
run;

We could

Proc freq data=example;
    tables x;
    format x myxbh. ;
run;

see the effects of having 'bh' appear with roughly .35 probability without having to recreate the data set.

Note: do not end the name of a format with a digit. The digits in formats control display positions, so myx3. would generally occupy 3 output positions (padded, even when only 1 is used).

Part of what I have no clue about is this: "The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000)". "Is" implies a single value to me. My guess is that in SAS you either create multiple data sets by changing the upper limit of i in the examples, OR create one data set with 1000 records and then filter it with a Where statement to restrict which records are used for a given purpose. A third approach could be to use a format on the i variable.

data example;
  do i= 1 to 1000;
     y= rand('table',.2,.8)-1;
     x= rand('table',0.6, 0.2, 0.15, 0.05) ;
     output;
  end;
run; 

Proc format;
value myx
1='w'
2,3='bh'
4='s'
;
value i_20_
1 - 20 = 'first 20'
other='More than 20'
;
value yn
0='N'
1='Y'
;
run;

proc freq data=example;
   by i;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
/* or */
Proc freq data=example;
   where i le 20;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
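
Another possible reading of the sample sizes is one independent sample per size. The following is only a sketch under that assumption, not necessarily what the original question intends: it builds a single data set with one block of records per sample size and runs the chi-square test once per block. The data set name sim, the variable n and the seed are invented for this example.

```sas
data sim;
  call streaminit(2022);                 /* arbitrary seed */
  /* assumption: one independent sample for each requested size */
  do n = 20, 50, 100, 200, 500, 1000;
     do i = 1 to n;
        x = rand('table', 0.6, 0.2, 0.15, 0.05);  /* 1-4, i.e. w, b, h, s */
        y = rand('table', 0.2, 0.8) - 1;          /* 0 = No, 1 = Yes */
        output;
     end;
  end;
run;

/* sim is already in ascending order of n, so BY processing works without a sort */
proc freq data=sim;
   by n;
   tables x*y / chisq;
run;
```

With the smaller sample sizes some cells will have low expected counts, so the chi-square results for n = 20 or 50 should be read with care.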

Rick_SAS · Posted 10-12-2022 06:38 AM

If you want to simulate two independent discrete variables (uncorrelated), you can use the RAND function and the "Table" distribution. See

https://blogs.sas.com/content/iml/2011/07/13/simulate-categorical-data-in-sas.html

That article has a macro variable

%let NSim = 10000;  /* sample size. 20, 50, 100, 200, 500 and 1000 */

that determines the sample size.
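
As a rough sketch of how such a macro variable can drive the simulation (this is not the code from the linked article; the data set name sample and the seed are made up, and the probabilities are taken from the question):

```sas
%let NSim = 100;   /* sample size; change to 20, 50, 200, 500 or 1000 */

data sample;
  call streaminit(4321);                         /* arbitrary seed */
  do i = 1 to &NSim;
     x = rand('table', 0.6, 0.2, 0.15, 0.05);    /* values 1-4 */
     y = rand('table', 0.2, 0.8) - 1;            /* 0 = No, 1 = Yes */
     output;
  end;
run;
```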

I don't understand the rest of your questions. Since you are "familiar with R," you can post your R program and maybe that will help us understand what analysis you are attempting.
