SAS Programming

DATA Step, Macro, Functions and more
KMaxwell
Calcite | Level 5

I want to create varX = (w, b, h, s) [4 cardinal variables] and varY = (yes, no) and then sample with replacement, with probability. ProbX = (0.6, 0.2, 0.15, 0.05) and ProbY = (0.8, 0.2). The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000) and sample size for Y is each type of X for any given X_total and I have to do this by bootstrap and then compare them against the theoretical results of a 2D chi-square test. I am stuck at defining the variables X and Y. Can someone please help me with the syntax or point me to a tutorial on how to define variables in SAS? I'm familiar with R and have no clue on this one.

 

Thanks

2 REPLIES
ballardw
Super User

I don't understand part of your question because, I think, you are mixing up "variable" and "value".

 

Without any other values involved, one basic approach to making a data set with a fixed number of observations looks like this:

data yournamegoeshere;
   do i = 1 to 100;
       output;
   end;
run;

That will create a sequence of records with the value of the variable i = 1, 2, 3 ... 100, which you may see are the limits of the do loop, starting at 1 and ending at 100.

The RAND function with the 'table' distribution is the way to select values with given probabilities. I'll show your second variable (Y) first, as it illustrates a useful bit of SAS behavior:

data example;
  do i= 1 to 10;
     y= rand('table',.2,.8)-1;
     output;
  end;
run; 

The table distribution returns the value 1 with the first probability, 2 with the second, 3 with the third, and so on. If the probabilities you provide total less than 1, the remaining probability goes to the value N+1. If the total exceeds 1, the behavior is harder to explain, so it is best to have the parameters total 1.
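One way to convince yourself of the "remainder goes to N+1" behavior is to simulate a large sample and tabulate it. This is only a sketch: the data set name table_check, the seed, and the probabilities 0.5 and 0.3 are made up for illustration.

```sas
/* Sketch: only two probabilities are supplied and they sum to 0.8, */
/* so the value 3 should appear with the remaining probability 0.2. */
data table_check;
   call streaminit(12345);            /* arbitrary seed so the run is repeatable */
   do i = 1 to 10000;
      k = rand('table', 0.5, 0.3);    /* k takes the values 1, 2 or 3 */
      output;
   end;
run;

/* The observed percentages should be close to 50, 30 and 20 */
proc freq data=table_check;
   tables k;
run;
```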

The above is my suggestion for creating a variable coded 0/1, with 1 standing for Yes (or True). SAS treats the value 1 as true and 0 as false, so you can simplify code involving IF constructs to "If Y then <do something>". If you use Y/N or 'Yes'/'No' or similar, you have to state the value explicitly in comparisons. Additionally, summary counts for a 1/0 coded variable are easy to get: the sum is the number of 1's and the mean is the proportion of 1's as a decimal value.
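A small sketch of both points, assuming the same 0.2/0.8 probabilities as above. The data set name yn_example, the variable flag, and the seed are placeholders, not anything from your data.

```sas
data yn_example;
   length flag $3;                       /* room for 'Yes' or 'No' */
   call streaminit(2024);                /* arbitrary seed */
   do i = 1 to 100;
      y = rand('table', 0.2, 0.8) - 1;   /* 0 = No (prob 0.2), 1 = Yes (prob 0.8) */
      if y then flag = 'Yes';            /* 1 is treated as true, 0 as false */
      else flag = 'No';
      output;
   end;
run;

/* SUM is the count of 1's; MEAN is the proportion of 1's */
proc means data=yn_example n sum mean;
   var y;
run;
```

With a character variable such as flag you would instead have to write something like "if flag = 'Yes' then ...".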

The 4 level would look like:

data example;
  do i= 1 to 100;
     y= rand('table',.2,.8)-1;
     x= rand('table',0.6, 0.2, 0.15, 0.05) ;
     output;
  end;
run; 

Which will generate X of 1,2,3 or 4. If you want to SEE something like w, b, h, s then create a custom format and use that when you need to see those values.

Proc format;
value myx
1='w'
2='b'
3='h'
4='s'
;
run;

proc freq data=example;
format x myx. ;
run;

Formats are the basic way of controlling displayed values in SAS. They also have an additional use: groups defined by a format are honored by reporting, analysis, and most graphing procedures. So in your example, if we created a format like

Proc format;
value myxbh
1='w'
2,3='bh'
4='s'
;
run;

We could

Proc freq data=example;
    format x myxbh. ;
run;

see the effects of having that 'bh' group appear with roughly .35 probability without having to recreate the data set.

Note: do not end the name of a format with a digit. Digits at the end of a format reference set the display width, so myx3. would be read as the format myx with a width of 3 and would generally occupy 3 output positions (padded, even when only 1 character is used).

 

Part of what I don't follow is this: "The sample size for X is X_total = (20, 50, 100, 200, 500 and 1000)". "Is" implies a single value to me. My guess is that in SAS you would either create multiple data sets by changing the upper limit of i in the examples, or create one data set with 1000 records and then use a WHERE statement to restrict which records are used for a given purpose. A third approach could be to use a format on the i variable.

data example;
  do i= 1 to 1000;
     y= rand('table',.2,.8)-1;
     x= rand('table',0.6, 0.2, 0.15, 0.05) ;
     output;
  end;
run; 

Proc format;
value myx
1='w'
2,3='bh'
4='s'
;
value i_20_
1 - 20 = 'first 20'
other='More than 20'
;
value yn
0='N'
1='Y'
;
run;

proc freq data=example;
   by i;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
/* or */
Proc freq data=example;
   where i le 20;
   table x*y /chisq;
   format i i_20_. x myx. y yn.;
run;
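If the intent is a separate sample for each of the six sizes, another option is to generate them all in one data set with the sample size as a BY variable. This is a sketch under that assumption. The names all_sizes and n and the seed are placeholders, and I have left the formats off so the step stands alone.

```sas
/* One independent sample per sample size, stacked in a single data set */
data all_sizes;
   call streaminit(1);                              /* arbitrary seed */
   do n = 20, 50, 100, 200, 500, 1000;              /* requested sample sizes */
      do i = 1 to n;
         x = rand('table', 0.6, 0.2, 0.15, 0.05);   /* 1=w, 2=b, 3=h, 4=s */
         y = rand('table', 0.2, 0.8) - 1;           /* 0=no, 1=yes */
         output;
      end;
   end;
run;

/* The data are created in ascending order of n, so BY n needs no sort */
proc freq data=all_sizes;
   by n;
   tables x*y / chisq;
run;
```

For the small sample sizes, expect PROC FREQ to warn about low expected cell counts, since the 's' level has probability only 0.05. A bootstrap-style study would presumably wrap this in an additional loop over replicates and add that replicate variable to the BY statement.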


 

 

Rick_SAS
SAS Super FREQ

If you want to simulate two independent discrete variables (uncorrelated), you can use the RAND function and the "Table" distribution. See

https://blogs.sas.com/content/iml/2011/07/13/simulate-categorical-data-in-sas.html

 

That article has a macro variable

%let NSim = 10000;  /* sample size. 20, 50, 100, 200, 500 and 1000 */

that determines the sample size.
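As a rough sketch of how such a macro variable can drive the simulation (this is not the code from the linked article; the data set name sim, the seed, and the value 100 are placeholders, and the probabilities are the ones from the question):

```sas
%let NSim = 100;   /* sample size: try 20, 50, 100, 200, 500 or 1000 */

data sim;
   call streaminit(4321);                        /* arbitrary seed */
   do i = 1 to &NSim;
      x = rand('table', 0.6, 0.2, 0.15, 0.05);   /* 1=w, 2=b, 3=h, 4=s */
      y = rand('table', 0.2, 0.8) - 1;           /* 0=no, 1=yes */
      output;
   end;
run;

/* Two-way table with the chi-square test of independence */
proc freq data=sim;
   tables x*y / chisq;
run;
```

Changing the %let line and rerunning gives a sample of a different size without touching the DATA step.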

 

I don't understand the rest of your questions. Since you are "familiar with R," you can post your R program and maybe that will help us understand what analysis you are attempting.
