Hello,
I'm estimating a series of cross tabulation using a large dataset. There are a few threshold categories - based on the sample counts- that needs to be considered for data release. For example, like this:
- category A (good quality): we use the total population and the sample counts is >=500. we use the subpopulation of age >=20 and the sample counts is >=700.
- category E (marginal quality): we use the total population and the sample counts is between 250 and 500 . we use the subpopulation of age >=20 and the sample counts is between 350 and 700.
- category F (poor quality): we use the total population and the sample counts is <250. we use the subpopulation of age >=20 and the sample counts is <350.
There are many cells from cross -tabulations and it is very time consuming to check every cell for the thresholds. is there any ways to categorize the output (proportions) according to the above threshold?
Thank you.
I find this impossible to understand without a small example. Show us realistic example data of your data, and what the desired output is.
Example data - before the cross tabulation
How you make the cross tab
What is the content of the "many cells"?
Your "rules" don't make much sense. Basing a category on a fixed number of counts is, in my mind, unreliable because your basic sample size might change. If the total sample size is fixed, then tell us what that number must be.
Thank you so much to "PaigeMiller" and Bllardw" for their responses.
My total sample size is fixed. I'm estimating the proportions of people for 12 categorical indicators disaggregated by sex, age, education .
Here is the example of data:
variable name: indicate 1
Answer categories code
great 1
moderate 2
small 3
not at all 4
variable name: indicate 2
Answer categories code
great 1
moderate 2
small 3
not at all 4
.
.
.
variable name: indicate 12
Answer categories code
great 1
moderate 2
small 3
not at all 4
an example of my syntax is (I'm using SUDAAN in SAS)
proc crosstab data=mydata design=BBR;
weight myweight;
repwgt mybootstrapweight;
tables gender * indicate1;
class gender indicate1;
setenv colwidth = 1;
output nsum serow rowper lowrow uprow /filename= out_ filetype=SAS;
data out;
set out_;
row= round (rowper, .1);
lowR= round (lowrow, .1);
upR= round (uprow, .1);
cv= round ((serow/rowper *100), .1);
keep gender indicate 1 nsum row lowR upR cv;
run;
I need to marked the output of "nsum" based on the thresholds, for example nsum>=220 be marked as A (good quality), 110>=nsum>=220 be marked as E (marginal quality) and nsum<110 be marked as F for poor quality.
Hope I could explain it well now.
Thank you so much.
An example of your output was specifically requested, but you didn't provide that. (For those of us who don't understand SUDAAN, your explanation doesn't help.)
It also seems that this is not a small example of the data that we asked for, it is just a description of your variables and what the values mean, which isn't sufficient, we need to see a small portion of the actual SAS data set you will be analyzing, in the format requested.
IF this is actually correct:
data out; set out_; row= round (rowper, .1); lowR= round (lowrow, .1); upR= round (uprow, .1); cv= round ((serow/rowper *100), .1); keep gender indicate 1 nsum row lowR upR cv; run; I need to marked the output of "nsum" based on the thresholds,
for example nsum>=220 be marked as A (good quality),
110>=nsum>=220 be marked as E (marginal quality) and
nsum<110 be marked as F for poor quality.
Then perhaps a custom format is a possible solution. HOWEVER you need to pick ONE category for the value 220. What you show has 220 in both A and E.
Here is one way with the 220 value in category E:
Proc format; value MyCategory 0 -<110 = 'F' 110 - 220 = 'E' 220<- high= 'A' ; run; proc print data=out noobs; var nsum; format nsum MyCategory.; run;
Formats are a very powerful tool in SAS. Consider if someone looks at your output and says what to things look like if set 150 for the boundary between F and E? Or wants a fourth result added. You create a different format for use at report or analysis time instead of creating new variables. The custom formats will work for all the report and most of the analysis and graphing procedures.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.