OK, you asked for it..
Anonymity in New Zealand and Australia for...
Primary research data:
As a university researcher, I always promise to store securely, limit access to a small group of named researchers, analyse and summarise so that individual participants cannot be identified, and eventually destroy after a fixed number of years all the primary survey and qualitative data I collect. As far as I know, this is mandatory if I want to get ethical approval for my research from my university, and ethical approval is in turn mandatory for gathering primary research data. So nobody outside the prior-nominated researcher group gets to see the data, anonymised or otherwise. Sometimes, to remove potential researcher bias, the data is also anonymised for some of the researchers.
Secondary internal data:
There are commercial customer privacy laws that prevent firms from collecting and retaining customer data unless they can demonstrate that it is part of their legitimate business activity, where legitimate business activity is understood to include any analysis designed to generate improvements in customer value and experience. Generally, improvements in customer experience and value can in turn be tied to improvements in business efficiency and profitability. Analysing internal secondary data for anti-competitive and market-dominance-maintaining reasons is explicitly prohibited on penalty of severe fines, as is lax data firewalling and handling security. For this type of data to be made publicly available for general learning, an Australasian firm would have to assume responsibility for the effectiveness of any anonymising process, to ensure its customers can never be identified.
So I can sort of understand why corporate legal counsel insist on data anonymising protocols that err on the conservative side, i.e. go over the top, before any of it is released into the wild.
Secondary external data:
This is where it gets a little tricky for me.
Data and metadata that are largely or primarily machine-generated and do not explicitly identify individual users, like Google page stats for selected government public websites, should be OK to analyse and reproduce unless explicitly protected by an agreed protocol (not that any come to mind apart from Creative Commons variations). However, I'm never entirely sure that someone smarter than me couldn't identify users somehow.
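To make that worry a bit more concrete, here is a minimal sketch (my own illustration, not anything from the sources above) of a k-anonymity check on a hypothetical "anonymised" page-stats export. The field names and records are invented; the point is only that a combination of coarse, non-identifying fields can still single one visitor out.

```python
# Sketch only: k-anonymity check on hypothetical, already-"anonymised" records.
# Column names (region, device, visit_hour) are invented for illustration.
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size when records are grouped by the quasi-identifier
    fields; k = 1 means at least one person is uniquely re-identifiable."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# No names anywhere, yet the third visitor is unique on these three fields.
records = [
    {"region": "Wellington",   "device": "mobile",  "visit_hour": 9},
    {"region": "Wellington",   "device": "mobile",  "visit_hour": 9},
    {"region": "Invercargill", "device": "desktop", "visit_hour": 3},
]

k = k_anonymity(records, ["region", "device", "visit_hour"])
print(f"k-anonymity = {k}")  # prints k-anonymity = 1
```

So even a dataset with no names in it can have k = 1, which is exactly the "someone smarter than me" re-identification risk.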
Any data in the public domain generated by someone typing something, like I am doing right now, is generally protected by author copyright, which means, at a minimum, that any explicit reproduction should acknowledge authorship during the period of copyright and, as in the case of Creative Commons, adhere to whatever other protocols apply. Reproduction and explicit CC or other protocols aside, the data and metadata should be fairly available for analysis, which generally summarises the heck out of what are usually big corpora and structured data and metadata. But when we type, do we ever stop and think about how someone might be openly and honestly attempting to analyse what we write, either identifying us specifically or in bulk with others' creative output? As an academic, I do**, but how widespread is that consciousness? If writers are generally not aware, or limit their awareness to specific domains, how do we, as analysts, ensure their moral rights are fairly protected?
Who analyses the analysts?
If you were tired before reading this, you'll be asleep by now.
Nitey-nite.
** I watch my own bibliographic metadata like a hawk to see who is citing me and to what extent!