About Doc_Duke

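Doc_Duke · 09-08-2016

When you take a log transform, you are now in the realm of the geometric mean: back-transformed means are geometric means, and back-transformed differences are ratios of geometric means. Google "geometric mean site:sas.com" for lots of information on your question.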
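To see why, here is a small Python sketch (not SAS; the two groups of values are made up). Exponentiating the mean of the logs gives the geometric mean, and exponentiating a difference of log-scale means gives the ratio of the two geometric means.

```python
import math

def geometric_mean(values):
    """Back-transform the arithmetic mean of the natural logs."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Hypothetical positive measurements for two groups.
group_a = [2.0, 8.0]
group_b = [1.0, 4.0]

gm_a = geometric_mean(group_a)  # sqrt(2 * 8) = 4
gm_b = geometric_mean(group_b)  # sqrt(1 * 4) = 2

# A difference on the log scale becomes a ratio after exp().
log_diff = math.log(gm_a) - math.log(gm_b)
print(round(gm_a, 6), round(gm_b, 6), round(math.exp(log_diff), 6))  # 4.0 2.0 2.0
```

Back-transforming the log-scale confidence limits the same way gives limits for that ratio of geometric means, not for a difference.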

Doc_Duke · 09-08-2016

Try a Google search for "compare import export methods site:sas.com". Here's one: http://support.sas.com/resources/papers/proceedings11/074-2011.pdf

Doc_Duke · 08-24-2016

You mention DOS. If you are batching this on a single pc, you are unlikely to gain much by running Jobs 2 and 3 concurrently. Many SAS jobs use a lot more I/O than CPU and there is only one I/O path on most PCs. Even with multiple servers, you may not see much gain unless the data are spread across multiple drives and you are running multi-gigabit pipes between the data and the servers.

Doc_Duke · 08-11-2016

I would approach this a bit differently. I'd keep the case and control datasets separate and apply the rand_num as you have done. Drop the SORT and RANK. (BTW, RANK is working correctly: SSN is a unique key, and RANK restarts the ranking every time the BY level changes.)

Do the SQL, but change it to two datasets and drop the rand_num part of the JOIN. Include case.ssn along with control.ssn in the output dataset and drop the other case variables. This will get you ALL of the matches for each case, with duplication, so now your task is to de-duplicate. Non-trivial, but doable. OR, you can use one of the many 1:N matching macros that people have already written; a Google search for "sas matching macro" will turn up a number of them.

Statistically, a 1:5 match isn't that much more informative than a 1:2 match. Depending on your next steps, you could also use all of the matches (1:1 to 1:x) and have a perfectly valid analysis. Limiting the number of matches is valuable if you have to do manual abstraction (i.e., extra labor), but if you are working entirely with computer files, what's a few extra cycles to use them all?
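The de-duplication step can be sketched as a greedy pass. This is an illustration in Python, not one of the SAS macros mentioned above; `match_controls`, the seed, and the dictionary layout are hypothetical. It takes every candidate control for every case, as the join produces them, and hands out each control at most once.

```python
import random

def match_controls(candidates, n_per_case, seed=2016):
    """Greedy 1:N matching without replacement (hypothetical sketch).

    candidates maps each case ID to the list of control IDs that matched it
    in the join, so the same control may appear under several cases.
    Each control is assigned to at most one case; each case gets up to
    n_per_case controls.
    """
    rng = random.Random(seed)      # seeded so the selection is reproducible
    used = set()
    matches = {}
    cases = sorted(candidates)
    rng.shuffle(cases)             # random case order avoids favoring low IDs
    for case in cases:
        pool = [c for c in candidates[case] if c not in used]
        rng.shuffle(pool)          # random choice among the remaining controls
        chosen = pool[:n_per_case]
        used.update(chosen)
        matches[case] = chosen
    return matches

candidates = {"case1": ["c1", "c2", "c3"], "case2": ["c2", "c3", "c4"], "case3": ["c4"]}
print(match_controls(candidates, n_per_case=2))
```

A greedy pass is easy to follow but not optimal: a case processed late can end up with fewer than N controls even when another assignment would have filled every case. Handling the cases with the fewest candidates first is one possible refinement.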

Doc_Duke · 08-11-2016

In addition to Rick's comments, the gamma, odds ratio (OR), relative risk (RR), and their confidence limits are all based on asymptotic (large-sample) theory, so they should not be used with sample sizes this small. PROC FREQ has a number of EXACT tests that are more appropriate.
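For a sense of what an exact test does, here is a Python sketch (illustrative only, not PROC FREQ's implementation) of the two-sided Fisher exact test for a 2x2 table. It conditions on the row and column totals and sums the hypergeometric probabilities of every table no more likely than the one observed, so no large-sample approximation is involved.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the table [[3, 1], [1, 3]] this returns 34/70, about 0.486.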

Doc_Duke · 08-11-2016

The syntax you have is not technically correct (several semicolons are missing), so it is hard to understand what you are really trying to do. The DATA step, as written, does not require the data sets to be sorted; is there a BY statement missing?

Doc_Duke · 08-05-2016

Take a look at http://blogs.sas.com/content/sasdummy/2012/12/18/using-sas-to-access-data-stored-on-dropbox/ for methods to get to the data you want dynamically.

Doc_Duke · 08-03-2016

The approach I have used is to create a new variable in the data set with the RANUNI function and then use it to make a pseudo-random selection of a proportion of the records for audit. I'm sure you could set up something similar in PROC SQL so you don't have to make multiple passes through the data.
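The same idea as a Python sketch (the function name, seed, and proportion are placeholders): draw one seeded uniform number per record, in the spirit of RANUNI, and keep the records whose number falls below the target proportion.

```python
import random

def select_for_audit(records, proportion, seed=12345):
    """Keep roughly `proportion` of the records for audit (hypothetical helper).

    Each record gets its own pseudo-random uniform draw on [0, 1); the fixed
    seed makes the same records come back on every run.
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < proportion]

records = list(range(10000))
audit = select_for_audit(records, 0.10)
print(len(audit))
```

This gives approximately, not exactly, the requested proportion. If you need an exact count, `random.Random(seed).sample(records, k)` draws exactly k records without replacement.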

Doc_Duke · 07-19-2016

PG's right. Just removing data to make the model fit is a form of "data dredging" and can lead to erroneous conclusions. There have been several books written describing how research has gotten into trouble that way. Those three observations may be the most important in your data set. Identifying what made them outliers may be much more important than the analysis of the "well behaved" data.

Doc_Duke · 07-14-2016

As you have learned the hard way, it is good not to modify the original data. Now that you have done it, deleting the extra observations is a simple DATA step:

DATA haveless;
  SET have;
  IF _n_ > 2010 THEN DELETE; * Assumes the added rows are at the end;
RUN;

If you still have the original data, a better approach to augmentation might be:

DATA havemdd0;
  DO _i_ = 1871 TO 2010;
    mdd = 0;
    OUTPUT;
  END;
  DROP _i_;
RUN;

DATA have2;
  SET have havemdd0;
RUN;

Doc_Duke · 07-14-2016

Most often, I have seen that happen when I have a typo in a variable name. The second most common case is when I am accessing a replacement data set (e.g., monthly processing) and the underlying input dataset has changed. Excel is a frequent culprit.

Doc_Duke · 07-07-2016

This may fit into the class of data considered 'rare events'. SAS/QC recently added PROC RAREEVENTS to produce Shewhart control charts using the hypergeometric distribution. Though not a statistical comparison per se, the graph may be able to illustrate the difference much better than a p-value. The most recent SAS Global Forum proceedings had a nice article on it: http://support.sas.com/resources/papers/proceedings16/SAS4040-2016.pdf

Doc_Duke · 07-06-2016

A guess that should help: copy the subset of lib2 to WORK and do the ID transformation during the copy. The left join may be making multiple passes through lib2, and INPUT is not the most efficient function in SAS.

Doc_Duke · 07-06-2016

One trick is to reformat the data. You can put a space before the yes, so it is " yes", and leave the no as "no"; a space sorts before any letter, so " yes" now comes first. If the alignment causes you issues, you can use a non-printable character instead (you'll need to look at your collating sequence for that).
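The trick works because character values sort by their collating sequence. A quick Python check of the ordering (ASCII here, values made up); note that whether the change shows up in PROC FREQ also depends on its ORDER= option, since the default ORDER=INTERNAL sorts by unformatted values.

```python
# Character values are compared one character at a time by code point.
# A space (code 32) sorts before any letter ("n" is code 110), so a
# leading space moves "yes" ahead of "no".
values = ["no", "yes", "no", "yes"]
print(sorted(set(values)))      # ['no', 'yes']

relabeled = [" yes" if v == "yes" else v for v in values]
print(sorted(set(relabeled)))   # [' yes', 'no']
```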

Doc_Duke · 06-16-2016

I would typically use ordinal. The Mann-Whitney test is rank-based, so its results are invariant under monotonic transformations of the data. If you have some idea of the underlying distribution, then you could choose that; however, ordinal will always work. There is a huge literature on why one should NOT do post hoc power analysis. That doesn't keep my clients from asking for it.
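A small Python sketch (the sample values are arbitrary) of why only ranks matter: the Mann-Whitney U statistic counts pairwise wins, and a strictly increasing transformation such as the log never changes which of two values is larger.

```python
import math

def mann_whitney_u(x, y):
    """U for sample x: number of (x_i, y_j) pairs with x_i > y_j, ties count 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

x = [1.2, 3.5, 7.9, 15.0]
y = [0.8, 2.1, 4.4]

u_raw = mann_whitney_u(x, y)
# log is strictly increasing, so every pairwise comparison keeps its outcome.
u_log = mann_whitney_u([math.log(v) for v in x], [math.log(v) for v in y])
print(u_raw, u_log)  # 9.0 9.0
```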

Date Last Visited: 05-03-2025 03:08 PM

Re: Chi-Square WARNING

Re: Exposure-adjusted event rate

Re: Time to event analysis- ADTTE

Re: How can I Find Code that opens a given dataset?

Re: How to extract date from start date (description part) and create ...

Re: ERROR: No valid observations are found.

Re: setting VALIDVARNAME as the default for the duration of the sessio...

Re: SAS Enterprise Guide Project Size Increasing with no changes

Re: How to calculate confidence interval for crude rate by using age g...

Re: Why will my proc lifetest code not work?

Re: Invalid third argument to function SUBSTR

Re: Cleanwork command for /saswork and /sasutil cleanup

Re: Why this message “Data set Limit Reached” appear in “Project Tree...

Re: Chi-Square WARNING

Re: Power calculation for proportions

Maxims of Maximally Efficient SAS Programmers

Re: Global English Guidelines for Community Members

Re: How do I interpret the log transformed CL for the difference in SA...

Re: SAS - Methods of importing and exporting data

Re: Mixing consecutive and concurrent jobs in a batch file - DOS

Re: match 1:5 unique case controls...

Re: How to understand the great difference value of Gamma in PROC FREQ...

Re: use proc sql to rewrite code with "data set"

Re: ERROR: The connection has timed out..

Re: Select Observations at Random for Data Checking

Re: Removing observation to solve non-proportionality?

Re: How to add and delete observations within a range

Re: WARNING: The variable in the DROP, KEEP, or RENAME list has never ...

Re: two sample tests on rates of infection

Re: Proc SQL with left join clause so long in execution.

Re: proc freq how to change the order of yes no

Re: Power Analysis & Mann-Whitney test