About AlexBeaver

AlexBeaver · ‎05-16-2025

Missed our event in Orlando? Catch up on the best of SAS Innovate 2025 anytime, anywhere. Join us virtually with our complimentary SAS Innovate Digital Pass. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

Register to stream now

Already looking forward to next year? Save the date for SAS Innovate 2026 in Grapevine, Texas!

AlexBeaver · ‎05-14-2025

We are thrilled to announce the 2025 winners of the SAS Customer Recognition Awards! With over 60 inspiring entries submitted and several customers nominated for each SAS-chosen category, this year's awards program was an excellent showcase of the powerful impacts SAS customers are making across the globe. Please help us recognize the winners below.

Check out the Winners e-book

Awards from customer-submitted entries

Community Uplift Award
Awarded to a SAS customer who made an impact in their community at large using SAS products.
1st Place Winner: North Carolina Agricultural and Technical State University, United States.
2nd Place Winner: Shionogi & Co., Ltd., Japan.
3rd Place Winner: Merck, United States.

Innovative Problem Solver
Awarded to a SAS customer who uses SAS in innovative ways to solve a business problem.
1st Place Winner: AstraZeneca K.K., Japan.
2nd Place Winner: ABSA, South Africa.
3rd Place Winner: Centers for Disease Control and Prevention, Kenya.

SAS Analytics Explorers Advocate
Awarded to a SAS customer who is using the SAS Analytics Explorers program to grow their skills, career and/or network.
1st Place Winner: EY ifb, Markus Weick, Germany.
2nd Place Winner: Merck, Murali Neela, United States.

Awards from SAS nominations

SAS Customer Impact Award – Public Sector
Awarded to a public sector customer who has had the most impact through a willingness to share their analytics journey, successes and lessons learned with others.
Winner: Iowa Department of Corrections, United States.
The SAS and Iowa Department of Corrections partnership reflects the power of using an analytics framework that is scalable, repeatable, and flexible to provide a broad selection of analytic capabilities to achieve significant transformation and change the long-term trajectory for their correctional clients. Through a continuous SAS collaboration, IDOC can continue to pursue value streams that share critical justice information, providing a unified framework for client accountability, while focusing on reducing recidivism and increasing public safety, all while setting a data-driven standard for supervision and rehabilitation among their correctional peers. Congratulations to Sarah Fineran, who is accepting this award on behalf of IDOC.

SAS Customer Impact Award – Private Sector
Awarded to a private sector customer who has had the most impact through a willingness to share their analytics journey, successes and lessons learned with others.
Winner: Georgia-Pacific, United States.
Georgia-Pacific has been a SAS advocate since March 2020, when they first partnered with SAS to share their experience at SAS Global Forum. Since then, they have partnered with SAS on multiple video testimonials on topics ranging from Data Strategy to Computer Vision and Generative AI, all of which supplement their original story on Optimizing the Supply Chain and equipment efficiency with analytics and IoT. They have worked with SAS to publish multiple blog posts and articles on sas.com, provided quotes for press releases and supported the publication of their SAS story on the AWS website. We recognize Roshan Shah and Sam Coyne, who are accepting the award on behalf of Georgia-Pacific.

In-House User Group Leader
Awarded to an In-House User Group Leader demonstrating a dedicated passion for the success of user group members.
Winner: Marina Ayerst, Absa, South Africa.
Marina Ayerst spearheads the aSASins Community of Practice in-house user group within ABSA. With weighted contributing factors, the aSASins group stood out among several worldwide nominees. Meeting monthly with high rates of consistent attendance, aSASins allows for cross-functional team collaboration and integration, playing a key role in driving an enterprise-wide SAS Viya modernization program for model risk management.

User Feedback Award
Awarded to a customer who provides valuable feedback on SAS products and has been an essential reference for product improvements.
Winner: Secretaria de Estado de Fazenda de Minas Gerais, Brazil.
SEF/MG has been a long-time, active SAS user and endorser, contributing to product improvements through several interactions with our support team. They have provided valuable insight for performance tuning when accessing DB2 and for tuning SAS Job Flow Scheduler when it was not processing large flows, and they always act as a partner concerned with the best usage of SAS products. In addition, SEF/MG is migrating to SAS Viya 4 and has been contributing to the evolution of SAS automation tools.

AlexBeaver · ‎04-04-2025

Curious about the future of artificial intelligence and technology? Don’t miss this opportunity to hear from SAS CEO Jim Goodnight and Microsoft Chairman and CEO Satya Nadella in a special, pre-recorded conversation at SAS Innovate. Reserve your seat! Head over to the SAS Innovate page to learn more about all the featured speakers you can see May 6 - 9 in Orlando, Florida. I hope to see you there!

AlexBeaver · ‎03-12-2025

🚨Early bird rate ends this Friday, March 14: Register now to save $200. 🚨

Get the early bird rate

Don't miss out on the must-attend event of the year at a discounted rate! Happening May 6-9 in Orlando, Florida, SAS Innovate is not your average tech conference. Imagine groundbreaking innovation, cutting-edge technology, and invaluable insights that will elevate your skills and knowledge.

Why You Need to Be There:
Industry Leaders: Hear from the brightest minds in tech, business, and leadership, including Jim Goodnight, Satya Nadella, and Frank Abagnale.
AI and Machine Learning: Dive into the latest breakthroughs in AI and learn how to leverage these technologies to solve your complex business challenges.
Cloud Analytics: Discover how to harness the power of the cloud to accelerate your analytics journey and gain valuable insights from your data.
Data Science and Advanced Analytics: Explore cutting-edge data science techniques and best practices for building and deploying successful analytics solutions.
Industry-Specific Solutions: See how AI and analytics are transforming industries such as healthcare, finance, and manufacturing, providing tailored solutions to real-world problems.
Hands-On Workshops: Participate in interactive sessions where you can apply what you’ve learned and gain practical experience.
Networking Opportunities: Connect with thousands of professionals who share your passion for AI and analytics, opening doors to new collaborations and career opportunities.

Register Now!

AlexBeaver · ‎02-20-2025

We're thrilled to announce the SAS Innovate 2025 agenda is now live! Whether you're a business leader, programmer, or data analyst, there is something for everyone.

Check out the agenda

Dive into the detailed agenda and browse all the top-tier learning opportunities including technical demos, hands-on workshops, industry-specific breakout sessions, and so much more. With over 200 sessions, 100+ demos and 60+ booths to visit, you'll be spoiled for choice. Use the filters to find the content most relevant to you. Narrow down the agenda by Session Type, Industry, Target Audience, Topic, and SAS Platform. If you haven't already, register for SAS Innovate now to be a part of these top-notch sessions!

AlexBeaver · ‎02-03-2025

The nominations are in and now is your chance to vote for the winners! Read through all the submissions for the following categories and give an entry a "Like" to help your favorites win.

Community Uplift: Awarded to a SAS customer who has made an impact in their community at large using SAS products.
Innovative Problem Solver: Awarded to a SAS customer who uses SAS in innovative ways to solve a business problem.
SAS Analytics Explorers Advocate: Awarded to a SAS customer who is leveraging the SAS Analytics Explorers program to grow their skills, their career and/or their network.

Vote Now!

While the online voting is happening, we have a panel of six judges (Nancy Brucken, Josh Horstman, Quentin McMullen, Chris Hemedinger, Stacey Syphus, and Rajiv Ramarajan) that will score each entry on a 1-5 scale in three categories:
Spirit of story
Strength of evidence
Results

When the voting closes on February 12, the top three vote-getters will get bonus points added to the scores from the judges to create a total score for each entry. The top three scores in each category will be the winners, with the first-place winner getting a trip to SAS Innovate in Orlando!* Get all the details and meet the judges in this blog by Chris Hemedinger.

*Please visit the SAS Customer Recognition Awards site for all the program details, rules and to cast your vote.

AlexBeaver · ‎01-29-2025

Thanks for your feedback. Brené’s expertise on resilience and leadership can inspire us to innovate boldly, and her insights into human connection can help us create more user-centric and ethical AI solutions. She’s also a master storyteller which is a critical skill in data science! SAS Innovate brings experiences and perspectives that our audience appreciates. Not all speakers have been announced yet, so stay tuned!

AlexBeaver · ‎01-15-2025

Just announced - Brené Brown will be joining us at SAS Innovate 2025, our biggest global event of the year! Dr. Brené Brown, a globally renowned researcher and storyteller, has spent over two decades studying courage, vulnerability, shame, and empathy. She is the author of six #1 New York Times bestsellers and hosts two award-winning podcasts. Brené works with organizations worldwide to develop braver leaders and more courageous cultures. Join us in Orlando, Florida on Thursday, May 8, for a fireside chat between Brené and SAS CMO Jenn Chase, where they’ll explore daring leadership and rising strong after setbacks. Don’t miss this opportunity to hear from one of the world’s foremost thought leaders! Register now!

AlexBeaver · ‎09-25-2024

SAS has made the decision to no longer deliver or support CData JDBC drivers for Facebook, Google Analytics, Google Drive, Microsoft OneDrive, and YouTube Analytics (the “Drivers”) in SAS Viya, effective immediately. This decision was made due to limited usage and technical issues that have caused restricted capabilities for the Drivers. SAS is removing these drivers from currently supported releases and all future releases of SAS Viya. For existing deployments that included the Drivers, SAS will continue to provide support in accordance with SAS Technical Support Policies. Please note that support documentation will remain accessible in accordance with SAS policies.

In releases of the offering already deployed before 2024.07, customers will see no immediate change; the Drivers will be available with the software. However, if the software is re-downloaded and re-deployed – or updated to the 2024.07 or newer release – the Drivers will no longer be available. After this point any code using the Drivers will return an error stating that the Drivers do not exist.

If you require the Drivers going forward, SAS recommends that you contact CData directly to obtain a license, or work with another vendor to obtain a license, to connect to these data sources. You can also learn how SAS connects with popular Microsoft 365 tools like Microsoft OneDrive, Teams and SharePoint.

SAS Viya 4 releases shown below are affected by the removal of the Drivers.

Current and 3 previous stable releases:
2024.06
2024.07
2024.08
2024.09 (Current)

Current and 3 previous Long-Term Stable releases:
2022.09 LTS
2023.03 LTS
2023.10 LTS
2024.03 LTS (Current)

AlexBeaver · ‎03-08-2024

Hi @ronan, we are migrating the Focus Areas pages! Scalability & Performance can now be found at https://support.sas.com/en/software/scalability-performance.html.

AlexBeaver · ‎03-07-2024

The SAS System provides the FULLSTIMER option to collect performance statistics on each SAS step and for the job as a whole, and to place them in the SAS log. It is important to note that the FULLSTIMER measures give you only a snapshot view of performance at the step and job level. Each SAS port yields different FULLSTIMER statistics based on the host operating system. See the SAS host-specific documentation for the exact statistics offered. FULLSTIMER is invoked as a SAS option and takes effect after the option invocation. If you would like to have the performance statistics written to a SAS data set, download the attached ZIP file, which contains the experimental %LOGPARSE macro.

Why start with the FULLSTIMER option for monitoring? The best reason is that it tells you what is happening with the SAS system specifically. The statistics it provides are at the job-step level and can help pinpoint performance problems down to the step. This is extremely helpful in narrowing down troublesome activity and relating it to what your code is telling the system to do. (Note: If the test execution is long, expensive, high impact to the environment, and not easily set up, the SAS session monitoring can be done simultaneously with server and system performance monitoring.) FULLSTIMER measures can be used to help determine whether more in-depth performance monitoring with host monitoring or third-party tools is indicated.

A sample of the FULLSTIMER output on UNIX for a SAS DATA step is listed below:

NOTE: DATA statement used:
      real time 0.06 seconds
      user cpu time 0.02 seconds
      system cpu time 0.00 seconds
      Memory 88k
      Page Faults 10
      Page Reclaims 0
      Page Swaps 0
      Voluntary Context Switches 22
      Involuntary Context Switches 0
      Block Input Operations 10
      Block Output Operations 12

It is important to know how these numbers are defined and what can be derived from them.

FULLSTIMER Statistics Definition and Interpretation

Real Time - The elapsed or "wall clock" time. This is the time spent to execute a job or step, and it is the time the user spends waiting for the job/step to complete. Note: When host system resources are heavily utilized, the Real Time can go up significantly, representing a wait for various system resources to become available for the SAS job/step's usage.

User CPU Time - The time spent by the processor to execute user-written code. This is user-written from the perspective of the operating system, not the customer's language statements. That is, it covers all SAS system code that is not operating system code.

System CPU Time - The time spent by the processor to execute operating system tasks that support user-written code (all CPU tasks that were not executing user-written code). The user CPU time and system CPU time are mutually exclusive.

Memory - The amount of memory allocated to that job/step. This does not represent the entire amount of memory that the SAS session is consuming, as it does not reflect any SAS overhead activities (SAS manager, etc.).

Page Faults - The number of virtual memory page faults that occurred during the job/step. Page Faults are pages that required an I/O to retrieve (a read was done to the I/O subsystem).

Page Reclaims - The number of pages retrieved from the page list awaiting re-allocation (all done in memory). These pages did not require I/O activity to obtain.

Page Swaps - The number of times a process was swapped out of main memory.

Voluntary Context Switches - The number of times a process releases its CPU time-slice voluntarily before its time-slice allocation has expired. This usually occurs when the process needs an external resource, like making an I/O call for more data.

Involuntary Context Switches - The number of times a process releases its CPU time-slice involuntarily. This usually happens when its CPU time-slice has expired before the task was finished, or a higher priority task takes its time-slice away.

Block Input Operations - The number of "bufsize" reads that occur. These are I/O operations to read the data into memory for usage. Not all reads have to utilize an I/O operation, since the page being requested may still be cached in memory from previous reads.

Block Output Operations - The number of "bufsize" writes that occur. These are the same as block input operations except that they pertain to the writes to files. As in the case of block input operations, not all block outputs will cause an I/O operation. Some files may still be cached in memory.

Performance problems usually involve one or more of the following physical areas:
CPU activity
Memory activity
I/O subsystem activity (disk and file systems)
Network activity (this will be discussed outside the context of the SAS system later).

By examining FULLSTIMER statistics, and interpreting what is happening with and between the factors producing the measures, we can get a quick idea of where the system is having problems. We can then resort to host-level and third-party measuring tools to obtain a very detailed picture of the problem.

If the host-level and third-party tools give such detail, why not use them first? Very simply, there are many tools to use, and each is fairly good at one or more specific areas of investigation, such as CPU, Memory, and I/O. Also, some require server root-level access to deploy. FULLSTIMER is quick and easy (incorporated in the SAS system), requires no special privileges, can be used without outside help, and can quickly narrow the field of things to test next.

The following is a general list of interpretations you can make using FULLSTIMER:

Real Time/CPU Time. The most valuable way to use FULLSTIMER is to compare timing information. By comparing the Real Time (elapsed time) with the total CPU time (system CPU time plus user CPU time), you can quickly determine if the problem is CPU related.
If the Real Time and total CPU time are within 15 percent of each other, this usually indicates that the system is moving data well (at least during the run time of that job/step). The CPU processing time is then close to the total time of the job, which indicates that the system memory, disk system, and file system are getting data to the CPU quickly enough to not be a problem. If you are experiencing bad task performance, and the Real Time and CPU time are within 15 percent of each other, it most likely means that your task is CPU bound. The only way to improve the performance will be to get a faster CPU, split the process over more CPUs (multi-threading or parallel processing), or reengineer the code to be more efficient.

If the Real Time and total CPU time are routinely very disparate (for example, if there is a 50 percent margin between them), then you very likely have a problem in your system getting information to the CPU fast enough. Make a closer examination of the Memory and I/O subsystems using the host or third-party tools mentioned in the next section.

Other valuable information from FULLSTIMER can be gained by looking at the other statistics:

Memory. If a sizeable quantity of memory is used and your elapsed time differs greatly from your total CPU time, you may also want to take a close look at your memory using host or third-party tools that are mentioned in the next section.

Involuntary Context Switches. If Involuntary Context Switches are consistently high across many steps and jobs over long time periods, then your CPU system is under a heavy load, and you will want to examine that more closely with the tools mentioned in the next section.

Page Swaps. If Page Swaps are consistently high, then your memory system is being stressed and needs more examination.

Other statistics like Block Input and Output Operations, Page Faults and Reclaims, and Voluntary Context Switches can hint at issues, but require more corroboration from the measures previously discussed to make a case for narrowing down investigation. These measures could be high in and of themselves without being a symptom of performance problems.

Once FULLSTIMER statistics have been examined, they should help indicate which area(s) should be examined in more detail. It is often the case on overloaded systems that multiple areas present themselves for examination. The FULLSTIMER activity should help point to tools that could be used to get a more detailed picture of any hardware/file system issues. This comprises our next step, detecting performance issues at the host server system level.

Note: This content was originally published on support.sas.com.
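The Real Time versus total CPU time comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of SAS or of the %LOGPARSE macro: the function names, the regular expression, and the returned labels are assumptions made for this example. It also assumes that "within 15 percent" means total CPU time is at least 85 percent of Real Time, and that a "50 percent margin" means total CPU time is at most half of Real Time.

```python
import re

# Timing lines from the sample UNIX FULLSTIMER output shown in the article.
SAMPLE_LOG = """\
NOTE: DATA statement used:
      real time           0.06 seconds
      user cpu time       0.02 seconds
      system cpu time     0.00 seconds
"""

# Hypothetical pattern: matches the three timing lines, case-insensitively.
_TIMING = re.compile(
    r"^\s*(real time|user cpu time|system cpu time)\s+([\d:.]+)",
    re.IGNORECASE | re.MULTILINE,
)


def to_seconds(text):
    """Convert '0.06', 'm:ss.ss' or 'h:mm:ss.ss' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_fullstimer(log_text):
    """Return {'real time': ..., 'user cpu time': ..., 'system cpu time': ...}."""
    return {
        name.lower(): to_seconds(value)
        for name, value in _TIMING.findall(log_text)
    }


def classify_step(real, user_cpu, system_cpu, near=0.15, far=0.50):
    """Apply the 15 percent / 50 percent rules of thumb (assumed definitions)."""
    if real <= 0:
        return "too short to judge"
    total_cpu = user_cpu + system_cpu
    if total_cpu >= (1 - near) * real:
        # CPU time accounts for nearly all of the elapsed time.
        return "likely CPU bound"
    if total_cpu <= (1 - far) * real:
        # Most of the elapsed time was spent waiting, not computing.
        return "waiting on memory or I/O"
    return "inconclusive"


stats = parse_fullstimer(SAMPLE_LOG)
verdict = classify_step(
    stats["real time"], stats["user cpu time"], stats["system cpu time"]
)
print(stats, verdict)
```

For a step as short as the 0.06-second sample, the verdict carries little weight; the rule of thumb is more useful when the same pattern shows up routinely across long-running steps.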

AlexBeaver · ‎01-09-2024

SPD Engine or SPD Server?

The SPD Engine (SPDE) and the SPD Server product share a common heritage and, therefore, share a great number of features and performance benefits. However, there are some important differences, primarily in the execution environment. SPDE should be considered an entry-level scalable product. It runs as a libname engine in the SAS environment. SPD Server is a standalone client/server product. Applications initially developed on SPDE can be migrated to SPD Server with ease, as the need to move to a full client/server environment arises.

Compared to SPDE, SPD Server:
requires its computer be a mostly dedicated server; the more dedicated, the better.
requires more skills to set up and administer than SPDE.
supports multi-user client/server access.
supports an Access Control List–based security model.
supports the SQL functions:
   parallel BY-group (PBG) processing.
   implicit pass-through.
is not available on Linux (LNX), OS/390 (MVS), HP/UX for the Itanium Processor Family (H6I), or OpenVMS Alpha (ALP).

For more information, visit the SPD Server Learn & Support page.

SPD Engine or Base Engine?

SPDE is optimized for the storage and sequential access of large and very large data sets (millions of rows, many GB of data). For medium to small data sets, the base engine is often a better performer.

Compared to the base engine, SPDE:
supports more than 32K columns in SAS 9 and later. The base engine supports more than 32K columns in SAS 9.1 and later.
is the only SAS engine that supports more than 2^31 - 1 (approximately 2 billion) rows on 32-bit hosts.
supports the implicit sort for BY processing.
supports optimization of the WHERE expression with multiple indexes.
supports optimization of the WHERE expression containing OR.
supports partitioned data sets.
locks at the member level; the base engine locks at the record level.
requires an index-reorganization utility to rebalance the index tree.
does not support some of the base engine features:
   utility (byte) files.
   catalogs.
   views.
   MDDBs.
   integrity constraints.
   data set generations.
   CEDA.
   audit trail.
   options for national language support.

Content originally published in 2003. If you're looking for a solution for advanced analytics and real-time data processing, see SAS with SingleStore.

AlexBeaver · ‎01-09-2024

The attached documents and samples provide the detail on how to configure high availability of your critical SAS services using SAS Grid Manager. Visit the SAS Grid Manager support page for more information about this product.

High Availability Services with SAS Grid Manager
This document addresses the requirements for implementing High Availability (HA) services running in a SAS grid environment using the EGO capabilities of Platform Suite for SAS, which is included with SAS Grid Manager. Configuration examples are included that provide details for configuring essential SAS services to be Highly Available in the grid.

server_wrap.sh
This is a UNIX shell script that can be used to wrap a service init script. It will keep execution in the foreground until the service daemons exit.

ego_server.sh
This is a UNIX shell script for interfacing with egosh.

Installing and Configuring SAS Environment Manager in a SAS Grid Environment
This document describes the additional configuration steps needed when deploying SAS Environment Manager in a SAS Grid with a shared configuration directory. It also documents the deploy-ev-agents.sh script that automates this process. This script is available at the link below.

deploy-ev-agents.sh
This is a UNIX shell script that automates the steps necessary to deploy SAS Environment Manager in a SAS Grid Environment.

AlexBeaver · ‎12-21-2023

Overview

Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population. Researchers apply probability-based scientific designs to select the sample in order to reduce the risk of a distorted view of the population and to enable statistically valid inferences to be made from the sample.

The SURVEYMEANS, SURVEYFREQ, SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures in SAS/STAT software properly analyze complex survey data by taking into account the sample design. You can use these procedures for multistage or single-stage designs, with or without stratification, and with or without unequal weighting. The survey analysis procedures provide a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.

When you use most other SAS/STAT procedures, statistical inference is based on the assumption that the sample is drawn from an infinite population by simple random sampling. If the sample is in fact selected from a finite population by using a complex survey design, these procedures usually do not calculate the estimates and their variances according to the design that is actually used. Using analyses that are not appropriate for your sample design might lead to incorrect statistical inferences.

However, there might be times when you want to analyze data that are sampled from a finite population by using a complex survey design, but the only SAS/STAT procedure capable of fitting the type of model that you need is not designed to account for sampling based on a complex survey design. In such cases, you can sometimes use a non-survey procedure to obtain valid point estimates of the model parameters, and use the SURVEYMEANS procedure and a little programming to obtain valid standard errors for the model parameter estimates.
Specifically, this example demonstrates how to combine the generalized linear modeling capabilities of the GENMOD procedure and the delete-1 jackknife (resampling) method of the SURVEYMEANS procedure to fit a Poisson model to count data that are sampled from a finite population by using a complex survey design. Performing the delete-1 jackknife estimation of the standard errors of the model parameter estimates requires fitting a model to each of the jackknife replicates.

As is typical in programming, there is more than one way to perform most tasks. This example demonstrates two different ways to accomplish the same task. "Step 3a: Fit a Model to Each Replicate Sample by Using BY-Group Processing" uses the GENMOD procedure’s BY-group processing capabilities to fit a model to each replicate; this is the most efficient method. "Step 3b: Fit a Model to Each Replicate Sample by Looping Through the Replicates" demonstrates how to perform the same task by using a SAS macro to loop through the replicates. Looping is less efficient than BY-group processing but requires less computer memory, which might become an issue if you have a very large sample.

Analysis

Obtaining Point Estimates of Model Parameters

Consider a finite population whose members are indexed by $U = \{1, 2, \ldots, N\}$ and where $F_N$ is the set of values for the population. Suppose you specify a population density function $f(y, \theta)$, where the parameter $\theta$ is of interest. If the entire population is observed, then this likelihood can be used to estimate $\theta$. Let $\theta_N$ be the desired estimator. $\theta_N$ is obtained by maximizing the log likelihood

$$l_N(\theta) = \sum_{i \in U} \log f(y_i, \theta)$$

with respect to $\theta$. Assume that probability sample $A$ is selected from the finite population $U$ and $\pi_i$ is the selection probability for unit $i$.
An estimator of the finite population log likelihood is

$$l_\pi(\theta) = \sum_{i \in A} \pi_i^{-1} \log f(y_i, \theta)$$

A sample-based estimator $\hat{\theta}$ for the finite population quantity $\theta_N$ can be obtained by maximizing the pseudo-log-likelihood $l_\pi(\theta)$ with respect to $\theta$. The design-based variance for $\hat{\theta}$ is obtained by assuming the set of finite population values $F_N$ to be fixed. For more information about maximum pseudo-likelihood estimators and other inferential approaches for survey data, see Kish and Frankel (1974); Godambe and Thompson (1986); Pfeffermann (1993); Korn and Graubard (1999, chapter 3); Chambers and Skinner (2003, chapter 2); and Fuller (2009, section 6.5).

The practical implication of the preceding analysis is that if a SAS/STAT procedure performs weighted maximum likelihood estimation and the weights are applied such that the weights can be factored out of the log likelihood, then that procedure can generate valid point estimates of model parameters when the data are sampled according to a complex survey design.

The WEIGHT statement in the GENMOD procedure identifies a variable in the input data set to be used as the exponential family dispersion parameter weight for each observation. The exponential family dispersion parameter is divided by the WEIGHT variable value for each observation. This is true regardless of whether the parameter is estimated by the procedure or specified in the MODEL statement by using the SCALE= option. It is also true for distributions such as the Poisson and binomial that are not usually defined to have a dispersion parameter. For these distributions, a WEIGHT variable weights the overdispersion parameter, which has the default value of 1.

Consider a Poisson regression model of the observed number of counts, $y_i$, on a set of covariates, $x_i$, for units $i \in A$. Assume that $y_i \sim \mathrm{Poisson}(\theta_i)$ and the mean $\theta_i$ of the response in the $i$th observation is related to a linear predictor through the link function $\log(\theta_i) = x_i'\beta$, where $\beta$ is a vector of unknown parameters.
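The log likelihood can be written as

$$l_\pi(\beta) = \sum_{i \in A} w_i \left[ y_i \log(\theta_i) - \theta_i - \log(y_i!) \right]$$

where $w_i = \pi_i^{-1}$ is the sampling weight. Because the weight, $w_i$, can be factored out of the log likelihood, you can use PROC GENMOD with a WEIGHT statement to obtain valid point estimates of the model parameters.

Caution

However, the log likelihood for the negative binomial model is

$$l(\beta, k) = \sum_{i \in A} \left[ y_i \log\!\left(\frac{k\theta_i}{w_i}\right) - \left(y_i + \frac{w_i}{k}\right) \log\!\left(1 + \frac{k\theta_i}{w_i}\right) + \log \frac{\Gamma(y_i + w_i/k)}{\Gamma(y_i + 1)\,\Gamma(w_i/k)} \right]$$

where $k$ is the negative binomial dispersion parameter. The weight, $w_i$, cannot be factored out of the log likelihood, so you cannot use PROC GENMOD with a WEIGHT statement to obtain point estimates of the model parameters that account for the unequal weights.

Whereas the weighted maximum likelihood point estimates that PROC GENMOD generates appropriately account for the unequal weights for distributions such as the Poisson, the weighted maximum likelihood variances and standard errors that PROC GENMOD computes do not account for the complex survey design. You must compute the variances and standard errors by using a different method. One such method is the delete-1 jackknife (resampling) method.

Obtaining Variance Estimates by Using the Delete-1 Jackknife Method

The jackknife method of variance estimation deletes one primary sampling unit (PSU) at a time from the full sample to create replicates. This method is also known as the delete-1 jackknife method, because it deletes exactly one PSU in every replicate. The total number of replicates $R$ is the same as the total number of PSUs. In each replicate, the sampling weights of the remaining PSUs are modified by the jackknife coefficient $\alpha_r$. The modified weights are called replicate weights.

Let PSU $i$ in stratum $h_r$ be omitted from the $r$th replicate, and let $n_{h_r}$ be the number of PSUs in stratum $h_r$; then the jackknife coefficient and replicate weights are computed as

$$\alpha_r = \begin{cases} \dfrac{n_{h_r} - 1}{n_{h_r}} & \text{for a stratified design} \\[2ex] \dfrac{R - 1}{R} & \text{for a design without stratification} \end{cases}$$

and

$$w_j^{(r)} = \begin{cases} w_j & \text{if observation unit } j \text{ is not in stratum } h_r \\ 0 & \text{if observation unit } j \text{ is in the omitted PSU } i \\ w_j / \alpha_r & \text{if observation unit } j \text{ is in stratum } h_r \text{ but not in the omitted PSU} \end{cases}$$

You can use the VARMETHOD=JACKKNIFE(OUTJKCOEFS=) method-option with any of the survey estimation procedures to store the jackknife coefficients in a SAS data set and use the VARMETHOD=JACKKNIFE(OUTWEIGHTS=) method-option to store the replicate weights in a SAS data set.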
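To make the replicate construction concrete, here is a minimal Python sketch of the delete-1 jackknife bookkeeping for a stratified design. It is not how PROC SURVEYMEANS is implemented; the function name `jackknife_replicates` and the toy data are hypothetical. It assumes the standard stratified rule: the coefficient for a replicate is (n_h - 1)/n_h, where n_h is the number of PSUs in the stratum that loses a PSU; weights in the dropped PSU become 0; other weights in that stratum are divided by the coefficient; and weights in other strata are unchanged.

```python
from collections import defaultdict


def jackknife_replicates(strata, psus, weights):
    """Build delete-1 jackknife coefficients and replicate weights.

    strata[j], psus[j], weights[j] describe observation j.
    Returns (coefs, replicate_weights): one coefficient and one list of
    modified weights per replicate, i.e. per (stratum, PSU) pair.
    """
    # Collect the distinct PSUs of each stratum in first-seen order.
    psus_in_stratum = defaultdict(list)
    for h, p in zip(strata, psus):
        if p not in psus_in_stratum[h]:
            psus_in_stratum[h].append(p)

    coefs, replicate_weights = [], []
    for h, psu_list in psus_in_stratum.items():
        n_h = len(psu_list)
        if n_h < 2:
            # Dropping the only PSU would leave the stratum empty.
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        alpha = (n_h - 1) / n_h
        for dropped in psu_list:
            column = []
            for s, p, w in zip(strata, psus, weights):
                if s != h:
                    column.append(w)          # other strata: unchanged
                elif p == dropped:
                    column.append(0.0)        # omitted PSU: weight removed
                else:
                    column.append(w / alpha)  # same stratum: scaled up
            coefs.append(alpha)
            replicate_weights.append(column)
    return coefs, replicate_weights


# Toy design (made up for illustration): 2 strata, 2 PSUs each.
strata = [1, 1, 1, 2, 2]
psus = [1, 1, 2, 3, 4]
weights = [10.0, 10.0, 10.0, 20.0, 20.0]

coefs, reps = jackknife_replicates(strata, psus, weights)
for r, (a, col) in enumerate(zip(coefs, reps), start=1):
    print(r, a, col)
```

The number of replicates equals the number of PSUs (four in the toy design), which matches the description of R above. In practice the OUTJKCOEFS= and OUTWEIGHTS= data sets supply these values directly, so a sketch like this is mainly useful for checking your understanding of what those data sets contain.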
Let θ̂ be the estimated parameters from the full sample, and let θ̂_r be the estimated parameters for the rth replicate. You can estimate the covariance matrix of θ̂ by

    V(θ̂) = Σ_{r=1}^{R} α_r (θ̂_r − θ̂)(θ̂_r − θ̂)'

It is common to assume that the distribution of this variance estimator can be approximated by using a χ² distribution with R – H degrees of freedom, where R is the number of replicates and H is the number of strata, or R – 1 degrees of freedom when there is no stratification.

If one or more components of θ̂_r cannot be calculated for some replicates, then you use only the replicates for which the parameters can be estimated. Estimability and nonconvergence are two common reasons why θ̂_r might not be available for a replicate sample even if θ̂ is defined for the full sample. Let R_α be the number of replicates where θ̂_r is available, and let R – R_α be the number of replicates where θ̂_r is not available. Without loss of generality, assume that θ̂_r is available only for the first R_α replicates; then the jackknife variance estimator is

    V(θ̂) = Σ_{r=1}^{R_α} α_r (θ̂_r − θ̂)(θ̂_r − θ̂)'

with R_α – H degrees of freedom, where H is the number of strata.

Example

Consider a hypothetical regional survey that seeks to describe the number of visits to health professionals that are made annually by members of a population. The survey is conducted by using a stratified clustered sampling design. The following statements create the SAS data set Counts.
The variable Visits is a count variable that records the number of visits to a health professional; Sex is a binary variable that records the gender of the respondent; Race is a categorical variable that records each respondent’s race; Marital is a categorical variable that records each respondent’s marital status; Private is a categorical variable that records whether a respondent has private health insurance and, if so, what type; Education is a categorical variable that records each respondent’s highest attained level of education; Person is a respondent’s unique identifier; Strata identifies the stratum from which each observation is sampled; PSU identifies the primary sampling units; and SamplingWeight records the sampling weights.

    data counts;
       input visits sex race marital private education person strata psu SamplingWeight @@;
       datalines;
    5 1 1 2 1 5 71511 1 1 1002.59
    1 2 1 4 2 3 307568 1 1 1002.59
    2 1 1 4 4 3 457473 1 1 1002.59
    9 1 1 3 1 5 849963 1 1 1002.59
    3 2 1 3 2 5 892466 1 1 1002.59
    0 2 1 2 3 3 249075 1 2 1002.59
    3 1 1 2 4 1 835408 1 2 1002.59
    1 2 1 4 2 4 159262 1 3 1002.59
    ... more lines ...
    2 2 1 1 4 2 244599 5 40 998.26
    1 2 1 3 4 4 738928 5 40 998.26
    2 2 1 3 2 2 830211 5 40 998.26
    3 1 1 3 2 3 920025 5 40 998.26
    ;
    run;

Step 1: Generate the Jackknife Coefficients and Replicate Weights

In the first step in the process, you generate the jackknife coefficients and replicate weights by using the SURVEYMEANS procedure and save the number of replicates and the number of strata in macro variables.

The following statements analyze the variable Visits and save the jackknife coefficients and replicate weights in the data sets JKcoefs and JKweights, respectively. It does not matter which variable you choose to analyze; the jackknife coefficients and replicate weights are the same regardless of the variable that you choose. If the replicate weights are available to you, then you can skip the PROC SURVEYMEANS step.
However, you still need to create the macro variables &Replicates and &H, which contain the number of replicates and the number of strata, respectively.

    ods select none;
    ods output VarianceEstimation=VE Summary=Summary;
    proc surveymeans data=counts plots=none
                     varmethod=jackknife(outweights=jkweights outjkcoefs=jkcoefs);
       cluster psu;
       strata strata;
       weight SamplingWeight;
       var visits;
    run;

The first statement suppresses the display of all ODS output. You can omit this statement if you want to see the output from each step. The ODS OUTPUT statement saves the variance estimation table in the data set VE and saves the sampling design summary information in the data set Summary. VE contains the number of jackknife replicates that are created, and Summary contains the number of strata. Both the number of jackknife replicates and the number of strata are later retrieved and saved in macro variables.

The VARMETHOD=JACKKNIFE option in the PROC SURVEYMEANS statement specifies the delete-1 jackknife variance estimation method. The OUTWEIGHTS= suboption saves the jackknife replicate weights in the data set JKweights. The OUTJKCOEFS= suboption saves the jackknife coefficients in the data set JKcoefs. The CLUSTER, STRATA, and WEIGHT statements specify the sampling design. The VAR statement names the variable to be analyzed.

The following statements retrieve the number of replicates from the VE data set and the number of strata from the Summary data set. These values are stored in the macro variables &Replicates and &H, respectively.

    data _null_;
       set VE(where=(Label1="Number of Replicates"));
       call symput('replicates',cValue1);
    run;
    data _null_;
       set Summary;
       if Label1="Number of Strata" then do;
          call symput('H',cValue1);
       end;
    run;

Step 2: Fit the Model by Using the Full Sample and the Original Sampling Weights

In the second step, you fit a model by using the full sample and the original sampling weights.
You then compute the number of parameters that are estimated by using the full sample and save that value in a macro variable.

The following statements use the GENMOD procedure to fit a Poisson model by using the full sample and the original sampling weights:

    ods output ParameterEstimates=FullSample(where=(Parameter ne "Scale")
                                             keep=Parameter Estimate Level1
                                             rename=(Estimate=Estimate0))
               ParameterEstimates=parms(keep=df);
    proc genmod data=jkweights;
       class sex race marital private education;
       weight SamplingWeight;
       model visits = sex race marital private education / dist=poisson;
    run;

The ODS OUTPUT statement saves the parameter estimates from the Poisson model to the data set FullSample; the scale parameter is excluded, and the variable Estimate, which contains the parameter estimates, is renamed Estimate0. The same statement also saves the variable DF in the data set Parms; the values of DF sum to the number of regression parameters that are estimated by using the full sample.

The DATA= option in the PROC GENMOD statement specifies that the data set JKweights, which contains the original data as well as the replicate weights, be used. The CLASS statement names the classification variables to be used as explanatory variables in the analysis. The WEIGHT statement specifies that the variable SamplingWeight be used as the exponential family dispersion parameter weight for each observation. The MODEL statement specifies the response variable and the explanatory variables, and the DIST= option specifies the Poisson distribution.

The following statements compute the number of parameters that are estimated by using the full sample and save that value in the macro variable &P. This step is needed because the full model might not be defined in some replicate samples, and you need to exclude replicate models that do not have the same number of parameters as the full model.
    ods output Statistics=statistics;
    proc surveymeans data=parms sum;
       var df;
    run;
    data _null_;
       set statistics;
       call symput('p',Sum);
    run;

Step 3a: Fit a Model to Each Replicate Sample by Using BY-Group Processing

In the third step, you need to prepare the data set that contains the original data and the jackknife weights (JKweights) so that you can use the GENMOD procedure’s BY-group processing capabilities. You then use the GENMOD procedure’s BY-group processing capabilities to fit a Poisson model to each replicate.

The data set JKweights is in what is known as wide form. This means that there is one observation for each respondent and there are R variables that contain the replicate weights. To use BY-group processing, the data must be in what is known as long form. In long form, you have R observations for each respondent and a single variable that contains the jackknife replicate weights.

The following statements create and call the macro %STACK, which reshapes the JKweights data set from wide form to long form. It creates the variable Replicate, which indexes the R copies of the original data, and the variable Repweight, which contains the replicate weights, and it sorts the newly reshaped data set by the variable Replicate. The macro has one required argument, DATA=, which specifies the name of the data set that contains the original data as well as the replicate weights.
    %macro stack(data=);
       data &data;
          set &data;
          %do i=1 %to &replicates;
             Replicate=&i;
             Repweight=RepWt_&i;
             output;
          %end;
       run;
       proc sort data=&data;
          by replicate;
       run;
    %mend stack;
    %stack(data=jkweights)

The following statements fit a Poisson model to each replicate:

    ods output ParameterEstimates=jkparms(where=(Parameter ne "Scale")
                                          keep=Replicate Parameter Estimate Level1)
               ParameterEstimates=jkdf(where=(Parameter ne "Scale")
                                       keep=Replicate Parameter Level1 df)
               ConvergenceStatus=converge;
    proc genmod data=jkweights;
       class sex race marital private education;
       weight repweight;
       model visits = sex race marital private education / dist=poisson;
       by replicate;
    run;

The ODS OUTPUT statement saves the parameter estimates for all R models in the data set JKparms, the degrees of freedom for all the models in the data set JKDF, and the convergence status for all the models in the data set Converge. The WEIGHT statement specifies that the variable Repweight be used as the exponential family dispersion parameter weight for each observation. The BY statement requests separate analyses of observations in groups that are indexed by the variable Replicate.

Step 3b: Fit a Model to Each Replicate Sample by Looping Through the Replicates

Rather than fitting a model to each replicate sample by using the GENMOD procedure’s BY-group processing capabilities, you can write and execute the macro %JKLOOP. This method is less efficient but requires less computer memory, which might become an issue if you have a very large sample.

The macro %JKLOOP has one required argument, REPLICATES=, which specifies the number of jackknife replicates. The macro loops through the R replicates and fits a Poisson model by using the appropriate replicate sample and jackknife replicate weights.
The parameter estimates from each model are saved in the temporary data set Temp, the degrees of freedom for each model are saved in the temporary data set Temp2, and the convergence status of each model is saved in the temporary data set Temp3. A series of DATA steps then add the variable Replicate to Temp, Temp2, and Temp3. The data sets Temp, Temp2, and Temp3 are then appended to the data sets JKparms, JKDF, and Converge, respectively. Finally, the variable Estimate in the data set FullSample is renamed Estimate0.

The following statements create the macro %JKLOOP:

    %macro jkloop(replicates=);
       %local _nopt;
       %let _nopt = %sysfunc(getoption(notes));
       options nonotes;
       ods select none;
       %do i=1 %to &replicates;
          ods output ParameterEstimates=temp(where=(Parameter ne "Scale")
                                             keep=Parameter Estimate Level1)
                     ParameterEstimates=temp2(where=(Parameter ne "Scale")
                                              keep=Parameter Level1 df)
                     ConvergenceStatus=temp3;
          proc genmod data=jkweights;
             class sex race marital private education;
             weight RepWt_&i;
             model visits = sex race marital private education / dist=poisson;
          run;
          data temp;
             set temp;
             Replicate=&i;
          run;
          data temp2;
             set temp2;
             Replicate=&i;
          run;
          data temp3;
             set temp3;
             Replicate=&i;
          run;
          proc append base=jkparms data=temp;
          run;
          proc append base=jkdf data=temp2;
          run;
          proc append base=converge data=temp3;
          run;
       %end;
       data FullSample;
          set FullSample;
          rename estimate=estimate0;
       run;
       ods select all;
       options &_nopt;
    %mend jkloop;
    %jkloop(replicates=&replicates)

Step 4: Compute the Jackknife Variances and Print the Results

In the fourth step, you merge the full-sample parameter estimates, the parameter estimates from the R replicates, and the jackknife coefficients into a single data set; compute the jackknife variances of the parameter estimates; and print the results.
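As a compact summary of what this step computes for each parameter, the following sketch writes out the variance, standard error, and 95% confidence limits. The notation is assumed here for illustration: \hat\beta_j is the full-sample estimate of the jth parameter (the variable Estimate0), \hat\beta_{j,r} is its estimate from replicate r (the variable Estimate), \alpha_r is the jackknife coefficient (the variable JKCoefficient), R_\alpha is the number of usable replicates (the macro variable &R), and H is the number of strata (the macro variable &H).

```latex
\begin{aligned}
\widehat{V}(\hat\beta_j) &= \sum_{r=1}^{R_\alpha} \alpha_r \left( \hat\beta_{j,r} - \hat\beta_j \right)^2, \\
\mathrm{StdErr}_j &= \sqrt{\widehat{V}(\hat\beta_j)}, \\
\hat\beta_j &\pm t_{0.975,\; R_\alpha - H} \times \mathrm{StdErr}_j .
\end{aligned}
```

The sum runs only over replicates whose models converged and have the same number of parameters as the full-sample model, and R_\alpha - H is the degrees of freedom reported with the results.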
Because generalized linear models are not guaranteed to converge and because the full model might not be defined in some replicate samples, the following statements determine how many of the replicate models both converged and have the same number of parameters as the full-sample model. The variable Full is 0 when a replicate model has the same number of parameters as the full-sample model and 1 otherwise. The number of usable replicates is retrieved and saved in the macro variable &R. The number is used later to compute confidence intervals for the parameter estimates.

    ods output Statistics=statistics(keep=replicate sum);
    proc surveymeans data=jkdf sum;
       var df;
       by replicate;
    run;
    data statistics;
       set statistics;
       full=ifn(sum=&p,0,1);
    run;
    data converge;
       merge converge statistics;
       by replicate;
    run;
    data converged;
       set converge(where=(Status=0 & full=0));
    run;
    data nobs;
       dsid=open("converged");
       converged_replicates=attrn(dsid, "nobs");
       rc=close(dsid);
       call symput('R',converged_replicates);
    run;

The following statements create the data set JK by sorting and merging the data sets JKparms, Converge, FullSample, and JKcoefs, which contain the parameter estimates from the replicate models, the convergence status of the replicate models, the parameter estimates that were obtained by using the full sample, and the jackknife coefficients, respectively:

    proc sort data=jkparms;
       by parameter level1;
    run;
    proc sort data=FullSample;
       by parameter level1;
    run;
    data jk;
       merge jkparms FullSample;
       by parameter level1;
    run;
    proc sort data=jk;
       by replicate parameter level1;
    run;
    data jk;
       merge jk jkcoefs converge;
       by replicate;
    run;

The next statements create the data set JKconverged by subsetting the data set JK so that JKconverged contains only parameter estimates from the replicate models that converged and that have the same number of parameters as the full-sample model. The variable SqrDev is created by computing the weighted squared deviations of the parameter estimates; the jackknife coefficients are used as the weights. JKconverged is then sorted by the variables Parameter and Level1.
    data jkconverged;
       set jk(where=(Status=0 & full=0));
       sqrdev=JKCoefficient*(estimate-estimate0)**2;
    run;
    data vce;
       set jkconverged(keep= replicate parameter level1 estimate estimate0);
       diff=estimate0-estimate;
    run;
    proc sort data=jkconverged;
       by parameter Level1;
    run;

The second DATA step also creates the data set VCE, which stores the deviations of the replicate estimates from the full-sample estimates for use in Step 5.

The following statements use PROC SURVEYMEANS to compute the sums of the weighted squared deviations of the parameter estimates. The computed sums are in fact the jackknife variances of the parameter estimates. The ODS OUTPUT statement saves the computed variances in the data set JKvariance.

    ods output Statistics=jkVariance;
    proc surveymeans data=jkconverged sum plots=none;
       var sqrdev;
       by parameter Level1;
    run;

The following DATA step merges the data set JKvariance, which contains the jackknife variances, with the data set FullSample, which contains the full-sample parameter estimates. The variable StdErr is created by computing the square roots of the variances; the full covariance matrix of the parameter estimates is computed later. The variables UL and LL are also created to contain the 95% confidence limits of the parameter estimates.

    data jkVariance(drop=stddev varname);
       merge jkVariance fullsample(rename=(estimate0=Estimate));
       by parameter Level1;
       StdErr=sqrt(Sum);
       rename Sum=Variance;
       DF=&R - &H;
       t=quantile('T', .975, &R-&H);
       ul=estimate+t*stderr;
       ll=estimate-t*stderr;
       label ul="Upper 95% CL";
       label ll="Lower 95% CL";
    run;

The following statements print the parameter estimates, the standard errors, the degrees of freedom, and the 95% confidence limits:

    ods select all;
    title "Survey Poisson Regression";
    title2 "with Delete-1 Jackknife Variance Estimation";
    proc print data=jkVariance noobs label;
       var Parameter Level1 Estimate StdErr DF ll ul;
    run;
    title; title2;

Output 1 displays the parameter estimates, the jackknife standard errors, the degrees of freedom, and the 95% confidence limits. The table shows how the expected number of visits differs among groups.
For example, the average number of visits made by females is exp(0.0848) ≈ 1.09 times the average number of visits made by males, after adjusting for race, education, marital status, and private insurance coverage in the study population. However, because the 95% confidence interval for the coefficient contains 0, the difference is not statistically significant at the 0.05 level.

Output 1: Parameter Estimates and Jackknife Confidence Intervals

    Survey Poisson Regression
    with Delete-1 Jackknife Variance Estimation

    Parameter  Level1  Estimate   StdErr   DF  Lower 95% CL  Upper 95% CL
    Intercept            0.2854  0.10767  195       0.07304       0.49772
    education  1         0.0412  0.09177  195      -0.13976       0.22221
    education  2         0.1228  0.06841  195      -0.01214       0.25770
    education  3         0.0355  0.06498  195      -0.09266       0.16365
    education  4        -0.0217  0.06656  195      -0.15294       0.10960
    education  5         0.0000  0.00000  195       0.00000       0.00000
    marital    1         0.0366  0.06832  195      -0.09816       0.17133
    marital    2         0.0489  0.06579  195      -0.08083       0.17868
    marital    3         0.2383  0.05623  195       0.12739       0.34917
    marital    4         0.0000  0.00000  195       0.00000       0.00000
    private    1         1.3705  0.06588  195       1.24061       1.50047
    private    2        -0.0805  0.06181  195      -0.20245       0.04137
    private    3        -0.1291  0.09129  195      -0.30919       0.05090
    private    4         0.0000  0.00000  195       0.00000       0.00000
    race       1         0.1219  0.08017  195      -0.03617       0.28006
    race       2         0.2789  0.09114  195       0.09912       0.45863
    race       3         0.0000  0.00000  195       0.00000       0.00000
    sex        1         0.0848  0.04523  195      -0.00444       0.17398
    sex        2         0.0000  0.00000  195       0.00000       0.00000

Step 5: Compute the Full Jackknife Covariance Matrix

In the fifth and final step, you use statements such as the following to generate the covariance matrix of the parameter estimates, which you need if you want to perform hypothesis tests that involve more than one parameter. The VAR statement in the PROC SURVEYMEANS step lists col1-col14 because the model in this example has 14 estimated parameters; adjust the range to match the number of parameters in your model.

    proc sort data=jkdf;
       by replicate parameter level1;
    run;
    data temp;
       merge vce jkdf;
       by replicate parameter level1;
    run;
    proc transpose data=temp(where=(df=1)) out=temp2(drop=_name_) prefix=parm;
       by replicate;
       var diff;
    run;
    data temp3(drop=donorstratum);
       merge temp2 jkcoefs;
       by replicate;
       do i=1 to &p;
          row=i;
          output;
       end;
    run;
    data temp3(drop=parm: jkcoefficient i j);
       set temp3;
       array col[&p];
       array parm[*] parm:;
       do i=1 to &p;
          if row=i then do;
             do j = 1 to &p;
                col[j]=jkcoefficient*parm[i]*parm[j];
             end;
          end;
       end;
    run;
    proc sort data=temp3;
       by row;
    run;
    ods select none;
    ods output Statistics=statistics(drop=StdDev);
    proc surveymeans data=temp3 sum plots=none;
       var col1-col14;
       by row;
    run;
    ods select all;
    proc transpose data=statistics out=CovB(drop=_name_ row) prefix=parm;
       var sum;
       by row;
    run;
    proc print data=covb noobs;
    run;

Output 2 displays the covariance matrix.

Output 2: Parameter Estimates Covariance Matrix

           parm1        parm2        parm3        parm4        parm5        parm6        parm7        parm8        parm9       parm10       parm11       parm12       parm13       parm14
        0.011592    -0.002057    -0.002977    -0.002101    -0.002247    -0.001635    -0.001023    -0.001872    -0.002992    -0.002539    -0.002066    -0.005925    -0.006169    -0.001456
       -0.002057     0.008421     0.002328     0.001955     0.001429    -0.000579    -0.000520    -0.000238     0.000445     0.000262 -0.000020050     0.000185     0.000172  0.000033987
       -0.002977     0.002328     0.004680     0.002394     0.002534  0.000059913 -0.000012181     0.000237    -0.000311    -0.000336    -0.000905     0.000299     0.000561     0.000466
       -0.002101     0.001955     0.002394     0.004223     0.002354  0.000010428     0.000209     0.000257    -0.000263    -0.000492    -0.000900     0.000306  0.000008650    -0.000108
       -0.002247     0.001429     0.002534     0.002354     0.004430     0.000297     0.000194  0.000095299    -0.000622    -0.000470    -0.000869     0.000372     0.000142 -0.000077882
       -0.001635    -0.000579  0.000059913  0.000010428     0.000297     0.004668     0.001877     0.001714     0.000139     0.000295     0.001283 -0.000067801     0.000103    -0.000332
       -0.001023    -0.000520 -0.000012181     0.000209     0.000194     0.001877     0.004328     0.001706    -0.000478 -0.000080900     0.000821    -0.000177    -0.000350    -0.000416
       -0.001872    -0.000238     0.000237     0.000257  0.000095299     0.001714     0.001706     0.003161     0.000479     0.000203     0.000504    -0.000323    -0.000230     0.000121
       -0.002992     0.000445    -0.000311    -0.000263    -0.000622     0.000139    -0.000478     0.000479     0.004340     0.002663     0.002782     0.000424     0.000725  0.000012122
       -0.002539     0.000262    -0.000336    -0.000492    -0.000470     0.000295 -0.000080900     0.000203     0.002663     0.003821     0.003159  0.000039949  0.000069672    -0.000154
       -0.002066 -0.000020050    -0.000905    -0.000900    -0.000869     0.001283     0.000821     0.000504     0.002782     0.003159     0.008334    -0.000706    -0.000586  0.000038905
       -0.005925     0.000185     0.000299     0.000306     0.000372 -0.000067801    -0.000177    -0.000323     0.000424  0.000039949    -0.000706     0.006427     0.005783     0.000355
       -0.006169     0.000172     0.000561  0.000008650     0.000142     0.000103    -0.000350    -0.000230     0.000725  0.000069672    -0.000586     0.005783     0.008307     0.000708
       -0.001456  0.000033987     0.000466    -0.000108 -0.000077882    -0.000332    -0.000416     0.000121  0.000012122    -0.000154  0.000038905     0.000355     0.000708     0.002046

References

Chambers, R.L., and Skinner, C.J. (2003). Analysis of Survey Data. Chichester, UK: John Wiley & Sons.

Fuller, W.A. (2009). Sampling Statistics. Hoboken, NJ: John Wiley & Sons.

Godambe, V.P., and Thompson, M.E. (1986). “Parameters of Superpopulation and Survey Population: Their Relationships and Estimation.” International Statistical Review 54:127–138.

Kish, L., and Frankel, M.R. (1974). “Inference from Complex Samples.” Journal of the Royal Statistical Society, Series B 36:1–37.

Korn, E.L., and Graubard, B.I. (1999). Analysis of Health Surveys. New York: John Wiley & Sons.

Pfeffermann, D. (1993). “The Role of Sampling Weights When Modeling Survey Data.” International Statistical Review 61:317–337.
