SAS Support Communities

Adam_Black · ‎07-27-2017

Yes, the mode is what I am after. So I guess I have to do this aggregation one column at a time and then merge the results since there is no MODE aggregate function in PROC SQL. Thanks for the help!

Adam_Black · ‎07-20-2017

I would like to aggregate the columns of a dataset based on the plurality of non-missing values for each column. Suppose my dataset was

Name  Color  Food
Jane  Red    Sushi
Jane  Blue
Jane  Red
John  Green  Yogurt
John  Green  Sushi
John  Green  Yogurt
John  Red

I would like to summarize my dataset using something like this:

proc sql;
  select Name, plurality(Color) as Color, plurality(Food) as Food
  from raw_data
  group by Name;

The result would be

Name  Color  Food
Jane  Red    Sushi
John  Green  Yogurt

The plurality function would return the value that occurs most often after missing values are removed. Ties could be handled using alphabetical order.

What is the best way to accomplish this data transformation in SAS (version 9.3 or 9.4)? (Is it possible to combine a user-defined function with PROC SQL to accomplish this?)
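Since PROC SQL has no MODE summary function, one possible approach is to count each column separately and then merge the per-column winners. The following is only a sketch under assumptions: the macro name plurality and the work tables _cnt_* and _mode_* are hypothetical names chosen for illustration, and the data step simply re-enters the sample rows from the question.

```sas
/* Sample rows from the question; TRUNCOVER leaves Food missing on short lines. */
data raw_data;
  infile datalines truncover;
  input Name $ Color $ Food $;
  datalines;
Jane Red Sushi
Jane Blue
Jane Red
John Green Yogurt
John Green Sushi
John Green Yogurt
John Red
;
run;

/* Hypothetical helper: most frequent non-missing value of &var per Name. */
%macro plurality(var);
  proc sql;
    create table _cnt_&var as
    select Name, &var, count(*) as n
    from raw_data
    where not missing(&var)
    group by Name, &var
    /* highest count first; ties broken alphabetically */
    order by Name, n desc, &var;
  quit;

  data _mode_&var;
    set _cnt_&var;
    by Name;
    if first.Name;        /* keep only the top-ranked value per Name */
    keep Name &var;
  run;
%mend plurality;

%plurality(Color)
%plurality(Food)

/* Combine the per-column results into one row per Name. */
data summary;
  merge _mode_Color _mode_Food;
  by Name;
run;

proc print data=summary noobs;
run;
```

For the sample rows this should reproduce the table in the question (Jane: Red, Sushi; John: Green, Yogurt). A Name whose values are all missing for a column would simply get a missing value for that column after the merge.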

Adam_Black · ‎10-31-2016

Hi, I would like to add two simple indexes to a large dataset based on "column1" and "column2". Is adding the two simple indexes effectively the same thing as sorting the dataset on "column1" and adding an index based on "column2", assuming that the dataset will not be sorted again in the future?

The options I'm considering are:

proc datasets library=mylib;
  modify largeDataset;
  index create column1;
  index create column2;
quit;

vs.

proc sort data=mylib.largeDataset;
  by column1;
run;

proc datasets library=mylib;
  modify largeDataset;
  index create column2;
quit;

Wouldn't the second option be more space efficient than the first?

Thanks for your help!
Adam
SAS version 9.3

Adam_Black · ‎07-06-2016

Thanks for the documentation references. I was expecting all three of the methods that add one to the variable 'a' to give me the same result. I also mistakenly thought that the sum statement, "a+1;", is equivalent to "a = a + 1;". In fact, the sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:

retain variable 0;
variable=sum(variable,expression);

Thanks.
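A minimal sketch of that equivalence, assuming a small made-up dataset (the names values, total_a, total_b, and total_c are hypothetical and not from the thread). It runs the sum statement, the RETAIN plus SUM form, and a plain assignment side by side over a column that contains a missing value.

```sas
/* Three observations, the second one missing. */
data values;
  input x;
  datalines;
1
.
3
;
run;

data _null_;
  set values;

  total_a + x;                  /* sum statement: retained, starts at 0, skips missing */

  retain total_b 0;             /* documented equivalent of the sum statement */
  total_b = sum(total_b, x);

  total_c = total_c + x;        /* plain assignment: reset to missing every iteration */

  put x= total_a= total_b= total_c=;
run;
```

If the equivalence holds as documented, the log should show total_a and total_b both equal to 1, 1, and 4 across the three observations, while total_c stays missing throughout because it is neither retained nor protected from missing values.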

Adam_Black · ‎07-06-2016

I'm confused about the way SAS handles missing values. I've recently realized I have to be very careful about assuming what SAS will do when it encounters a missing value. Here is a simple example.

data _null_;
  a = .;
  b = a + 1;
  c = sum(a,1);
  a+1;
  put a= b= c=;
run;

The result is:

a=1 b=. c=1

This means that adding 1 to missing with + results in missing, but adding 1 to missing with either the sum function or increment operator results in 1. Is there any logical reason for this behavior?

Thanks!

Adam_Black · ‎05-18-2016

For future reference, in case anyone else has the same issue, the solution below is what I was really after. Its output is a single table with statistics for each area. This is very helpful if you have many areas. I'm gradually learning how to use the ODS!

ods exclude all;
proc stdrate data=counts refdata=aggregate
             method=indirect
             stat=rate(mult=100)
             plots=smr;
  population event=death total=denom;
  reference event=death total=denom;
  strata Age;
  by area;
  ods output smr=Smr_Cs;
run;
ods exclude none;

proc print data=Smr_Cs;
run;

Adam_Black · ‎12-08-2015

I would like to reproduce this simple example using PROC STDRATE. I do not see any way to specify a group option when using the indirect standardization method.

http://www.dartmouthatlas.org/downloads/methods/indirect_adjustment.pdf

Here is my code so far.

data counts;
  input area $ age $ death denom;
  datalines;
Area1 65-69 6 500
Area1 70-74 15 300
Area1 75-79 20 200
Area2 65-69 3 300
Area2 70-74 12 300
Area2 75-79 36 400
;
run;

proc sql;
  create table aggregate as
  select age, sum(death) as death, sum(denom) as denom
  from counts
  group by age;
quit;

/* I need an option to group by area */
proc stdrate data=counts refdata=aggregate
             method=indirect
             stat=rate(mult=100);
  population event=death total=denom;
  reference event=death total=denom;
  strata Age;
run;

Thanks for your help!
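As a hand check of the indirect method, independent of PROC STDRATE, the expected deaths per area can be computed by applying the pooled age-specific rates to each area's denominators. This is only a sketch: the table name smr_check and the column names ref_rate, observed, expected, and smr are hypothetical, and the data step repeats the counts from the post so the block runs on its own.

```sas
data counts;
  input area $ age $ death denom;
  datalines;
Area1 65-69 6 500
Area1 70-74 15 300
Area1 75-79 20 200
Area2 65-69 3 300
Area2 70-74 12 300
Area2 75-79 36 400
;
run;

proc sql;
  create table smr_check as
  select c.area,
         sum(c.death)                              as observed,
         /* expected deaths: area denominator times pooled rate, summed over ages */
         sum(c.denom * r.ref_rate)                 as expected,
         sum(c.death) / sum(c.denom * r.ref_rate)  as smr
  from counts as c
       inner join
       /* pooled (reference) death rate for each age group */
       (select age, sum(death) / sum(denom) as ref_rate
          from counts
          group by age) as r
       on c.age = r.age
  group by c.area;
quit;

proc print data=smr_check noobs;
run;
```

Working the arithmetic by hand gives roughly 41 observed against 37.8 expected for Area1 (ratio about 1.08) and 51 against 54.2 for Area2 (ratio about 0.94), which gives a reference point to compare any PROC STDRATE output against.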

Adam_Black · ‎08-25-2015

In the following code...

data all;
  input x @@;
  datalines;
1 2 3 4 5 6 7 8 9 10
;

data even;
  do _n_=1 to howmany;
    set all nobs=howmany;
    if ^mod(x,2) then output;
  end;
run;

I would love it if you could explain the control flow of the second data step. Namely, is there an implied loop created by the set statement, or is the implied loop overridden by the outer do loop?

Thanks again for your help!

Adam_Black · ‎08-25-2015

In response to "Values on the RIGHT of the = are known (variables or constants). Values on the LEFT of the = can be new or already known variables."

The behavior that I think is odd is that SAS allows new variables on the RIGHT side of the =. For example:

data out;
  new_var1 = new_var2;
run;

NOTE: Variable new_var2 is uninitialized.
NOTE: The data set WORK.OUT has 1 observations and 2 variables.

SAS does print a note telling me that new_var2 is uninitialized but allows it nevertheless. This note could get lost in the log of a large program, making a variable-name typo a hard error to find.

Adam_Black · ‎08-24-2015

Thank you all for your help. Here are a couple of examples of what I am talking about.

The simplest example is the following.

data output;
  if new_var = . then put "new_var exists and was never declared";
run;

A more complicated example comes from a problem I was trying to solve involving a sophisticated merge. Imagine we have a dataset with babies and the days they were born. We also have a dataset with doctors containing flags for the days they worked at the hospital. I wanted to create a dataset that would list all the possible baby-doctor combinations such that the doctor might have delivered the baby, i.e., the doctor worked on the baby's birthday. Below is the solution, which I adapted from code someone posted online in response to this question.

data babies;
  input baby_name $ birth_day birth_day_name $;
  datalines;
Jake 1 day1
Sonny 4 day4
North 5 day5
Apple 6 day6
;
run;

data doctors;
  input DrLastname $ day1 day2 day3 day4 day5 day6;
  datalines;
Jones 1 0 0 1 1 1
Lewis 1 1 1 0 0 1
Smith 0 1 1 1 0 1
;
run;

data babies_doctors_array;
  array drnames[3] $10 _temporary_;
  array drdays[3,6] _temporary_;
  /* load doctors dataset into temp arrays */
  if _n_=1 then do i = 1 to nobs_doctors;
    set doctors point=i nobs=nobs_doctors;
    array days day1-day6;
    drnames[i]=DrLastname;
    do j = 1 to dim(days);
      drdays[i,j]=days[j];
    end;
  end;
  /* go through babies to find doctors that worked on their birthday */
  set babies;
  do k = 1 to nobs_doctors;
    if drdays[k,birth_day]=1 then do;
      babys_doctor = drnames[k];
      output;
    end;
  end;
  keep baby_name birth_day babys_doctor;
run;

proc print data=babies_doctors_array;
run;

The variable nobs_doctors is used in the do loop before the set statement in which it is declared.

The most recent case of this I've encountered, the one that prompted me to start this discussion, looks like a coding error to me. Here is a really stripped-down version of the code.

data raw;
  format dos date9.;
  input id dos mmddyy. comp1 comp2 comp3;
  datalines;
1 121299 1 0 0
1 121299 0 1 0
1 101103 0 1 0
2 030400 1 1 0
2 030400 0 0 0
2 040400 0 0 1
3 041190 0 1 0
4 092090 0 0 1
4 051589 0 1 0
5 040300 0 0 0
5 071710 1 0 0
5 070899 0 1 0
6 030299 0 1 0
7 121200 1 0 0
;
run;

proc print data=raw;
run;

proc sort data=raw;
  by id dos;
run;

data fin;
  set raw;
  by id dos;
  /* not sure about using compsum before it is defined */
  if compsum = 0 then no_comps = 1;
  compsum = sum(comp1, comp2, comp3);
run;

This just looks like a mistake to me and illustrates why I think this behavior is dangerous. It makes this kind of coding error hard to catch.

Thanks again for all your help.
-Adam

Adam_Black · ‎08-24-2015

Thank you for the references! I have found "The SAS Supervisor..." paper particularly helpful. I was aware of the different ways to declare variables in SAS but did not understand how to think about undeclared variables, as in the following example.

data output;
  if new_var = . then put "new_var exists but was never declared";
run;

It sounds like in this example the SAS supervisor creates new_var and initializes it to missing at compile time. Then the if statement is performed during execution. This seems like dangerous behavior to me. I could imagine that a typo in a variable name would be a difficult error to find since it would not create a warning. Instead, SAS automatically defines a new variable.

Thanks for the help.
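One way to see the compile-time creation of variables in isolation is to stop the step before any assignment can execute. This is a minimal sketch; the names compile_demo and typo_var are made up for illustration.

```sas
/* STOP ends execution before the assignment ever runs, so any variable
   that exists in the output data set must have been created at compile time. */
data compile_demo;
  stop;
  typo_var = 1;
run;

/* List the variables of the (empty) data set. */
proc contents data=compile_demo;
run;
```

The log should report that WORK.COMPILE_DEMO has 0 observations and 1 variable, and PROC CONTENTS should list typo_var even though the statement that mentions it was never executed.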

Adam_Black · ‎08-24-2015

I have encountered the following situation a handful of times and it has always confused me. As I read through a data step I notice that a variable is used before it is declared or assigned an initial value. That is, the first mention of a variable as I read the code from top to bottom is in a statement that assumes the variable already has a value. I think I remember reading that the data step does some pre-processing, perhaps in which all variables are created, before any statements are executed. Would someone please explain when referring to a variable before it is declared is allowed in a data step, and how to correctly think about this situation?

Thanks!
Adam Black

Date Last Visited	‎06-11-2018 11:42 AM

Re: How to aggregate columns based on plurality?

How to aggregate columns based on plurality?

Indexing vs. Sorting

Re: Confused by how sas handles missing values

Confused by how sas handles missing values

Re: Group option with PROC STDRATE when using indirect method

Group option with PROC STDRATE when using indirect method

Re: When is referring to a variable before it is defined allowed in a ...

Re: When is referring to a variable before it is defined allowed in a ...

Re: When is referring to a variable before it is defined allowed in a ...

Re: When is referring to a variable before it is defined allowed in a ...

When is referring to a variable before it is defined allowed in a SAS ...
