I'm looking for how to implement the spss aggregate command in sas. I'm sure that some you use more than sas and have used/use spss and you may be familiar with this command. Let me give you an example that applies to this specific problem.
Aggregate outfile=*/break=var1 var2/varA varB varC=first(varA varB varC)/varJ varK=last(varJ varK).
Let me explain: Imagine a dataset with multiple records per combination of values of var1 and var2. We want a new dataset with one record per var1*var2 combination such that the value of varA, varB, and varC is the first non-missing value of the target variable for the var1*var2 combination in the original dataset. Likewise, the value of varJ and varK is the last non-missing value of the target variable.
Now, command details. "outfile=*" means that the to-be created dataset replaces the currently open dataset (alternatively, a file name may be provided for the to-be created dataset). "break=var1 var2" specifies the variables that will define records in the to-be created dataset (I think you might substitute the sas syntax word "by"). I've shown that the same variable names are reused, e.g., varA. Of course, a new variable name may be used, as in v27=first(varA).
Here's a very simple example.
Suppose this input dataset.
x y a b j k
2 1 2 . 3 4
2 1 2 5 8 1
2 1 1 9 7 8
2 2 8 . 1 1
3 3 . 3 5 2
3 3 . 4 2 6
aggregate outfile=*/break=x y/a b=first(a b)/j k=last(j k).
The resulting dataset
x y a b j k
2 1 2 . 7 8
2 2 8 . 1 1
3 3 . 3 2 6
3 3 . 4 2 6
Although "first" and "last"are two functions i'm now interested in, there is a much larger set of functions and the command has greater functionality that i've ignored for this question.
I'm a very new sas user and although i did the standard google search and I found nothing. However, i may not have used the best search string, but just like in spss or stata, if you don't know what you are searching for is called in sas, you'll never find it.
Thank you,
Gene Maguin
Note: For more normal types of aggregations look at PROC MEANS (also known as PROC SUMMARY). Easy enough to calculate min/max/mean and other statistics per group. Perhaps the IDGROUP option could even do some of these exotic combinations you are looking for.
Your result does not seem to match your description. Your posted results seem to be taking the value from the first observation for the group and not the first non missing value within the group. Also your result has two observations for the last group, but only one for the first two groups.
If the goal is to find the first and last OBSERVATION for the group then use BY group processing.
data have ;
input x y a b j k;
cards;
2 1 2 . 3 4
2 1 2 5 8 1
2 1 1 9 7 8
2 2 8 . 1 1
3 3 . 3 5 2
3 3 . 4 2 6
;
data first last;
set have;
by x y;
if first.y then output first;
if last.y then output last;
run;
data want;
merge first(keep=x y a b) last(keep=x y j k);
by x y ;
run;
Results:
Obs x y a b j k 1 2 1 2 . 7 8 2 2 2 8 . 1 1 3 3 3 . 3 2 6
If you really want the LAST non-missing value you can use the UPDATE statement.
If you really want the FIRST non-missing value you could pull in non missing values for each variable separately and take the first observation for each group.
data last ;
update have(obs=0) have;
by x y;
run;
data first;
merge
have(keep=x y a where=(not missing(a)))
have(keep=x y b where=(not missing(b)))
have(keep=x y j where=(not missing(j)))
have(keep=x y k where=(not missing(k)))
;
by x y;
if first.y;
run;
data want;
merge first(keep=x y a b) last(keep=x y j k);
by x y;
run;
Results:
Obs x y a b j k 1 2 1 2 5 7 8 2 2 2 8 . 1 1 3 3 3 . 3 2 6
Use the coalesce() function:
data have;
input x y a b j k;
datalines;
2 1 2 . 3 4
2 1 2 5 8 1
2 1 1 9 7 8
2 2 8 . 1 1
3 3 . 3 5 2
3 3 . 4 2 6
;
data want;
set have (rename=(a=_a b=_b j=_j k=_k));
by x y;
retain a b j k;
if first.y then call missing(a,b,j,k);
a = coalesce(a,_a);
b = coalesce(b,_b);
j = coalesce(_j,j);
k = coalesce(_k,k);
if last.y;
drop _:;
run;
data want;
set have (rename=(a=_a b=_b j=_j k=_k));
/* I rename the incoming variables, so I can use their names for new ones */
by x y;
/* sets up by-group processing */
retain a b j k;
/* create new variables, and have them keep their values across data step iterations */
if first.y then call missing(a,b,j,k);
/* initialize all new variables to missing at a group change */
a = coalesce(a,_a);
/* as long as the incoming value is missing, a will stay missing */
/* the first non-missing value will be kept */
b = coalesce(b,_b);
j = coalesce(_j,j);
/* will stay missing until non-missing value */
/* subsequent non-missing values will overwrite, until only missing values
come or the next group change is encountered */
k = coalesce(_k,k);
if last.y; /* output only when group ends */
drop _:; /* throw away the old variables */
run;
If you have further questions, feel free to ask.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.