Please peruse following partial dataset:
i_X N Diff_2b Ratio_2a Slope_5050x_3 Slope_5050x_5
_20206_6 25 0.48 -1.41 1.82 0.21
_20206_6 75 0.82 2.40 -0.39 0.91
_20206_6 92 1.09 0.80 -1.45 0.19
_22101_0 14 0.28 0.23 1.09 -0.22
_22101_0 85 1.31 0.22 0.30 -0.51
_22101_0 49 -1.19 -0.40 0.23 -0.52
_22101_0 20 1.58 -0.03 -0.70 1.49
_22101_0 64 -0.73 -0.13 1.05 -0.55
_23002_6 97 1.22 1.64 0.40 1.28
_23002_6 24 -1.72 -0.12 -0.61 0.07
_23002_6 80 -0.09 0.25 2.65 -0.31
_20502_6 45 -0.02 -0.48 -0.35 0.20
_20502_6 71 -1.91 0.61 -0.86 0.31
_20502_6 13 1.43 -0.71 0.97 -0.43
I would like to collapse the dataset by the first column.
All the other numeric columns should be kept. Values in those columns should be averages based on the character values being collapsed on.
Any suggestions greatly appreciated.
Nicholas Kormanik
Hi:
In addition to Reeza's suggestion for SQL, you could also do this with TABULATE or REPORT. I prefer REPORT in something like this because the structure of the output dataset will be in the structure specified in your COLUMN statement and depending on your usage for the i_X variable, you will either get collapsing (GROUP usage) or ordering (ORDER usage).
Here's the PROC REPORT example. What you called N in your example, I called CNT because I wanted to use N in PROC REPORT to give me the number of rows.
cynthia
I am not sure if you looking for this. Otherwise please elaborate
proc sql;
create table want as
select i_X
,avg(N)
,avg( Diff_2b)
,avg(Ratio_2a)
,avg( Slope_5050x_3)
,avg( Slope_5050x_5)
from want
group by i_X;
quit;
Extend solutiom here by adding mean to the stats list.
Hi:
In addition to Reeza's suggestion for SQL, you could also do this with TABULATE or REPORT. I prefer REPORT in something like this because the structure of the output dataset will be in the structure specified in your COLUMN statement and depending on your usage for the i_X variable, you will either get collapsing (GROUP usage) or ordering (ORDER usage).
Here's the PROC REPORT example. What you called N in your example, I called CNT because I wanted to use N in PROC REPORT to give me the number of rows.
cynthia
Beautifully done and presented, Cynthia. And further demonstration of the powers of Proc Report.
Thanks very much!
Cynthia, two follow-up questions, please:
1) Your code as provided produces two adjacent columns (columns 2 and 3), "numrows" and "cnt". I would like to see as column 4 the product of those two columns, c2*c3, called, perhaps, "rows_x_cols". What would be the additional code to produce that?
2) Most columns resulting contain way too many decimal places, making things a little hard to read. What would be the additional code to format various columns as having just three (3) decimal places?
[Edit:] I see now, from other posts, that I could add the following toward the top:
format _numeric_ 9.3;
Adding this works to change the format of the report columns. Super! However, I would like to change the format of the exported dataset as well. Should I do that as a subsequent data step?
Thanks very much!
Worked perfectly, Cynthia.
Thanks a million!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.