BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NKormanik
Barite | Level 11

Please peruse following partial dataset:

 

i_X	N	Diff_2b	Ratio_2a	Slope_5050x_3	Slope_5050x_5
_20206_6	25	0.48	-1.41	1.82	0.21
_20206_6	75	0.82	2.40	-0.39	0.91
_20206_6	92	1.09	0.80	-1.45	0.19
_22101_0	14	0.28	0.23	1.09	-0.22
_22101_0	85	1.31	0.22	0.30	-0.51
_22101_0	49	-1.19	-0.40	0.23	-0.52
_22101_0	20	1.58	-0.03	-0.70	1.49
_22101_0	64	-0.73	-0.13	1.05	-0.55
_23002_6	97	1.22	1.64	0.40	1.28
_23002_6	24	-1.72	-0.12	-0.61	0.07
_23002_6	80	-0.09	0.25	2.65	-0.31
_20502_6	45	-0.02	-0.48	-0.35	0.20
_20502_6	71	-1.91	0.61	-0.86	0.31
_20502_6	13	1.43	-0.71	0.97	-0.43

I would like to collapse the dataset by the first column.

 

All the other numeric columns should be kept.  Values in those columns should be averages based on the character values being collapsed on.

 

Any suggestions greatly appreciated.

 

Nicholas Kormanik

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Cynthia_sas
Diamond | Level 26

Hi:

  In addition to Reeza's suggestion for SQL, you could also do this with TABULATE or REPORT. I prefer REPORT in something like this because the structure of the output dataset will be in the structure specified in your COLUMN statement and depending on your usage for the i_X variable, you will either get collapsing (GROUP usage) or ordering (ORDER usage).

 

  Here's the PROC REPORT example. What you called N in your example, I called CNT because I wanted to use N in PROC REPORT to give me the number of rows.

 

cynthia

 

report_slope.png

View solution in original post

8 REPLIES 8
RahulG
Barite | Level 11

 

I am not sure if you looking for this. Otherwise please elaborate

proc sql;
create table want as
select  i_X	
,avg(N) 
,avg(	Diff_2b) 
,avg(Ratio_2a)
,avg(	Slope_5050x_3)
,avg(	Slope_5050x_5)
from want
group by i_X;
quit;


 

Cynthia_sas
Diamond | Level 26

Hi:

  In addition to Reeza's suggestion for SQL, you could also do this with TABULATE or REPORT. I prefer REPORT in something like this because the structure of the output dataset will be in the structure specified in your COLUMN statement and depending on your usage for the i_X variable, you will either get collapsing (GROUP usage) or ordering (ORDER usage).

 

  Here's the PROC REPORT example. What you called N in your example, I called CNT because I wanted to use N in PROC REPORT to give me the number of rows.

 

cynthia

 

report_slope.png

NKormanik
Barite | Level 11

Beautifully done and presented, Cynthia.  And further demonstration of the powers of Proc Report.

 

Thanks very much!

 

 

NKormanik
Barite | Level 11

Cynthia, two follow-up questions, please:

 

1) Your code as provided produces two adjacent columns (columns 2 and 3), "numrows" and "cnt". I would like to see as column 4 the product of those two columns, c2*c3, called, perhaps, "rows_x_cols". What would be the additional code to produce that?

 

2) Most columns resulting contain way too many decimal places, making things a little hard to read. What would be the additional code to format various columns as having just three (3) decimal places?

 

[Edit:]  I see now, from other posts, that I could add the following toward the top:

 

format _numeric_ 9.3;

Adding this works to change the format of the report columns.  Super!  However, I would like to change the format of the exported dataset as well.  Should I do that as a subsequent data step?

 

Thanks very much!

 

 

Cynthia_sas
Diamond | Level 26
HI:
Do I remember correctly that you might be sending your output from the REPORT step or the dataset to Excel? If so, I would not look for the SAS FORMAT statement to impact ODS TAGSETS.EXCELXP or ODS EXCEL or even ODS CSV -- Excel is very persnickety about whether it uses SAS formats -- for the 2 XML based Excel destinations you can use TAGATTR to control the Microsoft format to be used by Excel. Most other SAS destinations (HTML, RTF, PDF will respect SAS formats).

To change the report, you only need to use a format statement as you show OR, if you want different formats for different variables -- as, for example if you have dates and numbers, then put your format= or f= in the DEFINE statement.

If you want to "compute" a new report column based on the value of other columns on the report, such as you describe, then you would need to use a COMPUTE block. There are a lot of previous forum postings about the "left to right" rule of PROC REPORT and the special way you need refer to analysis variables. However, in my example, numrows is an alias and CNT is an analysis variable.

So if he COLUMN statement changed to something like this:
column grpvar n=numrows cnt newvar var3 var4 var5;

Then you would need:
define newvar / computed;

and then later in the PROC REPORT step:
compute newvar;
newvar = numrows * cnt.sum;
endcomp;

I suggest you look in the PROC REPORT documenation for more examples of calculating report columns using a COMPUTE block.

cynthia
NKormanik
Barite | Level 11

Worked perfectly, Cynthia.

 

Thanks a million!

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3337 views
  • 6 likes
  • 4 in conversation