axescot78
Quartz | Level 8

I have two questions about working with multiple columns in PROC UNIVARIATE.

1) I need to create a histogram for each column, with the column name in its title, like this:

proc univariate data=data noprint;
   histogram var1; 
   title 'histogram for var1';
run;

I know that this code

proc univariate data=data noprint;
   histogram;
run;

produces a histogram for every numeric variable, but how can I put each column name in its title?
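A sketch of one common alternative that keeps the variable name on the x axis: collect the numeric variable names, then loop over them with a macro so that each PROC UNIVARIATE step gets its own TITLE. It assumes SASHELP.CLASS as a stand-in input (substitute your own library and member names), and HIST_EACH is a hypothetical macro name, not a built-in.

```sas
/* Collect the names of all numeric variables. LIBNAME and MEMNAME are
   stored in uppercase in DICTIONARY.COLUMNS; SASHELP.CLASS is only an
   example input. */
proc sql noprint;
   select name into :numVars separated by ' '
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS' and type = 'num';
quit;

/* HIST_EACH (hypothetical helper): one PROC UNIVARIATE step per variable,
   so each TITLE statement can contain that variable's name. */
%macro hist_each(data=, vars=);
   %local k var;
   %do k = 1 %to %sysfunc(countw(&vars));
      %let var = %scan(&vars, &k);
      title "Histogram for &var";
      proc univariate data=&data noprint;
         histogram &var;
      run;
   %end;
   title; /* clear the title so later output is unaffected */
%mend hist_each;

%hist_each(data=sashelp.class, vars=&numVars)
```

NOPRINT suppresses the descriptive-statistics tables but not the histograms themselves.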

2) I need to replace the extreme values of each continuous variable with missing values. I calculate the quartiles and IQR like this:

PROC UNIVARIATE DATA = data NOPRINT;
OUTPUT OUT = boxStats p25 = p25 p75 = p75 QRANGE = iqr;
RUN;

This saves the quartiles and IQR of the last column but not of the others. Do I need to name output variables for each column individually? There are more than 20.

PGStats
Opal | Level 21

1) You can transpose the data and use #BYVAL title processing, as long as you don't need the variable name on the X axis:

proc transpose data=sashelp.class out=classList name=varName;
var age height weight;
by name sex notsorted;
run;

proc sort data=classList; by varname; run;

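options nobyline; /* suppress the BY line; #BYVAL1 already puts the value in the title */
title "Histogram for #byval1";
proc sgplot data=classList;
by varName;
histogram col1;
xaxis display=(nolabel); /* remove the "Col1" label */
run;

title;
options byline; /* restore the defaults */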

PG
PGStats
Opal | Level 21

2) When you use OUTPUT p25=, you must specify a list of new names, one for every analyzed variable, such as:

OUTPUT out=... p25=age25 height25 weight25;
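If typing a name for every variable is impractical, one alternative is to swap in PROC MEANS for PROC UNIVARIATE, because its OUTPUT statement accepts the AUTONAME option. A minimal sketch, again using SASHELP.CLASS as a stand-in input:

```sas
/* With AUTONAME, leaving the names after P25=, P75= and QRANGE= empty
   lets SAS build them as <variable>_<statistic>, e.g. Age_P25. */
proc means data=sashelp.class noprint;
   var _numeric_;
   output out=boxStats p25= p75= qrange= / autoname;
run;
```

The output is a single wide row with three statistics per variable, plus the automatic _TYPE_ and _FREQ_ columns.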

PG
axescot78
Quartz | Level 8

Because there are so many variables, that code would be long and convoluted. This is how far I got with a different approach:

/* Calculate quartiles (Q1, Q3) and IQR */
PROC UNIVARIATE DATA = kddcup98(drop=TARGET_B) OUTTABLE= boxStats(keep=_VAR_ _Q1_ _Q3_ _QRANGE_) NOPRINT;
RUN;

/* Calculate upper and lower bounds */
DATA boxStats;
   SET boxStats;
   upper_bound = _Q3_ + 1.5*_QRANGE_;
   lower_bound = _Q1_ - 1.5*_QRANGE_;
RUN;
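For reference, the usual 1.5 × IQR rule (Tukey's fences) places the lower fence below the first quartile and the upper fence above the third quartile; lower_bound and upper_bound play the roles of L and U here:

$$
\mathrm{IQR} = Q_3 - Q_1,\qquad L = Q_1 - 1.5\,\mathrm{IQR},\qquad U = Q_3 + 1.5\,\mathrm{IQR}
$$

For example, with Q1 = 10 and Q3 = 20, IQR = 10, so L = 10 - 15 = -5 and U = 20 + 15 = 35.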


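DATA kddcup98_continuous;
   SET kddcup98_continuous;

   ARRAY Num_Col[*] _NUMERIC_;
   DO i = 1 to dim(Num_Col);
      /* This line does not work: a DATA step cannot index another data set
         (boxStats) like a matrix. This is the part I need help with. */
      IF Num_Col[i] > boxStats[i, "upper_bound"] OR Num_Col[i] < boxStats[i, "lower_bound"] THEN Num_Col[i] = .;
   END;
RUN;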

I have the main data table and a table of statistics from which I computed the upper and lower bounds. I need to reference those values from the boxStats table. How can I reference them?

Reeza
Super User
For #2, why do you need this? There may be other ways.
axescot78
Quartz | Level 8

I don't need to do it that specific way; I was just trying to make SAS cooperate. I ended up getting it to work using SYMPUT and SYMGET. 🙂
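The thread does not show the final SYMPUT/SYMGET code. As a sketch of a similar macro-variable approach, using PROC SQL INTO instead, assume boxStats has one row per variable with _VAR_, lower_bound and upper_bound, and that every _VAR_ value exists in kddcup98_continuous; kddcup98_clean is a hypothetical output name. The names and bounds are packed into macro variables, and the DATA step loads the bounds into temporary arrays:

```sas
/* Pack variable names and bounds into space-separated macro variables.
   BEST32. keeps more precision than the default numeric format. */
proc sql noprint;
   select _VAR_,
          put(lower_bound, best32.),
          put(upper_bound, best32.)
      into :boxVars separated by ' ',
           :lowList separated by ' ',
           :highList separated by ' '
      from boxStats;
quit;

%let nBox = %sysfunc(countw(&boxVars));

data kddcup98_clean;
   set kddcup98_continuous;
   array cols[&nBox] &boxVars;              /* same order as boxStats rows */
   array lo[&nBox] _temporary_ (&lowList);  /* lower fences */
   array hi[&nBox] _temporary_ (&highList); /* upper fences */
   do i = 1 to &nBox;
      /* set values outside [lower fence, upper fence] to missing */
      if not missing(cols[i]) and (cols[i] < lo[i] or cols[i] > hi[i]) then
         cols[i] = .;
   end;
   drop i;
run;
```

Matching is by position, which is safe here because all three lists come from the same query and therefore share one row order. Temporary arrays also keep the bounds out of the output data set.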
