axescot78
Quartz | Level 8

I have two questions about working with multiple columns in PROC UNIVARIATE.

 

1) I need to create histograms with the column header in the title, as such:

proc univariate data=data noprint;
   histogram var1; 
   title 'histogram for var1';
run;

I know this

proc univariate data=data noprint;
   histogram;
run;

will produce histograms for all continuous variables, but how do I include the column header in each title?

 

2) I need to replace the extreme values of each continuous variable with missing values. I calculate the quartiles and IQR like this:

PROC UNIVARIATE DATA = data NOPRINT;
OUTPUT OUT = boxStats p25 = p25 p75 = p75 QRANGE = iqr;
RUN;

This saves the quartiles and IQR of the last column but not of the others. Do I need to go through and create output variable names for each one? There are more than 20.

5 REPLIES
PGStats
Opal | Level 21

1) You can transpose and use #BYVAL processing, as long as you don't need the variable name on the X axis:

 

proc transpose data=sashelp.class out=classList name=varName;
var age height weight;
by name sex notsorted;
run;

proc sort data=classList; by varname; run;

option nobyline;
title "Histogram for #byval1";
proc sgplot data=classList;
by varName;
histogram col1;
xaxis display=(nolabel); /* remove the "Col1" label */
run;

PG
PGStats
Opal | Level 21

2) When you say OUTPUT p25=, you must specify a name list with a new name for every analysis variable, such as

 

OUTPUT out=... p25=age25 height25 weight25;
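To make the name-list rule concrete, here is a minimal sketch using sashelp.class. The output data set name and the variable names (age_q1, height_q1, and so on) are chosen for illustration only. Names are matched to the VAR statement by position, so each list has one entry per analysis variable.

```sas
/* Sketch: one output name per analysis variable, in VAR order. */
/* boxStats and the *_q1, *_q3, *_iqr names are illustrative.   */
proc univariate data=sashelp.class noprint;
   var age height weight;
   output out=boxStats
          q1=age_q1 height_q1 weight_q1        /* 25th percentiles   */
          q3=age_q3 height_q3 weight_q3        /* 75th percentiles   */
          qrange=age_iqr height_iqr weight_iqr; /* interquartile range */
run;

proc print data=boxStats;
run;
```

The result is a single row with nine columns. With 20 or more variables the lists get long, which is where the OUTTABLE= option (one row per variable) may be more convenient.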

PG
axescot78
Quartz | Level 8

Because there are so many variables, the code would be long and convoluted. However, this is how far I got with it:

/* Calculate quartiles and IQR (one row per variable) */
PROC UNIVARIATE DATA = kddcup98(drop=TARGET_B) OUTTABLE= boxStats(keep=_VAR_ _Q1_ _Q3_ _QRANGE_) NOPRINT;
RUN;

 

/* Calculate upper and lower bounds */
DATA boxStats;
   SET boxStats;
   upper_bound = _Q3_ + 1.5*_QRANGE_;
   lower_bound = _Q1_ - 1.5*_QRANGE_;
RUN;


DATA kddcup98_continuous;
   SET kddcup98_continuous;

   ARRAY Num_Col[*] _NUMERIC_;
   DO i = 1 to dim(Num_Col);
      IF Num_Col[i] > boxStats[i, "upper_bound"] OR Num_Col[i] < boxStats[i, "lower_bound"] THEN Num_Col[i] = .;
   END;
RUN;

 

I have the main data table and a table of stats from which I computed upper and lower bounds. The boxStats[i, "upper_bound"] references above are pseudocode for what I want, not valid SAS. How can I reference those values from the boxStats table?

Reeza
Super User
For #2 why do you need this? There may be other ways.
axescot78
Quartz | Level 8

I don't need it that specific way. Just trying to make SAS cooperate. I ended up getting it using symput and symget. 🙂
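For anyone who lands here later, the following is one possible sketch of a symput/symget approach, not necessarily the exact code used above. It runs against sashelp.class so it is self-contained. The macro variable prefixes lo_ and hi_ and the data set name class_clean are assumptions made for illustration.

```sas
/* Per-variable quartiles and IQR, one row per analysis variable */
proc univariate data=sashelp.class noprint
                outtable=boxStats(keep=_VAR_ _Q1_ _Q3_ _QRANGE_);
   var age height weight;
run;

/* Store the fences in macro variables named lo_<var> and hi_<var>. */
/* Assumes variable names are at most 29 characters, since macro    */
/* variable names are limited to 32.                                */
data _null_;
   set boxStats;
   call symputx(cats('lo_', _VAR_), _Q1_ - 1.5*_QRANGE_);
   call symputx(cats('hi_', _VAR_), _Q3_ + 1.5*_QRANGE_);
run;

/* Look up each variable's fences by name and blank out the outliers */
data class_clean;
   set sashelp.class;
   array num_col[*] age height weight;
   do i = 1 to dim(num_col);
      lo = symgetn(cats('lo_', vname(num_col[i])));
      hi = symgetn(cats('hi_', vname(num_col[i])));
      if num_col[i] < lo or num_col[i] > hi then num_col[i] = .;
   end;
   drop i lo hi;
run;
```

Looking the bounds up by variable name rather than by row position means the order of rows in boxStats does not have to match the order of the array elements. For a large table, merging the bounds in or loading them into a hash object would likely be faster than calling SYMGETN on every row.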
