I have about 100 numeric variables in the dataset, plus one character variable for the purpose of identification. For each numemic variable, the values can be negative or positive, and the values are not sorted in a proper order.
For each variable, I want to pick up the smallest positive value. For example, -10, 2, 0.5, then the value 0.5 should be choosed. Together with the smallest positive value, the value of the character variable should also be choosed for the purpose of identification.
The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables.
Hope this is not that challenging and complex. Thanks a lot for help.
There is probably an easier way to do this but, since no one has responded, here is a brute force way:
data have;
input a $ b1-b5;
cards;
a .3 -10 2 4 5
b 6 4 -2 .1 3
c 5 .2 .4 .2 1
;
proc transpose data=have out=want;
var _numeric_;
id a;
run;
data want (keep=min_value name);
set want;
array values(*) _numeric_;
do i=1 to dim(values);
/* reset values(i) <=0 to a missing value*/
if values(i) le 0 then do;
call missing(values(i));
end;
end;
min_value=min(of values(*));
do i=1 to dim(values);
if min_value=values(i) then name=vname(values(i));
end;
run;
/* the above includes Rick's suggested tweak */
There is probably an easier way to do this but, since no one has responded, here is a brute force way:
data have;
input a $ b1-b5;
cards;
a .3 -10 2 4 5
b 6 4 -2 .1 3
c 5 .2 .4 .2 1
;
proc transpose data=have out=want;
var _numeric_;
id a;
run;
data want (keep=min_value name);
set want;
array values(*) _numeric_;
do i=1 to dim(values);
/* reset values(i) <=0 to a missing value*/
if values(i) le 0 then do;
call missing(values(i));
end;
end;
min_value=min(of values(*));
do i=1 to dim(values);
if min_value=values(i) then name=vname(values(i));
end;
run;
/* the above includes Rick's suggested tweak */
Nice. I'd tweak the line that excludes missing values:
if values(i) le 0 then values(i)=.;
I would like a technique that needs no sorting nor transpose : just count the numerics, and run a process, like~:
data results ;
if 0 then set your.data ;
array num(&n_nums) _numeric_ ;
array res(&n_nums ) _temporary_ ;
do until( eof ) ;
set your.data end= eof ;
do _n_ = 1 to &n_nums ;
res(_n_) = min( max( 0, num(_n_)), res(_n_) ) ;
end ;
end ;
call missing( of num(*), of _character_ ) ;
do _n_ = 1 to &n_nums ;
if res(_n_) > 0 then num(_n_) = res(_n_) ;
end ;
output ;
stop ;
run ;
and that just needs the number of numeric variables, which can be obtained with
proc contents data= your.data noprint out= _data_ ;
run ;
proc sql noprint ;
select count(*) into :n_nums separated by ' ' from &syslast
where type=1 ;
quit ;
sorry, my suggestion wasn't good enough
> not enough testing, and not identifying each lowest positive value.
So I expanded my test data with
libname your (work);
data your.data;
set sashelp.class;
neg = -1;
mis = .;
tst = (_n_ -3 ) / 7; * mixture of negatives and fractions;
big = constant('big') * ranuni(1);
id = name;
run;
proc contents data= your.data noprint out= _data_;
run;
and collected the additional column info with sql
proc sql noprint;
select name
, cats( 'id_', name)
, trim( name ) !!' '!! cats( 'id_', name)
into : names separated by ' '
, : id_name separated by ' '
, : ordered separated by ' '
from &syslast
where type=1;
%let n_nums = &sqlobs;
quit;
and finally make that data collection
data results( keep= &ordered );
retain &ordered;
array num(&n_nums) &names;
array ids(&n_nums) $32 &id_name;
array res(&n_nums ) _temporary_ ( &n_nums * %sysfunc(constant(big)) );
do until( eof );
set your.data end= eof;
do _n_ = 1 to &n_nums;
if 0< num(_n_) < res(_n_) then
do;
res(_n_) = num(_n_);
ids(_n_) = id;
end;
end;
end;
do _n_ = 1 to &n_nums;
if ids(_n_) ne ' ' then
num(_n_) = res(_n_);
else num(_n_) = .;
end;
output;
stop;
run;
notes about the data methods:
I used _N_ as a pointer within the array to avoid any contentiuon with the names of numeric variables in "your.data".
Using the CONSTANT() function value returned for BIG as the initial values in the "RESults" array guarantees I can safely use that range test
0< num(_n_) < res(_n_)
SInce I needed to define column names to hold ID values, it is no longer satisfactory to use the _numeric_ variable list (beware the column order must be consistent between values and IDs)
Having extracted the column names, &sqlobs provides a counter that is useful in DO loops and in ARRAY statements
If there might be column names exceeding 30 characters wide, then this method _might_ fail having ID_name columns with a name wider than the 32 limit. That can be resolved, but the extra code needed probably reduces the clarity of the method while seldom adding value.
PROC CONTENTS extracts the table information without needing to navigate the dictionary tables which would become more complex when "your.data" might be a case sensitive table name in some rdbms.
When writing out the variable information from proc contents, _DATA_ provides some safety by using the "automatic table naming" convention, and avoids overwriting a relevant table that may already exist in the work library.
I was thing there might be a way to use a FORMAT to effectively remove the values less than 0. This is what I came up with but I don't think it scale. It is mildly interesting perhaps.
The output is a bit strange, I transposed the mins into variables named by the ID for the MIN.
data have;
set sashelp.class;
neg = -1;
mis = .;
tst = (_n_ -3 ) / 7; * mixture of negatives and fractions;
big = constant('big') * ranuni(1);
id = name;
run;
proc print;
run;
proc format;
value noNeg low-0 = ' ' other=[best32];
run;
ods listing close;
ods output List=Freqs(drop=F_: Frequency);
options missing=' ';
proc freq;
tables (_numeric_)*ID / list nocum nofreq nopercent;
format _numeric_ noNeg.;
run;
ods output close;
ods listing;
data FreqsV / view=FreqsV;
set freqs;
vname = vnameX(scan(table,2,' '));
if missing(vvalueX(vname)) then delete;
drop table;
run;
data freqs2V / view=Freqs2V;
set FreqsV;
by vname notsorted;
if first.vname;
run;
proc transpose out=minpos(where=(_name_ eq vname));
by vname notsorted;
id id;
format _numeric_;
run;
proc print width=min;
run;
Obs vname _NAME_ Joyce Carol Jeffrey
1 Age Age 11.0
2 Height Height 51.3
3 Weight Weight 50.5
4 tst tst 0.14286
5 big big 8.9514E306
And this is how I might actually do it regardless of the OPs no transpose requirement. No code gen no arrays.....
proc transpose out=_data_(where=(col1 gt 0));
by id notsorted;
run;
proc summary nway;
class _name_;
output idgroup(min(col1) out(col1 id)=);
run;
Obs _NAME_ _TYPE_ _FREQ_ COL1 id
1 Age 1 19 11.0000 Joyce
2 Height 1 19 51.3000 Joyce
3 Weight 1 19 50.5000 Joyce
4 big 1 19 8.9514E306 Jeffrey
5 tst 1 16 0.1429 Carol
not an original requirement, I suggested "no transpose" to improve run-time performance
Proc means or summary can get the minimum value. To meet you other requirement why not create a view to convert the non-positive values to missing?
data truncate / view=truncate ;
set have ;
array _n _numeric_;
do over _n; if _n <= 0 then _n=.; end;
run;
proc summary data=truncate ;
var _numeric_;
output out=want min=;
run;
Now to get the ID of the minumum value then you will need to read in the variable names (from PROC CONTENTS or a dictionary.column metadata table) and generate the code for the minid() clause of the OUTPUT statement.
For example here is code to find the minumums from SASHELP.CLASS.
proc summary data=sashelp.class ;
var _numeric_;
output out=want min=
minid( age(name) height(name) weight(name) )= age_id height_id weight_id
;
run;
data have; input a $ b1-b5; cards; a .3 -10 2 4 5 b 6 4 -2 .1 3 c 5 .2 .4 .2 1 ; run; options mprint mlogic symbolgen; %macro s; proc sql ; select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0 having '||strip(name)||' eq min('||strip(name)||')' from dictionary.columns where libname='WORK' and memname='HAVE' and type='num'; select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0 having '||strip(name)||' eq min('||strip(name)||')' into : s1 - : s&sqlobs from dictionary.columns where libname='WORK' and memname='HAVE' and type='num'; create table want as %do i=1 %to &sqlobs ; &&s&i %if &i ne &sqlobs %then %do; union all %end; %end; ;quit; %mend s; %s
Ksharp
Ksharp's anwer is very helpful. This SAS site only allows me to give a maximum of 2 answers as helpful.
Thanks very much!
Hi Ruth,
Did you say yiou want "The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables."?
what do you mean by "a single row"?
hi .. lots of answers posted already, but here's another (it uses TRANSPOSE like a few others) that uses the SMALLEST function ...
* some fake data (10 random ids, 10 numeric variables);
data have;
array x(10);
do j = 1 to 10;
id = put(ceil(1000*ranuni(999)),z4.);
do k = 1 to 10;
x(k) = ceil(100*ranuni(999));
x(k) = ifn(ranuni(999) le .3, x(k)*-1, x(k));
end;
output;
end;
drop j k;
run;
proc transpose data=have out=want;
var _numeric_;
run;
* use SMALLEST (use COUNTC to determine the number of values to skip);
data want;
set want;
smallest_positive = smallest(countc(cats(of _numeric_),'-') + 1, of _numeric_);
which_obs = whichn(smallest_positive, of _numeric_);
set have (keep=id) point=which_obs;
keep sm: id _name_;
run;
results ...
smallest_
_NAME_ positive id
x1 12 0778
x2 14 0778
x3 16 0423
x4 13 0609
x5 16 0031
x6 8 0031
x7 8 0352
x8 21 0750
x9 15 0750
x10 3 0491
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.