Solved: The smallest positive value for each variable

Ruth · Posted 11-21-2011 01:21 PM

I have about 100 numeric variables in the dataset, plus one character variable for the purpose of identification. For each numemic variable, the values can be negative or positive, and the values are not sorted in a proper order.

For each variable, I want to pick up the smallest positive value. For example, -10, 2, 0.5, then the value 0.5 should be choosed. Together with the smallest positive value, the value of the character variable should also be choosed for the purpose of identification.

The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables.

Hope this is not that challenging and complex. Thanks a lot for help.

art297 · Posted 11-21-2011 03:48 PM

There is probably an easier way to do this but, since no one has responded, here is a brute force way:

data have;

input a $ b1-b5;

cards;

a .3 -10 2 4 5

b 6 4 -2 .1 3

c 5 .2 .4 .2 1

;

proc transpose data=have out=want;

var _numeric_;

id a;

run;

data want (keep=min_value name);

set want;

array values(*) _numeric_;

do i=1 to dim(values);

/* reset values(i) <=0 to a missing value*/

if values(i) le 0 then do;

call missing(values(i));

end;

min_value=min(of values(*));

do i=1 to dim(values);

if min_value=values(i) then name=vname(values(i));

end;

run;

/* the above includes Rick's suggested tweak */

View solution in original post

art297 · Posted 11-21-2011 03:48 PM

There is probably an easier way to do this but, since no one has responded, here is a brute force way:

data have;

input a $ b1-b5;

cards;

a .3 -10 2 4 5

b 6 4 -2 .1 3

c 5 .2 .4 .2 1

;

proc transpose data=have out=want;

var _numeric_;

id a;

run;

data want (keep=min_value name);

set want;

array values(*) _numeric_;

do i=1 to dim(values);

/* reset values(i) <=0 to a missing value*/

if values(i) le 0 then do;

call missing(values(i));

end;

min_value=min(of values(*));

do i=1 to dim(values);

if min_value=values(i) then name=vname(values(i));

end;

run;

/* the above includes Rick's suggested tweak */

Rick_SAS · Posted 11-21-2011 04:23 PM

Nice. I'd tweak the line that excludes missing values:

if values(i) le 0 then values(i)=.;

Peter_C · Posted 11-21-2011 06:21 PM

I would like a technique that needs no sorting nor transpose : just count the numerics, and run a process, like~:

data results ;

if 0 then set your.data ;

array num(&n_nums) _numeric_ ;

array res(&n_nums ) _temporary_ ;

do until( eof ) ;

set your.data end= eof ;

do _n_ = 1 to &n_nums ;

res(_n_) = min( max( 0, num(_n_)), res(_n_) ) ;

end ;

call missing( of num(*), of _character_ ) ;

do _n_ = 1 to &n_nums ;

if res(_n_) > 0 then num(_n_) = res(_n_) ;

end ;

output ;

stop ;

run ;

and that just needs the number of numeric variables, which can be obtained with

proc contents data= your.data noprint out= _data_ ;

run ;

proc sql noprint ;

select count(*) into :n_nums separated by ' ' from &syslast

where type=1 ;

quit ;

Peter_C · Posted 11-22-2011 04:45 AM

sorry, my suggestion wasn't good enough

> not enough testing, and not identifying each lowest positive value.

So I expanded my test data with

libname your (work);

data your.data;

set sashelp.class;

neg = -1;

mis = .;

tst = (_n_ -3 ) / 7; * mixture of negatives and fractions;

big = constant('big') * ranuni(1);

id = name;

run;

proc contents data= your.data noprint out= _data_;

run;

and collected the additional column info with sql

names for numeric variables
new names to hold the ID values
an ordering list (ID to follow value)
and get a count of numerics from &SQLOBS

proc sql noprint;

select name

, cats( 'id_', name)

, trim( name ) !!' '!! cats( 'id_', name)

into : names separated by ' '

, : id_name separated by ' '

, : ordered separated by ' '

from &syslast

where type=1;

%let n_nums = &sqlobs;

quit;

and finally make that data collection

data results( keep= &ordered );

retain &ordered;

array num(&n_nums) &names;

array ids(&n_nums) $32 &id_name;

array res(&n_nums ) _temporary_ ( &n_nums * %sysfunc(constant(big)) );

do until( eof );

set your.data end= eof;

do _n_ = 1 to &n_nums;

if 0< num(_n_) < res(_n_) then

do;

res(_n_) = num(_n_);

ids(_n_) = id;

end;

do _n_ = 1 to &n_nums;

if ids(_n_) ne ' ' then

num(_n_) = res(_n_);

else num(_n_) = .;

end;

output;

stop;

run;

notes about the data methods:

I used _N_ as a pointer within the array to avoid any contentiuon with the names of numeric variables in "your.data".

Using the CONSTANT() function value returned for BIG as the initial values in the "RESults" array guarantees I can safely use that range test

0< num(_n_) < res(_n_)

SInce I needed to define column names to hold ID values, it is no longer satisfactory to use the _numeric_ variable list (beware the column order must be consistent between values and IDs)

Having extracted the column names, &sqlobs provides a counter that is useful in DO loops and in ARRAY statements

If there might be column names exceeding 30 characters wide, then this method _might_ fail having ID_name columns with a name wider than the 32 limit. That can be resolved, but the extra code needed probably reduces the clarity of the method while seldom adding value.

PROC CONTENTS extracts the table information without needing to navigate the dictionary tables which would become more complex when "your.data" might be a case sensitive table name in some rdbms.

When writing out the variable information from proc contents, _DATA_ provides some safety by using the "automatic table naming" convention, and avoids overwriting a relevant table that may already exist in the work library.

data_null__ · Posted 11-22-2011 10:14 AM

I was thing there might be a way to use a FORMAT to effectively remove the values less than 0. This is what I came up with but I don't think it scale. It is mildly interesting perhaps.

The output is a bit strange, I transposed the mins into variables named by the ID for the MIN.

data have;
   set sashelp.class;
   neg = -1;
   mis = .;
   tst = (_n_ -3 ) / 7;  * mixture of  negatives and fractions;
   big = constant('big') * ranuni(1);
   id  = name;
   run;
proc print;
   run;
proc format;
   value noNeg low-0 = ' ' other=[best32];
   run;
ods listing close;
ods output List=Freqs(drop=F_: Frequency);
options missing=' ';
proc freq;
   tables (_numeric_)*ID / list nocum nofreq nopercent;
   format _numeric_ noNeg.;
   run;
ods output close;
ods listing;
data FreqsV / view=FreqsV;
   set freqs;
   vname = vnameX(scan(table,2,' '));
   if missing(vvalueX(vname)) then delete;
   drop table;
   run;
data freqs2V / view=Freqs2V;   
   set FreqsV;
   by vname notsorted;
   if first.vname;
   run;
proc transpose out=minpos(where=(_name_ eq vname));
   by vname notsorted;
   id id;
   format _numeric_;
   run;
proc print width=min;
   run;

Obs vname _NAME_ Joyce Carol Jeffrey

1 Age Age 11.0

2 Height Height 51.3

3 Weight Weight 50.5

4 tst tst 0.14286

5 big big 8.9514E306

And this is how I might actually do it regardless of the OPs no transpose requirement. No code gen no arrays.....

proc transpose out=_data_(where=(col1 gt 0));
   by id notsorted;
   run;
proc summary nway;
   class _name_;
   output idgroup(min(col1) out(col1 id)=);
   run;
Obs    _NAME_    _TYPE_    _FREQ_          COL1      id
 
 1     Age          1        19         11.0000    Joyce
 2     Height       1        19         51.3000    Joyce
 3     Weight       1        19         50.5000    Joyce
 4     big          1        19      8.9514E306    Jeffrey
 5     tst          1        16          0.1429    Carol

Peter_C · Posted 11-22-2011 05:43 PM

not an original requirement, I suggested "no transpose" to improve run-time performance

Tom · Posted 11-21-2011 06:56 PM

Proc means or summary can get the minimum value. To meet you other requirement why not create a view to convert the non-positive values to missing?

data truncate / view=truncate ;

set have ;

array _n _numeric_;

do over _n; if _n <= 0 then _n=.; end;

run;

proc summary data=truncate ;

var _numeric_;

output out=want min=;

run;

Now to get the ID of the minumum value then you will need to read in the variable names (from PROC CONTENTS or a dictionary.column metadata table) and generate the code for the minid() clause of the OUTPUT statement.

For example here is code to find the minumums from SASHELP.CLASS.

proc summary data=sashelp.class ;

var _numeric_;

output out=want min=

minid( age(name) height(name) weight(name) )= age_id height_id weight_id

;

run;

Ksharp · Posted 11-22-2011 12:11 AM

data have;
  input a $ b1-b5;
  cards;
a .3 -10 2 4 5
b 6 4 -2 .1 3
c 5 .2 .4 .2 1
;
run;
options mprint mlogic symbolgen;
%macro s;
proc sql ;
 select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0  having '||strip(name)||' eq min('||strip(name)||')'
  from dictionary.columns
   where libname='WORK' and memname='HAVE' and type='num';

 select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0  having '||strip(name)||' eq min('||strip(name)||')'
  into : s1 - : s&sqlobs
  from dictionary.columns
   where libname='WORK' and memname='HAVE' and type='num';


 create table want as
 %do i=1 %to &sqlobs ;
  &&s&i 
 %if &i ne &sqlobs  %then %do; union all %end;
 %end;
;quit;
%mend s;

%s

Ksharp

Ruth · Posted 11-22-2011 08:13 AM

Ksharp's anwer is very helpful. This SAS site only allows me to give a maximum of 2 answers as helpful.

Thanks very much!

Linlin · Posted 11-22-2011 10:03 AM

Hi Ruth,

Did you say yiou want "The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables."?

what do you mean by "a single row"?

MikeZdeb · Posted 11-23-2011 12:06 PM

hi .. lots of answers posted already, but here's another (it uses TRANSPOSE like a few others) that uses the SMALLEST function ...

* some fake data (10 random ids, 10 numeric variables);

data have;

array x(10);

do j = 1 to 10;

id = put(ceil(1000*ranuni(999)),z4.);

do k = 1 to 10;

x(k) = ceil(100*ranuni(999));

x(k) = ifn(ranuni(999) le .3, x(k)*-1, x(k));

end;

output;

end;

drop j k;

run;

proc transpose data=have out=want;

var _numeric_;

run;

* use SMALLEST (use COUNTC to determine the number of values to skip);

data want;

set want;

smallest_positive = smallest(countc(cats(of _numeric_),'-') + 1, of _numeric_);

which_obs = whichn(smallest_positive, of _numeric_);

set have (keep=id) point=which_obs;

keep sm: id _name_;

run;

results ...

smallest_

_NAME_ positive id

x1 12 0778

x2 14 0778

x3 16 0423

x4 13 0609

x5 16 0031

x6 8 0031

x7 8 0352

x8 21 0750

x9 15 0750

x10 3 0491

The smallest positive value for each variable

Re: The smallest positive value for each variable

Re: The smallest positive value for each variable

The smallest positive value for each variable

The smallest positive value for each variable

The smallest positive value for each variable

Re: The smallest positive value for each variable

The smallest positive value for each variable

The smallest positive value for each variable

Re: The smallest positive value for each variable

The smallest positive value for each variable

The smallest positive value for each variable

Re: The smallest positive value for each variable

Registration is open

SAS Training: Just a Click Away