BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ruth
Fluorite | Level 6

I have about 100 numeric variables in the dataset, plus one character variable for the purpose of identification. For each numemic variable, the values can be negative or positive, and the values are not sorted in a proper order.

For each variable, I want to pick up the smallest positive value. For example, -10, 2, 0.5, then the value 0.5 should be choosed. Together with the smallest positive value, the value of the character variable should also be choosed for the purpose of identification.

The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables.

Hope this is not that challenging and complex. Thanks a lot for help.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

There is probably an easier way to do this but, since no one has responded, here is a brute force way:

data have;

  input a $ b1-b5;

  cards;

a .3 -10 2 4 5

b 6 4 -2 .1 3

c 5 .2 .4 .2 1

;

proc transpose data=have out=want;

  var _numeric_;

  id a;

run;

data want (keep=min_value name);

  set want;

  array values(*) _numeric_;

  do i=1 to dim(values);

    /* reset values(i) <=0 to a missing value*/

    if values(i) le 0 then do;

      call missing(values(i));

    end;

  end;

  min_value=min(of values(*));

  do i=1 to dim(values);

    if min_value=values(i) then name=vname(values(i));

  end;

run;

/* the above includes Rick's suggested tweak */

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

There is probably an easier way to do this but, since no one has responded, here is a brute force way:

data have;

  input a $ b1-b5;

  cards;

a .3 -10 2 4 5

b 6 4 -2 .1 3

c 5 .2 .4 .2 1

;

proc transpose data=have out=want;

  var _numeric_;

  id a;

run;

data want (keep=min_value name);

  set want;

  array values(*) _numeric_;

  do i=1 to dim(values);

    /* reset values(i) <=0 to a missing value*/

    if values(i) le 0 then do;

      call missing(values(i));

    end;

  end;

  min_value=min(of values(*));

  do i=1 to dim(values);

    if min_value=values(i) then name=vname(values(i));

  end;

run;

/* the above includes Rick's suggested tweak */

Rick_SAS
SAS Super FREQ

Nice. I'd tweak the line that excludes missing values:

  if values(i) le 0 then values(i)=.;

Peter_C
Rhodochrosite | Level 12

I would like a technique that needs no sorting nor transpose : just count the numerics, and run a process, like~:

data results ;

   if 0 then set your.data ;

  array num(&n_nums) _numeric_ ;

  array res(&n_nums ) _temporary_ ;

  do until( eof ) ;

      set your.data end= eof ;

      do _n_ = 1 to &n_nums ;

          res(_n_) = min( max( 0, num(_n_)), res(_n_) ) ;

      end ;

   end ;

   call missing( of num(*), of _character_ ) ;

   do _n_ = 1 to &n_nums   ;

       if res(_n_) > 0 then num(_n_) = res(_n_) ;

   end ;

   output ;

   stop ;

run ;

and that just needs the number of numeric variables, which can be obtained with

proc contents data= your.data noprint out= _data_ ;

run ;

proc sql noprint ;

select count(*) into :n_nums separated by ' ' from &syslast

where type=1 ;

quit ;

Peter_C
Rhodochrosite | Level 12


sorry, my suggestion wasn't good enough

> not enough testing, and not identifying each lowest positive value.

So I expanded my test data with

libname your (work);

data your.data;

   set sashelp.class;

   neg = -1;

   mis = .;

   tst = (_n_ -3 ) / 7 * mixture of  negatives and fractions;

   big = constant('big') * ranuni(1);

   id  = name;

run;

proc contents data= your.data noprint out= _data_;

run;

and collected the additional column info with sql

  • names for numeric variables
  • new names to hold the ID values
  • an ordering list (ID to follow value)
  • and get a count of numerics from &SQLOBS

proc sql noprint;

   select name

       , cats( 'id_', name)

       , trim( name ) !!' '!! cats( 'id_', name)

     into : names   separated by ' '

       , : id_name separated by ' '

       , : ordered separated by ' '

      from &syslast

     where type=1;

   %let n_nums = &sqlobs;

quit;

and finally make that data collection

data results( keep= &ordered );

   retain &ordered;

   array num(&n_nums)        &names;

   array ids(&n_nums) $32 &id_name;

   array res(&n_nums )  _temporary_  ( &n_nums * %sysfunc(constant(big)) );

   do until( eof );

      set your.data end= eof;

      do _n_ = 1 to &n_nums;

         if 0< num(_n_) < res(_n_) then

            do;

           res(_n_) = num(_n_);

           ids(_n_) = id;

            end;

      end;

   end;

   do _n_ = 1 to &n_nums;

      if   ids(_n_) ne ' ' then

           num(_n_) = res(_n_);

      else num(_n_) = .;

   end;

   output;

   stop;

run;

notes about the data methods:

I used _N_ as a pointer within the array to avoid any contentiuon with the names of numeric variables in "your.data".

Using the CONSTANT() function value returned for BIG as the initial values in the "RESults" array guarantees I can safely use that range test

0< num(_n_) < res(_n_)

SInce I needed to define column names to hold ID values, it is no longer satisfactory to use the _numeric_ variable list (beware the column order must be consistent between values and IDs)

Having extracted the column names, &sqlobs provides a counter that is useful in DO loops and in ARRAY statements

If there might be column names exceeding 30 characters wide, then this method _might_ fail having ID_name columns with a name wider than the 32 limit. That can be resolved, but the extra code needed probably reduces the clarity of the method while seldom adding value.

 

PROC CONTENTS extracts the table information without needing to navigate the dictionary tables which would become more complex when "your.data" might be a case sensitive table name in some rdbms.

When writing out the variable information from proc contents, _DATA_ provides some safety by using the "automatic table naming" convention, and avoids overwriting a relevant table that may already exist in the work library.

data_null__
Jade | Level 19

I was thing there might be a way to use a FORMAT to effectively remove the values less than 0.  This is what I came up with but I don't think it scale.  It is mildly interesting perhaps.

The output is a bit strange, I transposed the mins into variables named by the ID for the MIN.

data have;

   set sashelp.class;

   neg = -1;

   mis = .;

   tst = (_n_ -3 ) / 7* mixture of  negatives and fractions;

   big = constant('big') * ranuni(1);

   id  = name;

   run;

proc print;

   run;

proc format;

   value noNeg low-0 = ' ' other=[best32];

   run;

ods listing close;

ods output List=Freqs(drop=F_: Frequency);

options missing=' ';

proc freq;

   tables (_numeric_)*ID / list nocum nofreq nopercent;

   format _numeric_ noNeg.;

   run;

ods output close;

ods listing;

data FreqsV / view=FreqsV;

   set freqs;

   vname = vnameX(scan(table,2,' '));

   if missing(vvalueX(vname)) then delete;

   drop table;

   run;

data freqs2V / view=Freqs2V;  

   set FreqsV;

   by vname notsorted;

   if first.vname;

   run;

proc transpose out=minpos(where=(_name_ eq vname));

   by vname notsorted;

   id id;

   format _numeric_;

   run;

proc print width=min;

   run;

Obs    vname     _NAME_    Joyce     Carol       Jeffrey

1     Age       Age        11.0

2     Height    Height     51.3

3     Weight    Weight     50.5

4     tst       tst                0.14286

5     big       big                           8.9514E306


And this is how I might actually do it regardless of the OPs no transpose requirement.  No code gen no arrays.....

proc transpose out=_data_(where=(col1 gt 0));

   by id notsorted;

   run;

proc summary nway;

   class _name_;

   output idgroup(min(col1) out(col1 id)=);

   run;

Obs    _NAME_    _TYPE_    _FREQ_          COL1      id

1     Age          1        19         11.0000    Joyce

2     Height       1        19         51.3000    Joyce

3     Weight       1        19         50.5000    Joyce

4     big          1        19      8.9514E306    Jeffrey

5     tst          1        16          0.1429    Carol

Peter_C
Rhodochrosite | Level 12


not an original requirement, I suggested "no transpose" to improve run-time performance

Tom
Super User Tom
Super User

Proc means or summary can get the minimum value. To meet you other requirement why not create a view to convert the non-positive values to missing?

data truncate / view=truncate ;

   set have ;

   array _n _numeric_;

   do over _n; if _n <= 0 then _n=.; end;

run;

proc summary data=truncate  ;

   var _numeric_;

  output out=want min=;

run;

Now to get the ID of the minumum value then you will need to read in the variable names (from PROC CONTENTS or a dictionary.column metadata table) and generate the code for the minid() clause of the OUTPUT statement.

For example here is code to find the minumums from SASHELP.CLASS.

proc summary data=sashelp.class ;

  var _numeric_;

  output out=want min=

   minid( age(name) height(name) weight(name) )= age_id height_id weight_id

  ;

run;

Ksharp
Super User
data have;
  input a $ b1-b5;
  cards;
a .3 -10 2 4 5
b 6 4 -2 .1 3
c 5 .2 .4 .2 1
;
run;
options mprint mlogic symbolgen;
%macro s;
proc sql ;
 select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0  having '||strip(name)||' eq min('||strip(name)||')'
  from dictionary.columns
   where libname='WORK' and memname='HAVE' and type='num';

 select 'select a,'||strip(name)||' as min from have where '||strip(name)||' ge 0  having '||strip(name)||' eq min('||strip(name)||')'
  into : s1 - : s&sqlobs
  from dictionary.columns
   where libname='WORK' and memname='HAVE' and type='num';


 create table want as
 %do i=1 %to &sqlobs ;
  &&s&i 
 %if &i ne &sqlobs  %then %do; union all %end;
 %end;
;quit;
%mend s;

%s

Ksharp

Ruth
Fluorite | Level 6

Ksharp's anwer is very helpful. This SAS site only allows me to give a maximum of 2 answers as helpful.

Thanks very much!

Linlin
Lapis Lazuli | Level 10

Hi Ruth,

Did you say yiou want "The final output should be a dataset with a single row, with all values representing the smallest positive values for the numeric variables."?

what do you mean by "a single row"?

MikeZdeb
Rhodochrosite | Level 12

hi .. lots of answers posted already, but here's another (it uses TRANSPOSE like a few others) that uses the SMALLEST function ...

* some fake data (10 random ids, 10 numeric variables);

data have;

array x(10);

do j = 1 to 10;

   id = put(ceil(1000*ranuni(999)),z4.);

do k = 1 to 10;

   x(k) = ceil(100*ranuni(999));

   x(k) = ifn(ranuni(999) le .3, x(k)*-1, x(k));

end;

   output;

end;

drop j k;

run;


proc transpose data=have out=want;

var _numeric_;

run;

* use SMALLEST (use COUNTC to determine the number of values to skip);

data want;

set want;

smallest_positive = smallest(countc(cats(of _numeric_),'-') + 1, of _numeric_);

which_obs = whichn(smallest_positive, of _numeric_);

set have (keep=id) point=which_obs;

keep sm: id _name_;

run;

results ...

         smallest_

_NAME_     positive     id

x1           12       0778

x2           14       0778

x3           16       0423

x4           13       0609

x5           16       0031

x6            8       0031

x7            8       0352

x8           21       0750

x9           15       0750

x10           3       0491



SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 9101 views
  • 9 likes
  • 9 in conversation