Re: Use Array to anything greater than 2 within column

omka_ · Posted 08-05-2019 08:44 PM

Hi Everyone,

I basically need help looking through columns using an array to find any column that has a value greater than 2. Basically, if the column doesn't at least have a value that's 2, it is removed. However, I am having trouble coding it. I am trying to do an if statement too subset but it is not working. Below is my code and the necessary file is attached. Version 9.4.

data final;
	set final;
		array col{34} col1-col34;
		do i= 1 to 34;
			if col(i) = '' then col(i) = 0;
			if col(i) ge 2;
		end;
run;

PaigeMiller · Posted 08-05-2019 09:08 PM

Arrays don't look down columns, they look across rows. So it doesn't seem like arrays are what you want here. But it would help if you could show us the desired output for this input data set.

If you want to find columns that have a value greater than two, you could use PROC MEANS/PROC SUMMARY and find the maximum in each column.

--
Paige Miller

omka_ · Posted 08-05-2019 09:41 PM

Hi Paige,

If arrays look across rows, then how would I do the same thing as I was trying? Attached is the dataset. Also, for the final output, it should contain the records that say Abdominal pain, Eye Irritation and Nasopharyngitis. I am trying to make an array that will look across the rows and choose only the columns that have at least the value 2. If it looks across the row and doesn't find a value greater than or equal to 2, it won't be part of the subset. Below is my code.

data omkar.final;
  set final;
  array n{*} _numeric_;  /* gets all numeric variables into array */
  do i=1 to dim(n);         /* Do for all elements */
    if n{i}=. then n{i}=0;
    if n(i) gt 2;
  end;
run;

PaigeMiller · Posted 08-06-2019 06:52 AM

@omka_ wrote:

Hi Paige,

If arrays look across rows, then how would I do the same thing as I was trying? Attached is the dataset. Also, for the final output, it should contain the records that say Abdominal pain, Eye Irritation and Nasopharyngitis. I am trying to make an array that will look across the rows and choose only the columns that have at least the value 2. If it looks across the row and doesn't find a value greater than or equal to 2, it won't be part of the subset. Below is my code.

Your explanation seems to indicate that you want columns that don't contain values greater than two. But at times, it seems that you want rows where no values exceeds two. So which is it that you want?

I specifically asked for the desired final output, and that has not been provided. It would clear up a lot of confusion.

If you are trying to find columns where no value exceeds two, then you can't use arrays. I showed (and others have showed) how you can use PROC SUMMARY to identify these columns, which is a very simple solution. Others have provided a much more complicated way to do this, but again, we're not really 100% sure what you want.

--
Paige Miller

hashman · Posted 08-05-2019 10:29 PM

@omka_ :

If you really want to kill the columns (whose every value is less than 2), you can use an array to find, at the end of the DATA step after having read the entire file, which columns satisfy the condition and then either (a) generate code for another step where these columns will be dropped or (b) compile a list of the columns to be dropped and feed it to a different step. But you cannot do it in a single step without either (a) or (b) because which columns are dropped and which are kept is determined at compile time, i.e. before the step starts its execution.

For example:

data _null_ ;                                      
  set final end = z ;                              
  array col col1-col34 ;                           
  array t [34] _temporary_ ;                       
  do over col ;                                    
    if col => 2 then t [_i_] = 1 ;                 
  end ;                                            
  if z ;                                           
  call execute ("data final; set final (drop=") ; 
  do over col ;                                    
    if not t[_i_] then call execute (vname (col)) ;
  end ;                                            
  call execute ("); run;") ;                       
run ;

Or, if you prefer to drop the columns using SQL:

data _null_ ;                                         
  set final end = z ;                                 
  array col col1-col34 ;                              
  array t [34] _temporary_ ;                          
  do over col ;                                       
    if col => 2 then t [_i_] = 1 ;                    
  end ;                                               
  if z ;                                              
  call execute ("proc sql; alter table final drop") ;
  do over col ;                                       
    if t[_i_] then continue ;                         
    q + 1  ;                                          
    if q = 1 then call execute (vname (col)) ;        
    else          call execute ("," || vname(col)) ;  
  end ;                                               
  call execute ("; quit;") ;                          
run ;

Kind regards

Paul D.

Kurt_Bremser · Posted 08-06-2019 03:25 AM

Run a summary to find the max values, transpose the result while keeping only those with desired values, and use that to create dynamic drop code:

data class;
set sashelp.class;
new = 1;
run;

proc summary data=class;
var _numeric_;
output out=max (drop=_type_ _freq_) max()=;
run;

proc transpose
  data=max
  out=droplist (where=(col1 < 2))
;
var _all_;
run;

data _null_;
set droplist end=eof;
if _n_ = 1 then call execute('data want;set class;drop');
call execute(_name_);
if eof then call execute(';run;');
run;

And if you want that as a completely dynamic macro for easy reuse, you can do that:

%macro cutoff(inds=,outds=,cutoffvalue=);

proc summary data=&inds.;
var _numeric_;
output out=&outds._max (drop=_type_ _freq_) max()=;
run;

proc transpose
  data=&outds._max
  out=&outds._droplist (where=(col1 < &cutoffvalue.))
;
var _all_;
run;

data _null_;
set &outds._droplist end=eof;
if _n_ = 1 then call execute("data &outds.;set &inds.;drop");
call execute(_name_);
if eof then call execute(';run;');
run;

proc delete data=&outds._max &outds._droplist;
run;

%mend;

%cutoff(inds=class,outds=want,cutoffvalue=2)

Note that the call makes no assumptions at all about variable names or number of variables.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX