BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EduanR
Fluorite | Level 6

Good day! 

 

It seems that my first post was somehow marked as SPAM so I apologize if you are seeing this post a second time. 

 

I have the following problem: I want to remove variables in a data set that contain more than 90% of the following values: -1,-2,-3,-4,-5,-6,-7. Stated otherwise, If a variable (of which there are 675) is made up of more than 90% of the above listed values, it should be removed from the data set. Below is an example of the data set I am working with. 

 

IDTarget VariableVar1Var2...Var675
0011-2-2...-2
0021-2-2...-3
0030-2-2...1500
00415-3...-3
0050-318...-2

 

Since there is a large number of columns in the data set I figured that I will probably need a do loop to do this. Therefore, I created the following macro to remove variables for any given data set that contain more than 90% of these special values. 

 

%macro control(dsn, method = Quantile);

%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);

/*	Creatign macro's for each column in the data set	*/

data _null_;
	set sashelp.vcolumn(keep=libname memname name type
	where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
		call symputx('col'||strip(_n_),strip(name));
		if eof then call symputx('max',strip(_n_));
run;

%do i = 1 %to &max;

I assume that this is were the code will be written to determine the percentage of special
values in a variable and then drop it from the original data set if it exceeds 90%. %mend control;

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
EduanR
Fluorite | Level 6

Thank you @sbxkoenk for the assistance. 

 

I also managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 😀

View solution in original post

2 REPLIES 2
sbxkoenk
SAS Super FREQ

Hello @EduanR ,

Something like the below :

data work.have;
input ID $ Target_Variable Var1 Var2 Var3;
cards;
001	1	-2	-2	-2
002	1	-2	-2	-3
003	0	-2	-2	1500
004	1	5	-3	-3
005	0	-3	18	-2
;
run;

data work.have1(drop=i);
 set work.have;
 array myvars{3}     Var1-  Var3;
 array countvar{3} c_Var1-c_Var3;
 do i=1 to dim(myvars);
  if myvars(i) in (-1,-2,-3,-4,-5,-6,-7) then countvar(i)=1; else countvar(i)=0;
 end;
run;

ods trace off;
ods output OneWayFreqs=work.OneWayFreqs;
PROC FREQ data=work.have1 /*noprint*/;
 tables c_Var1-c_Var3 / list /* missing out=work.myds */;
run;

data work.vars_to_throw_away;
 set work.OneWayFreqs;
 where c_Var1=1 OR c_Var2=1 OR c_Var3=1;
 where also Percent >= 80;
run; 

PROC SQL noprint;
 select scan(Table,2,'_') into :Lijst separated by ' '
 from work.vars_to_throw_away;
QUIT;
%PUT &=Lijst;

data work.want;
 set work.have(drop=&Lijst.);
run;
/* end of program */

Koen

EduanR
Fluorite | Level 6

Thank you @sbxkoenk for the assistance. 

 

I also managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 😀

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 738 views
  • 1 like
  • 2 in conversation