BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EduanR
Fluorite | Level 6

Good day! 

 

It seems that my first post was somehow marked as SPAM so I apologize if you are seeing this post a second time. 

 

I have the following problem: I want to remove variables in a data set that contain more than 90% of the following values: -1,-2,-3,-4,-5,-6,-7. Stated otherwise, If a variable (of which there are 675) is made up of more than 90% of the above listed values, it should be removed from the data set. Below is an example of the data set I am working with. 

 

IDTarget VariableVar1Var2...Var675
0011-2-2...-2
0021-2-2...-3
0030-2-2...1500
00415-3...-3
0050-318...-2

 

Since there is a large number of columns in the data set I figured that I will probably need a do loop to do this. Therefore, I created the following macro to remove variables for any given data set that contain more than 90% of these special values. 

 

%macro control(dsn, method = Quantile);

%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);

/*	Creatign macro's for each column in the data set	*/

data _null_;
	set sashelp.vcolumn(keep=libname memname name type
	where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
		call symputx('col'||strip(_n_),strip(name));
		if eof then call symputx('max',strip(_n_));
run;

%do i = 1 %to &max;

I assume that this is were the code will be written to determine the percentage of special
values in a variable and then drop it from the original data set if it exceeds 90%. %mend control;

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
EduanR
Fluorite | Level 6

Thank you @sbxkoenk for the assistance. 

 

I also managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 😀

View solution in original post

2 REPLIES 2
sbxkoenk
SAS Super FREQ

Hello @EduanR ,

Something like the below :

data work.have;
input ID $ Target_Variable Var1 Var2 Var3;
cards;
001	1	-2	-2	-2
002	1	-2	-2	-3
003	0	-2	-2	1500
004	1	5	-3	-3
005	0	-3	18	-2
;
run;

data work.have1(drop=i);
 set work.have;
 array myvars{3}     Var1-  Var3;
 array countvar{3} c_Var1-c_Var3;
 do i=1 to dim(myvars);
  if myvars(i) in (-1,-2,-3,-4,-5,-6,-7) then countvar(i)=1; else countvar(i)=0;
 end;
run;

ods trace off;
ods output OneWayFreqs=work.OneWayFreqs;
PROC FREQ data=work.have1 /*noprint*/;
 tables c_Var1-c_Var3 / list /* missing out=work.myds */;
run;

data work.vars_to_throw_away;
 set work.OneWayFreqs;
 where c_Var1=1 OR c_Var2=1 OR c_Var3=1;
 where also Percent >= 80;
run; 

PROC SQL noprint;
 select scan(Table,2,'_') into :Lijst separated by ' '
 from work.vars_to_throw_away;
QUIT;
%PUT &=Lijst;

data work.want;
 set work.have(drop=&Lijst.);
run;
/* end of program */

Koen

EduanR
Fluorite | Level 6

Thank you @sbxkoenk for the assistance. 

 

I also managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 😀

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 459 views
  • 1 like
  • 2 in conversation