BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EduanR
Fluorite | Level 6

Good day! 

 

Hope all is well with you. This is my first time posting a question on the SAS blog so I apologize if a listed it under the wrong location. I thought of listing it to enterprise guide since I am working in enterprise guide 😀

 

I have the following problem: I want to remove variables in a data set that have more than 90% of the following values:-1 , -2, -3, -4, -5 , -6 , -7. Stated otherwise, if a variable is made up of more than 90% of the above listed values, the variable should be removed from the data set. Below is an example of the data set I am working with. 

 

IDTarget VariableVar1Var2...Var675
0011-2-2..-3
0021-2-2...-2
0030241000...100
0040-2-4...-4
0051-3-5...-3

 

I have not yet written any code. But I got the feeling that I will probably have to use a loop (to loop through all the columns). So therefore I created the following macro definition that will perform this process for all the numeric columns in any given data set. All the columns from Var1 - Var675 are numeric. 

 

%macro control(dsn, method = Quantile);

%let lib = %upcase(%scan(&dsn,1,.));
%let dsn = %upcase(%scan(&dsn,2,.));
%let method = %upcase(&method);

/*	Creatign macro's for each column in the data set	*/

data _null_;
	set sashelp.vcolumn(keep=libname memname name type
	where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
		call symputx('col'||strip(_n_),strip(name));
		if eof then call symputx('max',strip(_n_));
run;

%do i = 1 %to &max;

I presume here is where the code will go for my problem.
 
%end;

%mend control

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
EduanR
Fluorite | Level 6

Thank you @Norman21 fro the help. 

 

I managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 

EduanR_0-1636283540132.png

 

View solution in original post

3 REPLIES 3
Norman21
Lapis Lazuli | Level 10

Welcome to the Community!

 

You might find the following helpful:

 

https://blogs.sas.com/content/iml/2011/09/19/count-the-number-of-missing-values-for-each-variable.ht...

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

EduanR
Fluorite | Level 6

Thank you @Norman21

 

I will take a look and i will let you know if it works. 

EduanR
Fluorite | Level 6

Thank you @Norman21 fro the help. 

 

I managed to write the following code and it seems to work/provide the desired output. 

 

Firstly, I created an empty table with the necessary specification before the macro as follows: 

proc sql;
	Create table Datasets.Special_Perc_&Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; 

 Secondly,  I wrote a macro to calculate the percentage of special values for each column in any given data set.  This is then added to the table that I created above for each iteration of the loop.

 

%macro Special(dsn);

	%let lib = %upcase(%scan(&dsn,1,.));
	%let dsn = %upcase(%scan(&dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&lib" AND memname = "&dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &max;
	
		proc sql noprint;
		select sum(&&col&i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&&col&i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &lib..&dsn;
		quit;

		data b;
			Variable = "&&col&i";
			Sum_Special = &sn;
			Sum_positive = &sp;
			Percentage = &Perc;
		run;

		proc append base = Datasets.Special_Perc_&split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&split);

Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%. 

 

It may not be the best way to do it, but it got the job done. 

EduanR_0-1636283540132.png

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1065 views
  • 1 like
  • 2 in conversation