<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to remove variables in a data set that contain a certain percentage of special values in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778543#M247831</link>
    <description>&lt;P&gt;Good day!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It seems that my first post was somehow marked as SPAM so I apologize if you are seeing this post a second time.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have the following problem: I want to remove variables in a data set that contain more than 90% of the following values: -1,-2,-3,-4,-5,-6,-7. Stated otherwise, If a variable (of which there are 675) is made up of more than 90% of the above listed values, it should be removed from the data set. Below is an example of the data set I am working with.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;Target Variable&lt;/TD&gt;&lt;TD&gt;Var1&lt;/TD&gt;&lt;TD&gt;Var2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;Var675&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;001&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;002&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;003&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;1500&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;004&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;005&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;TD&gt;18&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since there is a large number of columns in the data set I figured that I will probably need a do loop to do this. Therefore, I created the following macro to remove variables for any given data set that contain more than 90% of these special values.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;%macro control(dsn, method = Quantile);

%let lib = %upcase(%scan(&amp;amp;dsn,1,.));
%let dsn = %upcase(%scan(&amp;amp;dsn,2,.));
%let method = %upcase(&amp;amp;method);

/*	Creatign macro's for each column in the data set	*/

data _null_;
	set sashelp.vcolumn(keep=libname memname name type
	where=(libname = "&amp;amp;lib" AND memname = "&amp;amp;dsn" and type = "num")) end=eof;
		call symputx('col'||strip(_n_),strip(name));
		if eof then call symputx('max',strip(_n_));
run;

%do i = 1 %to &amp;amp;max;

I assume that this is were the code will be written to determine the percentage of special&lt;BR /&gt;values in a variable and then drop it from the original data set if it exceeds 90%. 

%mend control; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
    <pubDate>Thu, 04 Nov 2021 17:04:24 GMT</pubDate>
    <dc:creator>EduanR</dc:creator>
    <dc:date>2021-11-04T17:04:24Z</dc:date>
    <item>
      <title>How to remove variables in a data set that contain a certain percentage of special values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778543#M247831</link>
      <description>&lt;P&gt;Good day!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It seems that my first post was somehow marked as SPAM so I apologize if you are seeing this post a second time.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have the following problem: I want to remove variables in a data set that contain more than 90% of the following values: -1,-2,-3,-4,-5,-6,-7. Stated otherwise, If a variable (of which there are 675) is made up of more than 90% of the above listed values, it should be removed from the data set. Below is an example of the data set I am working with.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;TABLE border="1"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;Target Variable&lt;/TD&gt;&lt;TD&gt;Var1&lt;/TD&gt;&lt;TD&gt;Var2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;Var675&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;001&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;002&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;003&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;1500&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;004&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;005&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;-3&lt;/TD&gt;&lt;TD&gt;18&lt;/TD&gt;&lt;TD&gt;...&lt;/TD&gt;&lt;TD&gt;-2&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since there is a large number of columns in the data set I figured that I will probably need a do loop to do this. Therefore, I created the following macro to remove variables for any given data set that contain more than 90% of these special values.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;%macro control(dsn, method = Quantile);

%let lib = %upcase(%scan(&amp;amp;dsn,1,.));
%let dsn = %upcase(%scan(&amp;amp;dsn,2,.));
%let method = %upcase(&amp;amp;method);

/*	Creatign macro's for each column in the data set	*/

data _null_;
	set sashelp.vcolumn(keep=libname memname name type
	where=(libname = "&amp;amp;lib" AND memname = "&amp;amp;dsn" and type = "num")) end=eof;
		call symputx('col'||strip(_n_),strip(name));
		if eof then call symputx('max',strip(_n_));
run;

%do i = 1 %to &amp;amp;max;

I assume that this is were the code will be written to determine the percentage of special&lt;BR /&gt;values in a variable and then drop it from the original data set if it exceeds 90%. 

%mend control; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Thu, 04 Nov 2021 17:04:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778543#M247831</guid>
      <dc:creator>EduanR</dc:creator>
      <dc:date>2021-11-04T17:04:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove variables in a data set that contain a certain percentage of special values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778844#M247972</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/405474"&gt;@EduanR&lt;/a&gt;&amp;nbsp;,&lt;BR /&gt;&lt;BR /&gt;Something like the below :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data work.have;
input ID $ Target_Variable Var1 Var2 Var3;
cards;
001	1	-2	-2	-2
002	1	-2	-2	-3
003	0	-2	-2	1500
004	1	5	-3	-3
005	0	-3	18	-2
;
run;

data work.have1(drop=i);
 set work.have;
 array myvars{3}     Var1-  Var3;
 array countvar{3} c_Var1-c_Var3;
 do i=1 to dim(myvars);
  if myvars(i) in (-1,-2,-3,-4,-5,-6,-7) then countvar(i)=1; else countvar(i)=0;
 end;
run;

ods trace off;
ods output OneWayFreqs=work.OneWayFreqs;
PROC FREQ data=work.have1 /*noprint*/;
 tables c_Var1-c_Var3 / list /* missing out=work.myds */;
run;

data work.vars_to_throw_away;
 set work.OneWayFreqs;
 where c_Var1=1 OR c_Var2=1 OR c_Var3=1;
 where also Percent &amp;gt;= 80;
run; 

PROC SQL noprint;
 select scan(Table,2,'_') into :Lijst separated by ' '
 from work.vars_to_throw_away;
QUIT;
%PUT &amp;amp;=Lijst;

data work.want;
 set work.have(drop=&amp;amp;Lijst.);
run;
/* end of program */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Fri, 05 Nov 2021 17:01:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778844#M247972</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2021-11-05T17:01:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove variables in a data set that contain a certain percentage of special values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778945#M248027</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/60547"&gt;@sbxkoenk&lt;/a&gt; for the assistance.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I also managed to write the following code and it seems to work/provide the desired output.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Firstly, I created an empty table with the necessary specification before the macro as follows:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc sql;
	Create table Datasets.Special_Perc_&amp;amp;Split
	(Variable			char 	length = 23 	format=$50.,
	Sum_Special			num 	length = 8		format = best12.,
	Sum_positive		num		length = 8		format = best12.,
	Percentage 			num		length = 8		format = Percent6.2);		
quit; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;Secondly,&amp;nbsp; I wrote a macro to calculate the percentage of special values for each column in any given data set.&amp;nbsp; This is then added to the table that I created above for each iteration of the loop.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;%macro Special(dsn);

	%let lib = %upcase(%scan(&amp;amp;dsn,1,.));
	%let dsn = %upcase(%scan(&amp;amp;dsn,2,.));

	/*	Creatign macro's for each column in the data set	*/

	data _null_;
		set sashelp.vcolumn(keep=libname memname name type
		where=(libname = "&amp;amp;lib" AND memname = "&amp;amp;dsn" and type = "num")) end=eof;
			call symputx('col'||strip(_n_),strip(name));
			if eof then call symputx('max',strip(_n_));
	run;

	%do i = 1 %to &amp;amp;max;
	
		proc sql noprint;
		select sum(&amp;amp;&amp;amp;col&amp;amp;i in (-1,-2,-3,-4,-5,-6,-7)) as NegS,
		       sum(&amp;amp;&amp;amp;col&amp;amp;i not in (-1,-2,-3,-4,-5,-6,-7)) as Pos,
			round(calculated NegS/(calculated NegS + calculated Pos), .01) as Perct
			into :SN, :SP, :Perc
		from &amp;amp;lib..&amp;amp;dsn;
		quit;

		data b;
			Variable = "&amp;amp;&amp;amp;col&amp;amp;i";
			Sum_Special = &amp;amp;sn;
			Sum_positive = &amp;amp;sp;
			Percentage = &amp;amp;Perc;
		run;

		proc append base = Datasets.Special_Perc_&amp;amp;split data = b force; 
		run;

		proc sql;
			drop table b;
		quit;
   %end;

%mend special;

%special(Datasets.Longevity_Numeric_&amp;amp;split);&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Lastly, i use a proc sql to exclude those variables that have a special value percentage of more than 90%.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It may not be the best way to do it, but it got the job done.&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":grinning_face:"&gt;😀&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Nov 2021 19:23:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-remove-variables-in-a-data-set-that-contain-a-certain/m-p/778945#M248027</guid>
      <dc:creator>EduanR</dc:creator>
      <dc:date>2021-11-06T19:23:58Z</dc:date>
    </item>
  </channel>
</rss>

