I take a data with 2000 columns. Names of the columns are like q1,q2,q3.... I want to create a new table with the ones that have the count of non missing(+zero) rows more than 70% in the column. I also want to join the some column whose percentage is less than 70 to the table. My question is , how do i get 2000 output for 2000 columns with a single program and how do i specify and add the column to be created to the data.
The table that I want to create should look like the one below.
Q1 resultPercantage
Q2 resultPercantage
Q3 resultPercantage
There are also some codes that I tried to create below.
%LET VARNAME =
Column1/*Q1*/
Column2/*Q2*/
Column3/*Q3*/
;
%percentage(0.70)
%macro procfreq(Var);
%macro percentage(deger);
PROC SQL;
Create Table new as
SELECT (tt.nonMissingCount/tt.toplamCount) as result format=percent10.2,
case
when calculated result>°er then out=yeni /*????*/
else 'false /*????*/'
end as yeniSonuc
from(
SELECT DISTINCT
(SELECT COUNT(&Var) as NonMissing from work.exampledata WHERE &Var ne 0) as nonMissingCount
,(SELECT COUNT(&Var) as SumCnt from work.exampedata) as sumCount from work.exampledata)as tt;
QUIT;
%mend;
%mend;
%macro loop;
%let i=1;
%do %while(%scan(&VARNAME.,&i.,%STR( ))~=);
%procfreq(%scan(&VARNAME.,&i.,%STR( )));
%let i=&i.+1;
%end;
%mend;
%loop;
I will be very greatful if I hear from you as soon as possible.
This code will create a table with one row for each column and the %age of records with a >0 value; any splitting of records with >70% etc can then easily be accomplished.
%let colCnt=2000;
data out_data(keep=var pcntage);
format pcntage percent7.2;
array t t1-t&colCnt.;
retain t1-t&colCnt. 0; * retain some total counts;
set x end = eof;
array q q1-q&colCnt.;
do i=1 to &colcnt;
t(i)=t(i)+(q(i)>0); * add each non-missing to the total count ;
end;
if eof then do i=1 to &colcnt.;
var='q'||strip(put(i,best.)); * create column name ;
pcntage=t(i)/_n_; * calculate %age ;
output; * output one row for each of the analyzed columns;
end;
run;
Where this count value comes from ?
Number_Missing=&obs_count-count;
There are 4000 numbers of Q columns
I want to see all of them percentage like below then i need to transpose it
Q1 87.50%;
Q2 50.50%;
Q3 40.50%;
Q4 90.50%;
Q5 65.50%;
Q6 70.50%;
Q7 60.50%;
Q8 80.50%;
It looks like you're in EG?
Use Tasks>Transpose and switch your data structure such that it becomes
ID Question Value
1 Q1 1
1 Q2 0
1 Q3 1
....
1 Q5000 0
2 Q1 0
Then use Query Builder with an N() and NMISS() function to determine the number of missing with the Question in the GROUP By Column.
Thanks Reeza, i'll check. On the other hand your code involves a lot of things. I just a little bit confused( https://gist.github.com/statgeek/c3a9ddcb002c469e9d61 ) Can you show me a simpler method to reach my loop and nonmissing aim,please .
Thank you.
Here's a worked example using both a proc transpose and the macro. The macro is inefficient but works quickly and across all types of variables, numeric/character. FYI You should show data in the text, rather than images, so we can copy and paste it into our code, it helps us to help you.
*Generate sample data;
data have;
array q(20) q1-q20;
do id=1 to 100;
do i=1 to 20;
q(i)=floor(rand('uniform')*100+1);
x=rand('bernoulli', 0.1);
if x=1 then q(i)=.;
end;
output;
end;
drop i x;
run;
*check format;
proc print data=have(obs=5);
run;
*Flip dataset;
proc transpose data=have out=flipped(rename=(_name_=Question Col1=Value));
by id;
run;
*check format again;
proc print data=flipped(obs=10);
run;
*Count number missing using SQL step;
proc sql;
create table want as
select question, nmiss(value) as NMISS, count(question) as N,
nmiss(value)/count(question) as PCT_MISS format=percent8.1
from flipped
group by question;
quit;
proc print data=want;
run;
*Sample output if used macro, make sure to run the macro first;
%sum_missing(work, have, want_macro);
proc print data=want_macro;
run;
You do not need macro code, much less a maco loop.
Look at this question that just used PROC FREQ to generate the counts for ALL variables.
This code will create a table with one row for each column and the %age of records with a >0 value; any splitting of records with >70% etc can then easily be accomplished.
%let colCnt=2000;
data out_data(keep=var pcntage);
format pcntage percent7.2;
array t t1-t&colCnt.;
retain t1-t&colCnt. 0; * retain some total counts;
set x end = eof;
array q q1-q&colCnt.;
do i=1 to &colcnt;
t(i)=t(i)+(q(i)>0); * add each non-missing to the total count ;
end;
if eof then do i=1 to &colcnt.;
var='q'||strip(put(i,best.)); * create column name ;
pcntage=t(i)/_n_; * calculate %age ;
output; * output one row for each of the analyzed columns;
end;
run;
Thanks a lot. I was very important for me thanks again 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.