Hi,
I have a dataset with character and numeric variables (var1, var2, ....) and one of the variables is called 'fy' which takes on the character values of '2022','2023', and '2024'.
I want a dataset as follows below. The value 0.7 ,for example, represents the percentage of values of var1 that is missing or blank. In this case, 70% of var1 values in the dataset is missing or blank. Code should be able to handle whether var1 is character or numeric. Thanks!
fy2022 | fy2023 | fy2024 | |
var1 | 0.7 | 0.65 | 0.6 |
var2 | 0.4 | 0.35 | 0.3 |
var3 | 0.5 | 0.45 | 0.4 |
var4 | 0.35 | 0.3 | 0.25 |
var5 | 0.6 | 0.55 | 0.5 |
I tried using proc freq data=out.mydataset;
tables _all_/list missing;
format _character_ $missing.
_numeric_ missing.;
run;
As you have found out the OUT= option only works for one variable. Plus your code doesn't show any attempt to cross the variables with the FY variable.
To get results for each level of FY your tables statement would have to look like:
tables fy *(list of variables) /missing;
If you use fy *_character_ since you say Fy is character (and why a year is ever character baffles me) or fy*_all_, you will get FY crossed with itself. So be prepared to handle that.
This will create a summary data set:
ods output crosstabfreqs=mycrosstab; proc freq data=example; tables FY* _all_/list missing; format _character_ $missing. _numeric_ missing.; run;
With two-way tables like this and what you want to find you would be looking the _type_=11 rows of data.
You would likely pass the output data set through a data step to filter records and then use that as input to something else IF you actually need a data set then Proc transpose with the FY variable as an ID variable (BY TABLE) or one of Proc Report or Tabulate.
Without knowing what your MISSING formats look like I can't suggest exact code but you would be looking for values of the format.
OR instead of your raw data into proc freq I would probably reshape the whole data set to add a variable name and a missing/nonmissing value indicator (no format needed) and summarize that set.
data helpful; set out.mydataset; array c(*) _character_; array v(*) _numeric_; do i= 1 to dim(c); name=vname(c[i]); missing = missing(c[i]); if upcase(name) ne 'FY' then output; end; do i= 1 to dim(v); name=vname(v[i]); missing = missing(v[i]); end; keep FY name missing; run; proc tabulate data=helpful; class fy name; var missing table name, fy*name*mean=' ' ; run;
Data sets that are wide with actual values in data set names are poor choices for further processing.
If you need a data set with the values:
proc summary data=helpful nway; class fy name; var missing; output out=want (drop= _:) mean=; run;
Both the Tabulate and Summary code use the result of the mean of a 1/0 coded numeric value is the percentage of 1 values, which what the MISSING function returns in the data step building helpful. Note that MISSING is one of the few functions that works with both numeric and character values.
This also does not create FY crossed with itself results.
With a temporary modification of the dataset, a PROC TABULATE might be what you want:
data need / view=need;
set have;
array v var1-var6;
do over v;
v=nmiss(v); /*Missing==>1, otherwise==>.) */
end;
run;
proc tabulate data=need noseps ;
class FY;
var var1-var6;
tables (var1-var6)*(mean=' ')*f=percent8.2
, FY all='All Years' / rts=12;
run;
data have;
call streaminit(123);
do fy='2022','2023','2024';
do obs=1 to 100;
var1=rand('uniform'); if rand('uniform')<0.1 then call missing(var1);
var2=rand('uniform'); if rand('uniform')<0.2 then call missing(var2);
var3=rand('uniform'); if rand('uniform')<0.3 then call missing(var3);
var4=rand('uniform'); if rand('uniform')<0.4 then call missing(var4);
var5=rand('uniform'); if rand('uniform')<0.5 then call missing(var5);
output;
end;
end;
run;
proc sql;
create table temp as
select fy,
nmiss(var1)/(select count(*) from have where fy=a.fy) as var1,
nmiss(var2)/(select count(*) from have where fy=a.fy) as var2,
nmiss(var3)/(select count(*) from have where fy=a.fy) as var3,
nmiss(var4)/(select count(*) from have where fy=a.fy) as var4,
nmiss(var5)/(select count(*) from have where fy=a.fy) as var5
from have as a
group by fy;
quit;
proc transpose data=temp out=want prefix=fy;
id fy;
var var1-var5;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.