BookmarkSubscribeRSS Feed
aidant01
Fluorite | Level 6

Hi, 

 

I have a dataset with character and numeric variables (var1, var2, ....) and one of the variables is called 'fy' which takes on the character values of '2022','2023', and '2024'. 

 

I want a dataset as follows below. The value 0.7 ,for example, represents the percentage of values of var1 that is missing or blank. In this case, 70% of var1 values in the dataset is missing or blank. Code should be able to handle whether var1 is character or numeric. Thanks! 

 

 fy2022fy2023fy2024
var10.70.650.6
var20.40.350.3
var30.50.450.4
var40.350.30.25
var50.60.550.5

 

I tried using proc freq data=out.mydataset;

tables _all_/list missing;

format _character_ $missing.

              _numeric_ missing.;

run;

3 REPLIES 3
ballardw
Super User

As you have found out the OUT= option only works for one variable. Plus your code doesn't show any attempt to cross the variables with the FY variable.

To get results for each level of FY your tables statement would have to look like:

tables fy *(list of variables) /missing;

If you use fy *_character_  since you say Fy is character (and why a year is ever character baffles me) or fy*_all_, you will get FY crossed with itself. So be prepared to handle that.

This will create a summary data set:

ods output crosstabfreqs=mycrosstab;
proc freq data=example;
tables FY* _all_/list missing;
format _character_ $missing.
      _numeric_ missing.;
run;

With two-way tables like this and what you want to find you would be looking the _type_=11 rows of data.

You would likely pass the output data set through a data step to filter records and then use that as input to something else IF you actually need a data set then Proc transpose with the FY variable as an ID variable (BY TABLE) or one of Proc Report or Tabulate.

Without knowing what your MISSING formats look like I can't suggest exact code but you would be looking for values of the format.

 

OR instead of your raw data into proc freq I would probably reshape the whole data set to add a variable name and a missing/nonmissing value indicator (no format needed) and summarize that set.

data helpful;
   set out.mydataset;
   array c(*) _character_;
   array v(*) _numeric_;
   do i= 1 to dim(c);
      name=vname(c[i]);
      missing = missing(c[i]);
      if upcase(name) ne 'FY' then output;
   end;
   do i= 1 to dim(v);
      name=vname(v[i]);
      missing = missing(v[i]);
   end;
   keep FY name missing;
run;

proc tabulate data=helpful;
   class fy name;
   var missing
   table name,
         fy*name*mean=' '
   ;
run;

Data sets that are wide with actual values in data set names are poor choices for further processing.

If you need a data set with the values:

proc summary data=helpful nway;
   class fy name;
   var missing;
   output out=want (drop= _:) mean=;
run;

Both the Tabulate and Summary code use the result of the mean of a 1/0 coded numeric value is the percentage of 1 values, which what the MISSING function returns in the data step building helpful. Note that MISSING is one of the few functions that works with both numeric and character values.

This also does not create FY crossed with itself results.

 

 

mkeintz
PROC Star

With a temporary modification of the dataset, a PROC TABULATE might be what you want:

 


data need / view=need;
  set have;
  array v var1-var6;
  do over v;
    v=nmiss(v);  /*Missing==>1,  otherwise==>.) */
  end;
run;

proc tabulate data=need noseps ;
  class FY;
  var var1-var6;
  tables (var1-var6)*(mean=' ')*f=percent8.2
        , FY all='All Years' / rts=12;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
call streaminit(123);
do fy='2022','2023','2024';
  do obs=1 to 100;
     var1=rand('uniform'); if rand('uniform')<0.1 then call missing(var1);
     var2=rand('uniform'); if rand('uniform')<0.2 then call missing(var2);
     var3=rand('uniform'); if rand('uniform')<0.3 then call missing(var3);
     var4=rand('uniform'); if rand('uniform')<0.4 then call missing(var4);
     var5=rand('uniform'); if rand('uniform')<0.5 then call missing(var5);
	 output;
  end;
end;
run;



proc sql;
create table temp as
select fy,
       nmiss(var1)/(select count(*) from have where fy=a.fy) as var1,
	   nmiss(var2)/(select count(*) from have where fy=a.fy) as var2,
	   nmiss(var3)/(select count(*) from have where fy=a.fy) as var3,
	   nmiss(var4)/(select count(*) from have where fy=a.fy) as var4,
	   nmiss(var5)/(select count(*) from have where fy=a.fy) as var5
 from have as a
  group by fy;
quit;
proc transpose data=temp out=want prefix=fy;
id fy;
var var1-var5;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 751 views
  • 0 likes
  • 4 in conversation