SAS Programming

aidant01 · Posted 03-25-2024 11:28 AM

Hi,

I have a dataset with character and numeric variables (var1, var2, ....) and one of the variables is called 'fy' which takes on the character values of '2022','2023', and '2024'.

I want a dataset as follows below. The value 0.7 ,for example, represents the percentage of values of var1 that is missing or blank. In this case, 70% of var1 values in the dataset is missing or blank. Code should be able to handle whether var1 is character or numeric. Thanks!

	fy2022	fy2023	fy2024
var1	0.7	0.65	0.6
var2	0.4	0.35	0.3
var3	0.5	0.45	0.4
var4	0.35	0.3	0.25
var5	0.6	0.55	0.5

I tried using proc freq data=out.mydataset;

tables _all_/list missing;

format _character_ $missing.

_numeric_ missing.;

run;

ballardw · Posted 03-25-2024 12:15 PM

As you have found out the OUT= option only works for one variable. Plus your code doesn't show any attempt to cross the variables with the FY variable.

To get results for each level of FY your tables statement would have to look like:

tables fy *(list of variables) /missing;

If you use fy *_character_ since you say Fy is character (and why a year is ever character baffles me) or fy*_all_, you will get FY crossed with itself. So be prepared to handle that.

This will create a summary data set:

ods output crosstabfreqs=mycrosstab;
proc freq data=example;
tables FY* _all_/list missing;
format _character_ $missing.
      _numeric_ missing.;
run;

With two-way tables like this and what you want to find you would be looking the _type_=11 rows of data.

You would likely pass the output data set through a data step to filter records and then use that as input to something else IF you actually need a data set then Proc transpose with the FY variable as an ID variable (BY TABLE) or one of Proc Report or Tabulate.

Without knowing what your MISSING formats look like I can't suggest exact code but you would be looking for values of the format.

OR instead of your raw data into proc freq I would probably reshape the whole data set to add a variable name and a missing/nonmissing value indicator (no format needed) and summarize that set.

data helpful;
   set out.mydataset;
   array c(*) _character_;
   array v(*) _numeric_;
   do i= 1 to dim(c);
      name=vname(c[i]);
      missing = missing(c[i]);
      if upcase(name) ne 'FY' then output;
   end;
   do i= 1 to dim(v);
      name=vname(v[i]);
      missing = missing(v[i]);
   end;
   keep FY name missing;
run;

proc tabulate data=helpful;
   class fy name;
   var missing
   table name,
         fy*name*mean=' '
   ;
run;

Data sets that are wide with actual values in data set names are poor choices for further processing.

If you need a data set with the values:

proc summary data=helpful nway;
   class fy name;
   var missing;
   output out=want (drop= _:) mean=;
run;

Both the Tabulate and Summary code use the result of the mean of a 1/0 coded numeric value is the percentage of 1 values, which what the MISSING function returns in the data step building helpful. Note that MISSING is one of the few functions that works with both numeric and character values.

This also does not create FY crossed with itself results.

mkeintz · Posted 03-25-2024 03:02 PM

With a temporary modification of the dataset, a PROC TABULATE might be what you want:


data need / view=need;
  set have;
  array v var1-var6;
  do over v;
    v=nmiss(v);  /*Missing==>1,  otherwise==>.) */
  end;
run;

proc tabulate data=need noseps ;
  class FY;
  var var1-var6;
  tables (var1-var6)*(mean=' ')*f=percent8.2
        , FY all='All Years' / rts=12;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 03-25-2024 11:04 PM

data have;
call streaminit(123);
do fy='2022','2023','2024';
  do obs=1 to 100;
     var1=rand('uniform'); if rand('uniform')<0.1 then call missing(var1);
     var2=rand('uniform'); if rand('uniform')<0.2 then call missing(var2);
     var3=rand('uniform'); if rand('uniform')<0.3 then call missing(var3);
     var4=rand('uniform'); if rand('uniform')<0.4 then call missing(var4);
     var5=rand('uniform'); if rand('uniform')<0.5 then call missing(var5);
	 output;
  end;
end;
run;



proc sql;
create table temp as
select fy,
       nmiss(var1)/(select count(*) from have where fy=a.fy) as var1,
	   nmiss(var2)/(select count(*) from have where fy=a.fy) as var2,
	   nmiss(var3)/(select count(*) from have where fy=a.fy) as var3,
	   nmiss(var4)/(select count(*) from have where fy=a.fy) as var4,
	   nmiss(var5)/(select count(*) from have where fy=a.fy) as var5
 from have as a
  group by fy;
quit;
proc transpose data=temp out=want prefix=fy;
id fy;
var var1-var5;
run;

SAS Programming

Proc Freq for missing values trouble shooting

Re: Proc Freq for missing values trouble shooting

Re: Proc Freq for missing values trouble shooting

Re: Proc Freq for missing values trouble shooting

Proc FREQ Remove Missing Values as Category

Swap rows with missing values with the ones without missing

Conditionally remove missing values

Impute missing value with mean of other available variable values if o...

proc freq-show null values dist

Follow Us

What is...

SAS Programming

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...