BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lberger
Fluorite | Level 6

Hello,

 

New SAS user. Using SAS University through virtual box on win 10. Comfortable coding but getting use to SAS.

 

I have a group of variables (column names) that share a prefix such as: blah1 through blah99.

 

My goal is to count the number of non missing values of each variable and create a new column with that count,

 

I would like to do something like:

 

IF currentVar contains 'blah' and not missing(currentVar) then

do;

...

end;

Or is my only option a Do loop with an Array? If so, I will post the issue I am having with my loop later.

 

Or should I create a new table with just those variables and count them?

 

Thanks in advance,

Lon

 

1 ACCEPTED SOLUTION

Accepted Solutions
lberger
Fluorite | Level 6

It appears there is no function or variable in SAS to tell which column is currently being read in the data step. You can tell the row number with _N_ but not the column. n(of ) as stated by Tom in this thread is nice, but seems to have no character equal. Bizarre. So, if you know what the names of the columns are you can count observations with this:

 

count=0;
array diagGrp(*) DIAG1-DIAG63;
do i=1 to 63;
	if not missing(diagGrp(i)) then
	do;
		count+1;
	end;
end;

No fancy code. Just a basic loop that does nothing other than what is stated.  

View solution in original post

10 REPLIES 10
lberger
Fluorite | Level 6

Actually, creating a table with just those variables won't solve the issue either. So I take that back.

Tom
Super User Tom
Super User

Just use the N function with a variable list.

count = n(of blah1-blah99);

If you don't know the exact variable names but you want all of those that start the a prefix then use:

count = n(of blah:);
lberger
Fluorite | Level 6

Hi Tom,

 

For some reason I was unable to get either to work. I kept getting zero. I will keep trying because it must be an issue on my end. That seems like a logical solution. It is strange, with how robust SAS is, that there is no way to get the current variable (column name) in a data step as it loops.

 

I ended up going with a do loop for now. I can then just get a mean of the new column.

 

count=0;
array diagGrp(*) DIAG1-DIAG63;
do i=1 to 63;
	if not missing(diagGrp(i)) then
	do;
		count+1;
	end;
end;

Thank you,

Lon

lberger
Fluorite | Level 6

By reading ballardw's response I realized that n(of) did not work for me because the data in the variables are character not numeric. 

ballardw
Super User

You might want to provide some example data and the desired result for that example. One interpretation for "count the number of non missing values of each variable" would be across observations which would mean getting your count and then merging the summary data back.

 

Also, are all of the variables numeric? That would make the count in any form easier.

 

@Tom's solution will count non-missing numeric variables per observation (or record / row if you prefer)

lberger
Fluorite | Level 6

Hello ballardw,

 

I want to count non missing values whether or not they are numeric or character.

 

B1  B2  B3  B4  B5

aa   bb  aa   dd   cc

cc   aa          cc  

 

So this would result in 5 for the first row and 3 for the second row. I ended up with a do loop:

 

count=0;
array diagGrp(*) DIAG1-DIAG63;
do i=1 to 63;
	if not missing(diagGrp(i)) then
	do;
		count+1;
	end;
end;

Thank you,

Lon

Reeza
Super User

Sounds like you want a missing report. I would recommend a different approach entirely - use PROC FREQ to summarize data for each variable into missing and non missing. This is a fully worked example that generates a 'pretty table'. 

 

I suggest running it twice, once as is to see the results and if it's what you want. If so, then change the parameters (in red) to your input data set and output data set and run everything below that line again to get the desired results for your table. 

 

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &INPUT_DSN.";

proc print data=&output_dsn. noobs label;
run;
lberger
Fluorite | Level 6

Hello Reeza,

 

I admire the work done here. I only need to create a column that contains the number of non missing values so I can get a mean of those counts. I will note this for future use. I ended up using a do loop as posted in the other responses.

 

Thank you,

Lon

Reeza
Super User

You don't need a loop for that then, use the CMISS function to count missing records. Use DIM to get the dimension/number of variables. 

 


array diagGrp(*) DIAG1-DIAG63;

non_missing = dim(diagGrp) - cmiss(of diagGrp(*));

@lberger wrote:

Hello Reeza,

 

I admire the work done here. I only need to create a column that contains the number of non missing values so I can get a mean of those counts. I will note this for future use. I ended up using a do loop as posted in the other responses.

 

Thank you,

Lon


 

lberger
Fluorite | Level 6

It appears there is no function or variable in SAS to tell which column is currently being read in the data step. You can tell the row number with _N_ but not the column. n(of ) as stated by Tom in this thread is nice, but seems to have no character equal. Bizarre. So, if you know what the names of the columns are you can count observations with this:

 

count=0;
array diagGrp(*) DIAG1-DIAG63;
do i=1 to 63;
	if not missing(diagGrp(i)) then
	do;
		count+1;
	end;
end;

No fancy code. Just a basic loop that does nothing other than what is stated.  

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 3430 views
  • 1 like
  • 4 in conversation