I have 923 variables in my dataset and 6345 observations.
For each observation, I need to calculate the number of times any of the 923 variables has a value of "9".
To help illustrate what I need to accomplish, the example below is simplified with just a small handful of observations and a small handful of variables. As you can see, some of the variables allow string text values.
In observation #3219, there are 6 variables for which the value is "9".
In observation #3220, there are 5 variables for which the value is "9".
And so forth for all 923 observations.
I need to be able to perform this calculation in SAS. I already know how to perform this calculation in MSExcel quickly and easily using "COUNTIF" but I am not aware of something that could accomplish this in SAS. Thank you in advance for your kind assistance.
Please respond if you know how to perform this calculation in SAS.
Assume you have both numeric and character variables:
data have;
input var1-var5 (var6-var9)($);
cards;
1 2 3 4 9 a b c d
9 1 2 3 4 9 9 e d
9 9 9 1 9 9 9 9 d
;
data want(drop=i);
set have;
n9=0; c9=0;
array _n(*) _numeric_;
array _c(*) _character_;
do i=1 to dim(_n);
if _n(i)=9 then n9+1;
end;
do i=1 to dim(_c);
if _c(i)='9' then c9+1;
end;
number9=n9+c9;
run;
proc print;run;
Obs var1 var2 var3 var4 var5 var6 var7 var8 var9 n9 c9 number9
1 1 2 3 4 9 a b c d 1 0 1
2 9 1 2 3 4 9 9 e d 1 2 3
3 9 9 9 1 9 9 9 9 d 4 3 7
Linlin
Assume you have both numeric and character variables:
data have;
input var1-var5 (var6-var9)($);
cards;
1 2 3 4 9 a b c d
9 1 2 3 4 9 9 e d
9 9 9 1 9 9 9 9 d
;
data want(drop=i);
set have;
n9=0; c9=0;
array _n(*) _numeric_;
array _c(*) _character_;
do i=1 to dim(_n);
if _n(i)=9 then n9+1;
end;
do i=1 to dim(_c);
if _c(i)='9' then c9+1;
end;
number9=n9+c9;
run;
proc print;run;
Obs var1 var2 var3 var4 var5 var6 var7 var8 var9 n9 c9 number9
1 1 2 3 4 9 a b c d 1 0 1
2 9 1 2 3 4 9 9 e d 1 2 3
3 9 9 9 1 9 9 9 9 d 4 3 7
Linlin
hi ... using Linlin's data (assuming all you have are single digit numbers) ... you have a choice (_numeric_, _character_, _all_) ...
data want;
set have;
num9 = countc(catt(of _numeric_),'9');
char9 = countc(catt(of _character_),'9');
all9 = countc(catt(of _all_),'9')
run;
num9 char9 all9
1 0 1
1 2 3
4 3 7
if all you have are numbers, just do the _numeric_ part
Message was edited by: Mike Zdeb ... added "of _all_"
Thank you, Mike. Please correct me if I am wrong, but I think all9=countc(catt(of _all_),'9') would give me a total of all the 9's that appear in the entire observation and not the number of variables that contain just the number 9 as the value of the variable, yes? If I am not mistaken, it would treat numbers like 19, 29, 39, 49, 59, 69, 79, and 89 as instances where there is one 9. And it would treat values like 199, 299, 399 and 499 as instances where there are two 9's.
When I tried your code, it told me there were 2043 instances of the number 9 in the first observation of my dataset, despite the fact that my dataset only has 923 variables. So that tells me it is counting the number of instances of the number 9 within each variable value.
I am trying Lin Lin's code but am having some questions about the variable names. I will post a question about that on her post.
hi ... the original posting showed single characters and that's when COUNTC works
you already have some replies that work (like CATS and SUM in SQL), here's another that does work with COUNTC in a data step
proc format;
value nn 9 ='1' other='0';
value $n '9'='1' other='0';
run;
filename x dummy;
data want;
set have;
file x;
put (_numeric_) (nn.) (_character_) ($n.) @;
nines = countc(_file_,'1');
put;
run;
Thank you, LinLin. This worked. I needed to study your code, but after I did and I understood what each part means I was able to modify it slightly so it fulfills my needs, with my real working database.
If you don't mind using proc sql, here is another way: (data borrowed from linlin)
data have;
input var1-var5 (var6-var9)($);
cards;
1 2 3 4 9 a b c d
9 1 2 3 4 9 9 e d
9 9 9 1 9 9 9 9 d
;
proc sql;
select cats('cats(',name,')','=','"9"') into :lst separated by ',' from dictionary.columns where libname='WORK' AND MEMNAME='HAVE';
select sum(&lst) as ct from have;
quit;
Haikuo
You can use function COUNTW() and don't forget to make this variable long enough to hold the whole line.
data want;
set have;
all9 = countw(catx(' ',of _all_),'9') ;
run;
Ksharp
hi ... you can do that, but it doesn't work
COUNTW counts the number of words, not the occurrences of specified words (or characters)
the '9' in the COUNTW function is used as a word delimiter, not as a search string
data have;
input a b c d e;
datalines;
1 2 3 4 5
9 8 9 8 9
9 9 9 9 9
99 9 9999 9 9 9
89 89 89 89 89
;
data want;
set have;
all9 = countw(catx(' ',of _all_),'9') ;
run;
a b c d e all9
1 2 3 4 5 1
9 8 9 8 9 3
9 9 9 9 9 5
99 9 9999 9 9 5
89 89 89 89 89 6
It is obviously that I make a mistake here.
NOT testing my code.
Thanks. Mike
Ksharp
A remedy for my mistake.
data have; input a b c d e; datalines; 1 2 3 4 5 9 8 9 8 9 9 9 9 9 9 99 9 9999 9 9 9 89 89 89 89 89 ; run; data want; set have; all9 = countc(prxchange('s/\S{2,}//o',-1,catx(' ',of _all_)),'9') ; run;
Ksharp
Message was edited by: xia keshan Make code more robust.
hi ... nice remedy
Your idea works great, but I need to do the same with two character criteria ('49'). I´m a new SAS user, and I tried your syntax and it had funny results... ¿could you please explain me why this happened?
Thanks in advance,
Andrea
COUNTC() only take care of one character . I don't understand what is two character criteria ('49') ?
Do you mean a string '49' or two single character '4' or '9' ?
You'd better post an example and output to clarify your problem.
Ksharp
Sorry. The criteria I needed was an specific number with two digits (49), and what I really needed is to know whether there is any of the expected number in any of a set of variables. I just finded out that I can do this with an array, works really good as long as each variable in the array has the same format (in my case, numeric). This is an example:
data want; set have;
array test{*}p_5_1--p_5_10;
do i=1 to dim(test);
if test(i)= 1 then ind1 = 1;
else if test(i)= 2 then Ind2 = 1;
else if test(i)= 3 then Ind3 = 1;
etc...
else if test(i)= 68 then Ind68 = 1;
end; drop i; run;
these are things you find yourself doing when someone has become reaaaally original when generating a data base... I´m trying to stay positive about this...
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.