I'm trying to calculate the number of present values (represented by variables starting with "m") that are identified within each comment.
Each comment can be identified by 3 variables: the comment itself (comment), the physician id (phys_id) and date (comment_date). The subsequent columns are values that define a medical expert, it's a dummy variable that confirms if the comment suggest such variable is practiced by the physician in question.
Currently this is how my code looks, considering the description above :
To create a count code, I've included the following:
data med_count;
set med_fnl;
by phys_id;
med_cnt = 0;
array _n(*) _numeric_;
do i = 1 to dim (_n);
if _n(i) = 1 then med_cnt + 1; end; run;
I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?
But also, within the same physician (for example; phys_id = 'ASMTO-01') one line has an accurate count of 2 (where 2 columns of medical expert was marked) and the other has an inaccurate count of 2 (where only one column of medical expert was marked). This example is depicted below with a screenshot image of what the table shows. Please note both highlighted rows have a med_cnt = 2.
I don't know if an array is necessary. You can try this:
data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;
data med_count;
set med_fnl;
by phys_id; /* actually don't need this */
med_cnt2 = sum(of m:);
run;
It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.
Edit to include my PROC PRINT output.
The SAS System
Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2 1 x ASMTO-01 01-12-2011 1 . 1 . . . 2 2 y ASMTO-01 06-15-2017 1 . 1 . . . 2 3 z ASMTO-01 04-19-2012 1 . . . . . 1
I don't know if an array is necessary. You can try this:
data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;
data med_count;
set med_fnl;
by phys_id; /* actually don't need this */
med_cnt2 = sum(of m:);
run;
It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.
Edit to include my PROC PRINT output.
The SAS System
Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2 1 x ASMTO-01 01-12-2011 1 . 1 . . . 2 2 y ASMTO-01 06-15-2017 1 . 1 . . . 2 3 z ASMTO-01 04-19-2012 1 . . . . . 1
Usually DATE is a numeric variable. I suppose in your case data is a string otherwise the summary will have 5 digits.
Is there any other numeric variable except those wanted variables :
Do you want to sum sum only those variables where value=1 or any value of those variables?
I ran your code on a dummy test data in two different methods and got the same summary:
data have;
input id $ m1 m2 m3;
cards;
A 1 . 1
B 1 1 1
C . 1 1
; run;
data sum;
set have;
array _n {*} _numeric_;
sum=0;
do i=1 to dim(_n);
if _n(i) = 1 then sum+1;
end;
drop i;
run;
data sum1;
set have;
sum = sum(of m:);
run;
Yes, I was only looking at when the value=1 (the only option was 1 or . for this dataset).
However, I'm curious how would the code change if it was to look specifically at 1's.
(i.e. to identify columns that specifically state 1 should other columns also include other numbers as shown below)
data have; input id $ m1 m2 m3 m4 m5; cards; A 1 . 1 2 . B 1 1 1 2 1 C . 1 1 2 2 ; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.