I'm trying to calculate the number of present values (represented by variables starting with "m") that are identified within each comment.
Each comment can be identified by 3 variables: the comment itself (comment), the physician id (phys_id) and date (comment_date). The subsequent columns are values that define a medical expert, it's a dummy variable that confirms if the comment suggest such variable is practiced by the physician in question.
Currently this is how my code looks, considering the description above :
To create a count code, I've included the following:
data med_count;
set med_fnl;
by phys_id;
med_cnt = 0;
array _n(*) _numeric_;
do i = 1 to dim (_n);
if _n(i) = 1 then med_cnt + 1; end; run;
I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?
But also, within the same physician (for example; phys_id = 'ASMTO-01') one line has an accurate count of 2 (where 2 columns of medical expert was marked) and the other has an inaccurate count of 2 (where only one column of medical expert was marked). This example is depicted below with a screenshot image of what the table shows. Please note both highlighted rows have a med_cnt = 2.
I don't know if an array is necessary. You can try this:
data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;
data med_count;
set med_fnl;
by phys_id; /* actually don't need this */
med_cnt2 = sum(of m:);
run;
It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.
Edit to include my PROC PRINT output.
The SAS System
Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2 1 x ASMTO-01 01-12-2011 1 . 1 . . . 2 2 y ASMTO-01 06-15-2017 1 . 1 . . . 2 3 z ASMTO-01 04-19-2012 1 . . . . . 1
I don't know if an array is necessary. You can try this:
data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;
data med_count;
set med_fnl;
by phys_id; /* actually don't need this */
med_cnt2 = sum(of m:);
run;
It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.
Edit to include my PROC PRINT output.
The SAS System
Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2 1 x ASMTO-01 01-12-2011 1 . 1 . . . 2 2 y ASMTO-01 06-15-2017 1 . 1 . . . 2 3 z ASMTO-01 04-19-2012 1 . . . . . 1
Usually DATE is a numeric variable. I suppose in your case data is a string otherwise the summary will have 5 digits.
Is there any other numeric variable except those wanted variables :
Do you want to sum sum only those variables where value=1 or any value of those variables?
I ran your code on a dummy test data in two different methods and got the same summary:
data have;
input id $ m1 m2 m3;
cards;
A 1 . 1
B 1 1 1
C . 1 1
; run;
data sum;
set have;
array _n {*} _numeric_;
sum=0;
do i=1 to dim(_n);
if _n(i) = 1 then sum+1;
end;
drop i;
run;
data sum1;
set have;
sum = sum(of m:);
run;
Yes, I was only looking at when the value=1 (the only option was 1 or . for this dataset).
However, I'm curious how would the code change if it was to look specifically at 1's.
(i.e. to identify columns that specifically state 1 should other columns also include other numbers as shown below)
data have; input id $ m1 m2 m3 m4 m5; cards; A 1 . 1 2 . B 1 1 1 2 1 C . 1 1 2 2 ; run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.