BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deengyn
Obsidian | Level 7

I'm trying to calculate the number of present values (represented by variables starting with "m") that are identified within each comment. 

Each comment can be identified by 3 variables: the comment itself (comment), the physician id (phys_id) and date (comment_date). The subsequent columns are values that define a medical expert, it's a dummy variable that confirms if the comment suggest such variable is practiced by the physician in question.  

Currently this is how my code looks, considering the description above : 

Picture.JPG

 

To create a count code, I've included the following: 

data med_count; 
set med_fnl; 
by phys_id; 
med_cnt = 0; 
array _n(*) _numeric_; 
do i = 1 to dim (_n); 
if _n(i) = 1 then med_cnt + 1; end; run; 

I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?

 

But also, within the same physician (for example; phys_id = 'ASMTO-01') one line has an accurate count of 2 (where 2 columns of medical expert was marked) and the other has an inaccurate count of 2 (where only one column of medical expert was marked). This example is depicted below with a screenshot image of what the table shows. Please note both highlighted rows have a med_cnt = 2. 
Capture.JPG

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

I don't know if an array is necessary. You can try this:

 

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run; 

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

 

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2
1 x ASMTO-01 01-12-2011 1 . 1 . . . 2
2 y ASMTO-01 06-15-2017 1 . 1 . . . 2
3 z ASMTO-01 04-19-2012 1 . . . . . 1

View solution in original post

5 REPLIES 5
maguiremq
SAS Super FREQ

I don't know if an array is necessary. You can try this:

 

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run; 

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

 

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2
1 x ASMTO-01 01-12-2011 1 . 1 . . . 2
2 y ASMTO-01 06-15-2017 1 . 1 . . . 2
3 z ASMTO-01 04-19-2012 1 . . . . . 1
Shmuel
Garnet | Level 18

Usually DATE is a numeric variable. I suppose in your case data is a string otherwise the summary will have 5 digits.

 

Is there any other numeric variable except those wanted variables :

 

Do you want to sum sum only those variables where value=1 or any value of those variables?

 

I ran your code on a dummy test data in two different methods and got the same summary:

data have;
  input id $ m1 m2 m3;
cards;
A 1 . 1
B 1 1 1 
C . 1 1
; run;
data sum;
 set have;
     array _n {*} _numeric_;
     sum=0;
     do i=1 to dim(_n);
        if _n(i) = 1 then sum+1;
     end;
     drop i;
run;

data sum1;
 set have;
     sum = sum(of m:);
run;
deengyn
Obsidian | Level 7

Yes, I was only looking at when the value=1 (the only option was 1 or . for this dataset).

 

However, I'm curious how would the code change if it was to look specifically at 1's.

(i.e. to identify columns that specifically state 1 should other columns also include other numbers as shown below)

data have;
  input id $ m1 m2 m3 m4 m5;
cards;
A 1 . 1 2 .
B 1 1 1 2 1 
C . 1 1 2 2
; run;

 

 

Reeza
Super User
I don't think that code is doing what you think it's doing.
>I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?
Physician ID as shown would be a character variable and wouldn't be included in the counts.

Additionally, you're checking for the exact value of 1, not if a 1 is contained in the cell.

You can simplify your code as others have shown by simplying adding the values in the M columns.

Count_M = sum(of m:); *will include all variables that start with M;
count_m = sum(of m1-m26); *if you know the indices you can simplify a bit by listing them.
deengyn
Obsidian | Level 7
Yes, I found it weird too.
I was considering it could be the phys_id, but it's very inconsistent.

I've tried the "Count_M = sum(of m:)" code and it works, thank you!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1298 views
  • 2 likes
  • 4 in conversation