BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deengyn
Obsidian | Level 7

I'm trying to calculate the number of present values (represented by variables starting with "m") that are identified within each comment. 

Each comment can be identified by 3 variables: the comment itself (comment), the physician id (phys_id) and date (comment_date). The subsequent columns are values that define a medical expert, it's a dummy variable that confirms if the comment suggest such variable is practiced by the physician in question.  

Currently this is how my code looks, considering the description above : 

Picture.JPG

 

To create a count code, I've included the following: 

data med_count; 
set med_fnl; 
by phys_id; 
med_cnt = 0; 
array _n(*) _numeric_; 
do i = 1 to dim (_n); 
if _n(i) = 1 then med_cnt + 1; end; run; 

I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?

 

But also, within the same physician (for example; phys_id = 'ASMTO-01') one line has an accurate count of 2 (where 2 columns of medical expert was marked) and the other has an inaccurate count of 2 (where only one column of medical expert was marked). This example is depicted below with a screenshot image of what the table shows. Please note both highlighted rows have a med_cnt = 2. 
Capture.JPG

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

I don't know if an array is necessary. You can try this:

 

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run; 

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

 

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2
1 x ASMTO-01 01-12-2011 1 . 1 . . . 2
2 y ASMTO-01 06-15-2017 1 . 1 . . . 2
3 z ASMTO-01 04-19-2012 1 . . . . . 1

View solution in original post

5 REPLIES 5
maguiremq
SAS Super FREQ

I don't know if an array is necessary. You can try this:

 

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run; 

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

 

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2
1 x ASMTO-01 01-12-2011 1 . 1 . . . 2
2 y ASMTO-01 06-15-2017 1 . 1 . . . 2
3 z ASMTO-01 04-19-2012 1 . . . . . 1
Shmuel
Garnet | Level 18

Usually DATE is a numeric variable. I suppose in your case data is a string otherwise the summary will have 5 digits.

 

Is there any other numeric variable except those wanted variables :

 

Do you want to sum sum only those variables where value=1 or any value of those variables?

 

I ran your code on a dummy test data in two different methods and got the same summary:

data have;
  input id $ m1 m2 m3;
cards;
A 1 . 1
B 1 1 1 
C . 1 1
; run;
data sum;
 set have;
     array _n {*} _numeric_;
     sum=0;
     do i=1 to dim(_n);
        if _n(i) = 1 then sum+1;
     end;
     drop i;
run;

data sum1;
 set have;
     sum = sum(of m:);
run;
deengyn
Obsidian | Level 7

Yes, I was only looking at when the value=1 (the only option was 1 or . for this dataset).

 

However, I'm curious how would the code change if it was to look specifically at 1's.

(i.e. to identify columns that specifically state 1 should other columns also include other numbers as shown below)

data have;
  input id $ m1 m2 m3 m4 m5;
cards;
A 1 . 1 2 .
B 1 1 1 2 1 
C . 1 1 2 2
; run;

 

 

Reeza
Super User
I don't think that code is doing what you think it's doing.
>I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?
Physician ID as shown would be a character variable and wouldn't be included in the counts.

Additionally, you're checking for the exact value of 1, not if a 1 is contained in the cell.

You can simplify your code as others have shown by simplying adding the values in the M columns.

Count_M = sum(of m:); *will include all variables that start with M;
count_m = sum(of m1-m26); *if you know the indices you can simplify a bit by listing them.
deengyn
Obsidian | Level 7
Yes, I found it weird too.
I was considering it could be the phys_id, but it's very inconsistent.

I've tried the "Count_M = sum(of m:)" code and it works, thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 607 views
  • 2 likes
  • 4 in conversation