Solved: Error with Horizontal Count Variable

deengyn · Posted 02-08-2021 02:36 PM

I'm trying to calculate the number of present values (represented by variables starting with "m") that are identified within each comment.

Each comment can be identified by 3 variables: the comment itself (comment), the physician id (phys_id) and date (comment_date). The subsequent columns are values that define a medical expert, it's a dummy variable that confirms if the comment suggest such variable is practiced by the physician in question.

Currently this is how my code looks, considering the description above :

To create a count code, I've included the following:

data med_count; 
set med_fnl; 
by phys_id; 
med_cnt = 0; 
array _n(*) _numeric_; 
do i = 1 to dim (_n); 
if _n(i) = 1 then med_cnt + 1; end; run;

I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?

But also, within the same physician (for example; phys_id = 'ASMTO-01') one line has an accurate count of 2 (where 2 columns of medical expert was marked) and the other has an inaccurate count of 2 (where only one column of medical expert was marked). This example is depicted below with a screenshot image of what the table shows. Please note both highlighted rows have a med_cnt = 2.

maguiremq · Posted 02-08-2021 03:07 PM

I don't know if an array is necessary. You can try this:

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run;

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2

1 x ASMTO-01 01-12-2011 1 . 1 . . . 2

2 y ASMTO-01 06-15-2017 1 . 1 . . . 2

3 z ASMTO-01 04-19-2012 1 . . . . . 1

View solution in original post

maguiremq · Posted 02-08-2021 03:07 PM

I don't know if an array is necessary. You can try this:

data med_fnl;
input comment $1. phys_id $9. comment_date $11. m11 m12 m13 m14 m15 m16 3.;
datalines;
x ASMTO-01 01-12-2011 1 . 1 . .
y ASMTO-01 06-15-2017 1 . 1 . .
z ASMTO-01 04-19-2012 1 . . . .
;
run;

data med_count; 
set med_fnl; 
by phys_id; /* actually don't need this */
	med_cnt2 = sum(of m:);
run;

It gets the response I think you need, but I only had a limited number of observations that I could replicate in a DATALINES statement.

Edit to include my PROC PRINT output.

The SAS System

Obs comment phys_id comment_date m11 m12 m13 m14 m15 m16 med_cnt2

1 x ASMTO-01 01-12-2011 1 . 1 . . . 2

2 y ASMTO-01 06-15-2017 1 . 1 . . . 2

3 z ASMTO-01 04-19-2012 1 . . . . . 1

Shmuel · Posted 02-08-2021 03:16 PM

Usually DATE is a numeric variable. I suppose in your case data is a string otherwise the summary will have 5 digits.

Is there any other numeric variable except those wanted variables :

Do you want to sum sum only those variables where value=1 or any value of those variables?

I ran your code on a dummy test data in two different methods and got the same summary:

data have;
  input id $ m1 m2 m3;
cards;
A 1 . 1
B 1 1 1 
C . 1 1
; run;
data sum;
 set have;
     array _n {*} _numeric_;
     sum=0;
     do i=1 to dim(_n);
        if _n(i) = 1 then sum+1;
     end;
     drop i;
run;

data sum1;
 set have;
     sum = sum(of m:);
run;

deengyn · Posted 02-09-2021 02:58 PM

Yes, I was only looking at when the value=1 (the only option was 1 or . for this dataset).

However, I'm curious how would the code change if it was to look specifically at 1's.

(i.e. to identify columns that specifically state 1 should other columns also include other numbers as shown below)

data have;
  input id $ m1 m2 m3 m4 m5;
cards;
A 1 . 1 2 .
B 1 1 1 2 1 
C . 1 1 2 2
; run;

Reeza · Posted 02-08-2021 05:15 PM

I don't think that code is doing what you think it's doing.
>I found it interesting because in some cases the count variable is accurate and other times it is not. Is it because some of the PHYS_ID's include a 1?
Physician ID as shown would be a character variable and wouldn't be included in the counts.

Additionally, you're checking for the exact value of 1, not if a 1 is contained in the cell.

You can simplify your code as others have shown by simplying adding the values in the M columns.

Count_M = sum(of m:); *will include all variables that start with M;
count_m = sum(of m1-m26); *if you know the indices you can simplify a bit by listing them.

deengyn · Posted 02-08-2021 05:43 PM

Yes, I found it weird too.
I was considering it could be the phys_id, but it's very inconsistent.

I've tried the "Count_M = sum(of m:)" code and it works, thank you!

Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Registration is open

Obs	comment	phys_id	comment_date	m11	m12	m13	m14	m15	m16	med_cnt2
1	x	ASMTO-01	01-12-2011	1	.	1	.	.	.	2
2	y	ASMTO-01	06-15-2017	1	.	1	.	.	.	2
3	z	ASMTO-01	04-19-2012	1	.	.	.	.	.	1

Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Re: Error with Horizontal Count Variable

Registration is open

SAS Training: Just a Click Away