Creating a new data set with arithmetic equations

RebeccaB_ · Posted 01-18-2021 11:22 AM

Hello,

I'm trying to create separate data sets (for each quarter we gather data) that would output 1) the total # of uncontrolled patients with diabetes (numerator), 2) the total "well controlled" and "controlled" patients (these would be added together for our denominator), 3) the proportion of patients that are uncontrolled compared to controlled/well-controlled.

[uncntrolled/(well controlled +controlled)]*100

CODE:

MACRO fin(file);

DATA work.&file._dm4;
SET work.&file._dm3;
IF (a1c_result<= 8) THEN DM_control = 1;
IF (8< a1c_result <= 9) THEN DM_control = 2;
IF (a1c_result > 9) OR (a1c_result = .) THEN DM_control = 0;
run;
data work.&file._dm5;
set work.&file._dm4;
if dm_control=1 then well_cntrl=1; else well_cntrl=0;
if dm_control=2 then cntrl=1; else cntrl=0;
if dm_control=0 then uncntrl=1; else uncntrl=0;
RUN;

PROC SUMMARY data=work.&file._dm6;
class well_cntrl cntrl uncntrl;
output out=work.&file._summary sum=sum_wellcntrl sum_cntrl sum_unctrl;
final_freq= (uncntrl/(well_cntrl+cntrl)*100
run;

data=final.&file._sum_final;
set work.&file._summary; 
controlled_DM= (sum_wellcntrl + sum_cntrl); 
keep sum_uncntrl controlled_DM final_freq;
run;


%MEND fin;

%fin(Baseline);
%fin(Q1_2019);
%fin(Q2_2019);
%fin(Q3_2019);
%fin(Q4_2019);
%fin(Q1_2020);
%fin(Q2_2020);

Reeza · Posted 01-18-2021 11:28 AM

What is your question, or what part do you need help with? Does your code work? If not, does it generate an error?

@RebeccaB_ wrote:

Hello,

I'm trying to create separate data sets (for each quarter we gather data) that would output 1) the total # of uncontrolled patients with diabetes (numerator), 2) the total "well controlled" and "controlled" patients (these would be added together for our denominator), 3) the proportion of patients that are uncontrolled compared to controlled/well-controlled.

[uncntrolled/(well controlled +controlled)]*100

CODE:
MACRO fin(file);

DATA work.&file._dm4;
SET work.&file._dm3;
IF (a1c_result<= 8) THEN DM_control = 1;
IF (8< a1c_result <= 9) THEN DM_control = 2;
IF (a1c_result > 9) OR (a1c_result = .) THEN DM_control = 0;
run;
data work.&file._dm5;
set work.&file._dm4;
if dm_control=1 then well_cntrl=1; else well_cntrl=0;
if dm_control=2 then cntrl=1; else cntrl=0;
if dm_control=0 then uncntrl=1; else uncntrl=0;
RUN;

PROC SUMMARY data=work.&file._dm6;
class well_cntrl cntrl uncntrl;
output out=work.&file._summary sum=sum_wellcntrl sum_cntrl sum_unctrl;
final_freq= (uncntrl/(well_cntrl+cntrl)*100
run;

data=final.&file._sum_final;
set work.&file._summary; 
controlled_DM= (sum_wellcntrl + sum_cntrl); 
keep sum_uncntrl controlled_DM final_freq;
run;


%MEND fin;

%fin(Baseline);
%fin(Q1_2019);
%fin(Q2_2019);
%fin(Q3_2019);
%fin(Q4_2019);
%fin(Q1_2020);
%fin(Q2_2020);

RebeccaB_ · Posted 01-18-2021 11:34 AM

What I'm trying to determine is how to create a final summary table for the Baseline, Q1_2019, Q2_2019, Q3_2019, Q4_2019, Q1_2020, Q2_2020. I've received some errors which is located below in my log.

NOTE: There were 38424 observations read from the data set WORK.BASELINE_DM2.
NOTE: The data set WORK.BASELINE_DM2 has 38424 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.11 seconds
      cpu time            0.09 seconds



NOTE: There were 38424 observations read from the data set WORK.BASELINE_DM2.
NOTE: The data set WORK.BASELINE_DM3 has 7403 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds


1004  %final(Q1_2019);

NOTE: There were 102559 observations read from the data set WORK.Q1_2019_F.
NOTE: The data set WORK.Q1_2019_DM has 11026 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.06 seconds



NOTE: There were 11026 observations read from the data set WORK.Q1_2019_DM.
NOTE: The data set WORK.Q1_2019_DM2 has 10934 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds



NOTE: There were 10934 observations read from the data set WORK.Q1_2019_DM2.
NOTE: The data set WORK.Q1_2019_DM2 has 10934 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds



NOTE: There were 10934 observations read from the data set WORK.Q1_2019_DM2.
NOTE: The data set WORK.Q1_2019_DM3 has 5162 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


1005  %final(Q2_2019);

NOTE: There were 104559 observations read from the data set WORK.Q2_2019_F.
NOTE: The data set WORK.Q2_2019_DM has 11136 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds



NOTE: There were 11136 observations read from the data set WORK.Q2_2019_DM.
NOTE: The data set WORK.Q2_2019_DM2 has 10977 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds



NOTE: There were 10977 observations read from the data set WORK.Q2_2019_DM2.
NOTE: The data set WORK.Q2_2019_DM2 has 10977 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds



NOTE: There were 10977 observations read from the data set WORK.Q2_2019_DM2.
NOTE: The data set WORK.Q2_2019_DM3 has 5262 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


1006  %final(Q3_2019);

NOTE: There were 101742 observations read from the data set WORK.Q3_2019_F.
NOTE: The data set WORK.Q3_2019_DM has 10721 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.06 seconds



NOTE: There were 10721 observations read from the data set WORK.Q3_2019_DM.
NOTE: The data set WORK.Q3_2019_DM2 has 10577 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds



NOTE: There were 10577 observations read from the data set WORK.Q3_2019_DM2.
NOTE: The data set WORK.Q3_2019_DM2 has 10577 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds



NOTE: There were 10577 observations read from the data set WORK.Q3_2019_DM2.
NOTE: The data set WORK.Q3_2019_DM3 has 5213 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


1007  %final(Q4_2019);

NOTE: There were 97979 observations read from the data set WORK.Q4_2019_F.
NOTE: The data set WORK.Q4_2019_DM has 10062 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.09 seconds
      cpu time            0.06 seconds



NOTE: There were 10062 observations read from the data set WORK.Q4_2019_DM.
NOTE: The data set WORK.Q4_2019_DM2 has 9945 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds



NOTE: There were 9945 observations read from the data set WORK.Q4_2019_DM2.
NOTE: The data set WORK.Q4_2019_DM2 has 9945 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds



NOTE: There were 9945 observations read from the data set WORK.Q4_2019_DM2.
NOTE: The data set WORK.Q4_2019_DM3 has 5159 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


1008  %final(Q1_2020);

NOTE: There were 93023 observations read from the data set WORK.Q1_2020_F.
NOTE: The data set WORK.Q1_2020_DM has 9773 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds



NOTE: There were 9773 observations read from the data set WORK.Q1_2020_DM.
NOTE: The data set WORK.Q1_2020_DM2 has 9637 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds



NOTE: There were 9637 observations read from the data set WORK.Q1_2020_DM2.
NOTE: The data set WORK.Q1_2020_DM2 has 9637 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds



NOTE: There were 9637 observations read from the data set WORK.Q1_2020_DM2.
NOTE: The data set WORK.Q1_2020_DM3 has 5047 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


1009  %final(Q2_2020);

NOTE: There were 77667 observations read from the data set WORK.Q2_2020_F.
NOTE: The data set WORK.Q2_2020_DM has 9101 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds



NOTE: There were 9101 observations read from the data set WORK.Q2_2020_DM.
NOTE: The data set WORK.Q2_2020_DM2 has 8985 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds



NOTE: There were 8985 observations read from the data set WORK.Q2_2020_DM2.
NOTE: The data set WORK.Q2_2020_DM2 has 8985 observations and 78 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds



NOTE: There were 8985 observations read from the data set WORK.Q2_2020_DM2.
NOTE: The data set WORK.Q2_2020_DM3 has 4389 observations and 78 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds


1010
1011  /*CONTROL BY TIMEFRAME*/
1012  %MACRO fin(file);
1013
1014  DATA work.&file._dm4;
1015      SET work.&file._dm3;
1016      IF (a1c_result<= 8)  THEN DM_control = 1;
1017      IF (8< a1c_result <= 9) THEN DM_control = 2;
1018      IF (a1c_result > 9) OR (a1c_result = .) THEN DM_control = 0;
1019  run;
1020  data work.&file._dm5;
1021      set work.&file._dm4;
1022      if dm_control=1 then well_cntrl=1; else well_cntrl=0;
1023      if dm_control=2 then cntrl=1; else cntrl=0;
1024      if dm_control=0 then uncntrl=1; else uncntrl=0;
1025  RUN;
1026
1027  Data work.&file._dm6;
1028  set work.&file._dm5;
1029  if well_cntrl=1 then DM_cat= "well controlled";
1030  if cntrl=1 then DM_cat= "controlled";
1031  if uncntrl=1 then DM_cat= "uncontrolled";
1032  run;
1033
1034  PROC SUMMARY data=work.&file._dm6;
1035  class well_cntrl cntrl uncntrl;
1036  output out=work.&file._summary sum=sum_wellcntrl sum_cntrl sum_unctrl;
1037  final_freq= (uncntrl/(well_cntrl+cntrl)*100
1038  run;
1039
1040  data=final.&file._sum_final;
1041  set work.&file._summary;
1042  controlled_DM= (sum_wellcntrl + sum_cntrl);
1043  keep sum_uncntrl controlled_DM final_freq;
1044  run;
1045
1046
1047  /*data work.&file._dm7;
1048  set work.&file._dm6;
1049  sum(uncntrl/ (well_cntrl + cntrl))*100;
1050  run;*/
1051
1052  /*talk to vicki about SQL code
1053  PROC SQL;_
1054      CREATE TABLE final.&file. AS
1055      SELECT uncntrl, Total, (uncntrl/ Total * 100) AS NQF0018
1056      FROM (
1057          SELECT SUM(uncntrl) AS uncontrolled, COUNT(patient_ID) AS Total
1058          FROM work.&file._dm5
1059          )
1060          ;
1061      QUIT;
1062  RUN;*/
1063
1064  %MEND fin;
1065
1066  %fin(Baseline);

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      4384:58    4384:107   4384:153   4384:173
NOTE: Invalid numeric data, a1c_result='HEALTH_MAINTENANCE_COMPLETED' , at line 4384 column 58.

Reeza · Posted 01-18-2021 11:55 AM

Your code appears to be treating a1c_result as a numeric variable, but it isn't a numeric variable, it's a character viariable. One of the values is 'HEALTH_MAINTENANCE_COMPLETED' which is causing issues because your code isn't designed to handle the text values.

It looks like your first data step is the issue. FYI - you may want to look up CALL EXECUTE which can simplify the ability to call your macro multiple times.

The first step in converting a program to a macro is to have working non macro code. Do have a non macro version of this code that works?

DATA work.&file._dm4;
SET work.&file._dm3;
IF (a1c_result<= 😎 THEN DM_control = 1;
IF (8< a1c_result <= 9) THEN DM_control = 2;
IF (a1c_result > 9) OR (a1c_result = .) THEN DM_control = 0;
run;

Reeza · Posted 01-18-2021 11:58 AM

Is the information you posted there real data? If so, you may want to delete that post ASAP since it's showing private health information of individuals.

EDIT: It looks like real data to me, so deleting the problematic sections.

Reeza · Posted 01-18-2021 12:08 PM

You cannot nest calculations within the output the way you were trying with PROC FREQ, that isn't valid SAS code and should generate an error.

Issues I see so far with your code:

Cannot use data step logic within the SUMMARY PROC
Missing semicolon (final_freq line in PROC SUMMARY)
Missing closing parenthesis (final_freq line in PROC SUMMARY)
= sign in last data step (?)

These are issues with the code, but there could also be issues with the logic and that I can't know without more details.'

And you really should use a different macro variable name rather than FILE. It's not going to complain but you can run into weird issues when using a Reserved Key Word, which I did indicate a prior post.


PROC SUMMARY data=work.&file._dm6;
....
final_freq= (uncntrl/(well_cntrl+cntrl)*100
...
run;

Instead, move it to your next data step at the top of your code.

data final.&file._sum_final;
set work.&file._summary; 
final_freq= (uncntrl/(sum(well_cntrl, cntrl)))*100
controlled_DM= sum(sum_wellcntrl, sum_cntrl); 
keep sum_uncntrl controlled_DM final_freq;
run;

Personally, I'd also be changing the structure of the names of these files to the following:

data final.final_sum_&file.;

This way, if you want to combine them together in the future it's very easy, as SAS allows you to reference data sets with a prefix shortcut, but not a suffix one.

I'm not sure why you're not just using a single PROC FREQ with a format here either - this could all be simplified using that approach, but you'd have to provide sample data and expected output if you wanted a better suggestion.

Good Luck.

Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Registration is open

Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Re: Creating a new data set with arithmetic equations

Registration is open

SAS Training: Just a Click Away