Solved: Re: Calculate the frequency of percent change based on the weight.

sandyzman1 · Posted 04-15-2022 01:39 PM

Hi,

I am trying to calculate the frequency of patients who had >10% weight loss among 2 weights measured between 2-4 months before diagnosis. I have attached the pseudo-data.

Each patient has multiple visits with weight measurements. Their diagnosis of the disease is marked by "1" in the diagnosis column, and the date of encounter is the "dateoftest". I am trying to look at a 2-4 months window before the diagnosis date (diagnosis = 1), pull at least 2 weight measures, and compare if there is a >10% weight loss between those two weight measures. If >10% weight loss then flag the patients. There should be two wt measured between 2-4 months before diagnosis, If only 1 wt measure then it is marked as missing. Or if no weight is measured, it is also marked as missing.

Finally, calculate the frequency of patients with >10% weight loss and those with missing weight within 2-4 months.

I would really appreciate the help.

Thanks,
Sandyzman

mkeintz · Posted 04-15-2022 04:16 PM

I am not going to do the work of converting your XLSX file into a SAS data set, so the following code is untested:

data want (drop=nxt_: );

  merge have (keep=id dateoftest diagnosis rename=(dateoftest=diag_date) where=(diagnosis=1)
        have ;
  by id;
  set have (firstobs=2 keep=id dateoftest rename=(id=nxt_id dateoftest=nxt_date));

  retain wgt1 wgt2 .;
  if first.id then call missing(of wgt:);

  if intnx('month',diag_date,-4,'sameday') <= dateoftest <= intnx('month',diag_date,-2,'sameday');

  if wgt1=. then wgt1=weight;
  else wgt2=coalesce(weight,wgt2);

  if intck('month',nxt_date,diag_date,'c')>2;   **See editted note below**;
  if nmiss(wgt1,wgt2)=0 then wgt_change_ratio= wgt2/wgt1-1;
run;

The program assumes your data in dataset HAVE is sorted by ID/DATEOFTEST.

It merges the diagnosis date record (when diagnosis=1) will all the records for a given id. And it keeps only those whose dates fall within the 4-2 month prior window, keeping the earliest weight in the window as WGT1 and the latest within-window weight as WGT2.

Once the within-windows observations are exhausted (i.e. the next obs is outside the window), the weight change is calculated and the data is output.

Editted Note: Corrected the 4th argument to intck from 's' to 'c'. But the code is still untested.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 04-15-2022 01:59 PM

You mention "at least" two weight measure in the 4-month-prior to 2-month-prior time period. Then later, you say that there should be 2 such tests. So are you really excluding the possibiliyt of more than 2 visits in that window? If not, then the case of more than two qualifying tests, which weights should be used to measure weight loss?

Also, what is "two months before" Is April15, 2022 two months before June 10, 2022? Similarly, is April 15, 2022 more than four months before August 17, 2022?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sandyzman1 · Posted 04-15-2022 02:48 PM

Hi mkeintz,

Thanks for the reply. I meant at least 2 or more weight measures before diagnosis (1) and see if there is any of the weight change (within that window) is more than 10% wt loss. Most likely, the extreme 2 wt measure (farthest from diagnosis and closest to diagnosis) will work. or, if possible, any wt loss in between is more than 10% loss.

- The two months before June 10, 2022, include wt measure after April 10, 2022. So, April 15, 2022, is within 2 months
- April 17, 2022, and after is four months before August 17, 2022, so April 15, 2022, is more than four months.

Thanks,
Sandyzman1

mkeintz · Posted 04-15-2022 04:16 PM

I am not going to do the work of converting your XLSX file into a SAS data set, so the following code is untested:

data want (drop=nxt_: );

  merge have (keep=id dateoftest diagnosis rename=(dateoftest=diag_date) where=(diagnosis=1)
        have ;
  by id;
  set have (firstobs=2 keep=id dateoftest rename=(id=nxt_id dateoftest=nxt_date));

  retain wgt1 wgt2 .;
  if first.id then call missing(of wgt:);

  if intnx('month',diag_date,-4,'sameday') <= dateoftest <= intnx('month',diag_date,-2,'sameday');

  if wgt1=. then wgt1=weight;
  else wgt2=coalesce(weight,wgt2);

  if intck('month',nxt_date,diag_date,'c')>2;   **See editted note below**;
  if nmiss(wgt1,wgt2)=0 then wgt_change_ratio= wgt2/wgt1-1;
run;

The program assumes your data in dataset HAVE is sorted by ID/DATEOFTEST.

It merges the diagnosis date record (when diagnosis=1) will all the records for a given id. And it keeps only those whose dates fall within the 4-2 month prior window, keeping the earliest weight in the window as WGT1 and the latest within-window weight as WGT2.

Once the within-windows observations are exhausted (i.e. the next obs is outside the window), the weight change is calculated and the data is output.

Editted Note: Corrected the 4th argument to intck from 's' to 'c'. But the code is still untested.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sandyzman1 · Posted 04-16-2022 11:27 AM

Hi mkeintz,

Thank you so much for the help. I really appreciate it. However, when I imported the xlsx file into SAS and ran the code. I got the following error. I couldn't figure out what is the issue. Would you please suggest how to resolve these issues? Thanks.

NOTE: Invalid argument to function INTCK('month',18235,18270,'s') at line 84 column 6.
id=1 diag_date=01/08/2010 diagnosis=0 dateoftest=11/03/2009 weight=99 FIRST.id=0 LAST.id=0 nxt_id=1
nxt_date=12/04/2009 wgt1=99 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=2
NOTE: Invalid argument to function INTCK('month',17938,18003,'s') at line 84 column 6.
id=3 diag_date=04/16/2009 diagnosis=0 dateoftest=02/09/2009 weight=119 FIRST.id=1 LAST.id=0 nxt_id=3
nxt_date=02/10/2009 wgt1=119 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=12
NOTE: Invalid argument to function INTCK('month',18003,18003,'s') at line 84 column 6.
id=3 diag_date=04/16/2009 diagnosis=0 dateoftest=02/10/2009 weight=. FIRST.id=0 LAST.id=0 nxt_id=3
nxt_date=04/16/2009 wgt1=119 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=13
NOTE: Invalid argument to function INTCK('month',21266,21289,'s') at line 84 column 6.
id=4 diag_date=04/15/2018 diagnosis=0 dateoftest=02/10/2018 weight=. FIRST.id=1 LAST.id=0 nxt_id=4
nxt_date=03/23/2018 wgt1=. wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=18
NOTE: Invalid argument to function INTCK('month',20131,20201,'s') at line 84 column 6.
id=5 diag_date=04/23/2015 diagnosis=0 dateoftest=01/03/2015 weight=120 FIRST.id=0 LAST.id=0 nxt_id=5
nxt_date=02/12/2015 wgt1=120 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=24
NOTE: Invalid argument to function INTCK('month',20201,20201,'s') at line 84 column 6.
id=5 diag_date=04/23/2015 diagnosis=0 dateoftest=02/12/2015 weight=95 FIRST.id=0 LAST.id=0 nxt_id=5
nxt_date=04/23/2015 wgt1=120 wgt2=95 wgt_change_ratio=. _ERROR_=1 _N_=25
NOTE: Invalid argument to function INTCK('month',19401,19471,'s') at line 84 column 6.
id=6 diag_date=04/23/2013 diagnosis=0 dateoftest=01/03/2013 weight=80 FIRST.id=1 LAST.id=0 nxt_id=6
nxt_date=02/12/2013 wgt1=80 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=29
NOTE: Invalid argument to function INTCK('month',19471,19471,'s') at line 84 column 6.
id=6 diag_date=04/23/2013 diagnosis=0 dateoftest=02/12/2013 weight=79 FIRST.id=0 LAST.id=0 nxt_id=6
nxt_date=04/23/2013 wgt1=80 wgt2=79 wgt_change_ratio=. _ERROR_=1 _N_=30
NOTE: Invalid argument to function INTCK('month',21227,21297,'s') at line 84 column 6.
id=7 diag_date=04/23/2018 diagnosis=0 dateoftest=01/03/2018 weight=80 FIRST.id=1 LAST.id=0 nxt_id=7
nxt_date=02/12/2018 wgt1=80 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=35
NOTE: Invalid argument to function INTCK('month',21297,21297,'s') at line 84 column 6.
id=7 diag_date=04/23/2018 diagnosis=0 dateoftest=02/12/2018 weight=79 FIRST.id=0 LAST.id=0 nxt_id=7
nxt_date=04/23/2018 wgt1=80 wgt2=79 wgt_change_ratio=. _ERROR_=1 _N_=36
NOTE: Invalid argument to function INTCK('month',19331,19366,'s') at line 84 column 6.
id=8 diag_date=01/08/2013 diagnosis=0 dateoftest=11/03/2012 weight=99 FIRST.id=0 LAST.id=0 nxt_id=8
nxt_date=12/04/2012 wgt1=99 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=42

Best,
sandyzman1

mkeintz · Posted 04-16-2022 01:44 PM

@sandyzman1 wrote:
Hi mkeintz,

Thank you so much for the help. I really appreciate it. However, when I imported the xlsx file into SAS and ran the code. I got the following error. I couldn't figure out what is the issue. Would you please suggest how to resolve these issues? Thanks.

NOTE: Invalid argument to function INTCK('month',18235,18270,'s') at line 84 column 6.
id=1 diag_date=01/08/2010 diagnosis=0 dateoftest=11/03/2009 weight=99 FIRST.id=0 LAST.id=0 nxt_id=1
nxt_date=12/04/2009 wgt1=99 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=2
NOTE: Invalid argument to function INTCK('month',17938,18003,'s') at line 84 column 6.
id=3 diag_date=04/16/2009 diagnosis=0 dateoftest=02/09/2009 weight=119 FIRST.id=1 LAST.id=0 nxt_id=3
nxt_date=02/10/2009 wgt1=119 wgt2=. wgt_change_ratio=. _ERROR_=1 _N_=12

Best,
sandyzman1

This is one of the consequences of submitting untested code, which is my usual response in the absence of sample data in the form of a working data step.

I will edit my earlier response to change the "S" parameter in the intck function to "C". The "S" parameter serves an analogous role in the INTNX function to the "C" that I should have used in the INTCK function.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sandyzman1 · Posted 04-18-2022 11:31 AM

Hi mkeintz,

Thank you so much for your response. I understand that I should have originally attached the working dataset. I thought the file would be too long to attach. However, I am still getting the error message, and I am new to SAS so I don't totally understand what I going on. I got the following error now.

NOTE: Missing values were generated as a result of performing an operation on missing values.

Each place is given by: (Number of times) at (Line):(Column). 4 at 65:6 4 at 65:61

I have attached the SAS data file and also copied the dataline (if the file doesn't work).

data test;

infile datalines delimiter=',';

input id dateoftest :mmddyy8. diagnosis weight;

datalines;

1, 4/8/2009, 0, 98,

1, 11/3/2009, 0, 99,

1, 12/4/2009, 0, ,

1, 1/8/2010, 1, 89,

1, 3/24/2010, 0, 76,

1, 4/4/2011, 0, 75,

1, 7/14/2011, 0, 70,

2, 4/23/2016, 0, 122,

2, 5/20/2016, 0, 124,

2, 7/10/2016, 0, ,

2, 9/25/2017, 0, 120,

3, 2/9/2009, 0, 119,

3, 2/10/2009, 0, ,

3, 4/16/2009, 1, 120,

3, 5/23/2009, 0, 105,

3, 6/17/2010, 0, 98,

3, 5/12/2011, 0, 94,

4, 3/23/2018, 0, ,

4, 2/10/2018, 0, ,

4, 4/18/2018, 0, ,

4, 4/15/2018, 1, ,

4, 7/3/2019, 0, ,

5, 12/4/2014, 0, 130,

5, 1/3/2015, 0, 120,

5, 2/12/2015, 0, 95,

5, 4/23/2015, 1, 96,

5, 5/5/2015, 0, 87,

5, 8/9/2016, 0, 88,

6, 12/4/2013, 0, 98,

6, 1/3/2013, 0, 80,

6, 2/12/2013, 0, 79,

6, 4/23/2013, 1, 78,

6, 5/5/2013, 0, 75,

6, 8/9/2014, 0, 74,

7, 12/4/2018, 0, 81,

7, 1/3/2018, 0, 80,

7, 2/12/2018, 0, 79,

7, 4/23/2018, 1, 78,

7, 5/5/2018, 0, 75,

7, 8/9/2018, 0, 74,

8, 4/8/2012, 0, 140,

8, 11/3/2012, 0, 99,

8, 12/4/2012, 0, ,

8, 1/8/2013, 1, 89,

8, 3/24/2013, 0, 76,

8, 4/4/2013, 0, 75,

8, 7/14/2013, 0, 70,

;

run;

I really appreciate your help. Thanks.

Sandyzman1

mkeintz · Posted 04-18-2022 02:41 PM

One would need to see the log with the code that generated the message you report, but you haven't provided it.

But this is a good opportunity to see the value of the log. You report the following message:

NOTE: Missing values were generated as a result of performing an operation on missing values.

Each place is given by: (Number of times) at (Line):(Column). 4 at 65:6 4 at 65:61

This means you should look at the line identified as line 65 in your log. The statement there is presumably calculating a value, but one or more of the inputs to that calulation is a missing value. Also note that this happens exactly 4 times. All the rest of your observations did not have the problem, i.e. they did not have to base a calculation on a missing value.

I notice that the 4 observations for ID=2 contain NO observations with diagnosis=1, so any calculation for ID=2 based on the date of diagnosis would generate the above message. So you could avoid that result by putting the calculation in the THEN clause of an IF not missing(diag_date) THEN .... statement.

BTW, what do you want to do in the WANT dataset for

Any ID with not diagnosis date
Any ID with a diagnosis date, but not test observations between exactly 2 and 4 months prior.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sandyzman1 · Posted 04-18-2022 05:17 PM

I just wanted to create a flag for >10% weight loss within 2-4 months before the diagnosis of disease for each ID.

1. If with no diagnosis date then the flag will be 0
2. if any diagnosis date, but not test observation then flag = 0.

Only those who meet the criteria will have a flag = 1.

Thanks.
sandyzman1

sandyzman1 · Posted 04-19-2022 09:10 AM

@mkeintz

Thanks! Your code actually worked. I ran each line seperately and noticed that I need to change wgt2 into wgt1 in the line: wgt2=coalesce(weight,wgt2). When I did that I got what I wanted. For the future reference, I used your following code:

data want ;
merge have (keep=id dateoftest diagnosis rename=(dateoftest=diag_date) where=(diagnosis=1))
have;
by id;
set have (firstobs=2 keep=id dateoftest rename=(id=nxt_id dateoftest=nxt_date));
retain wgt1 wgt2 .;
if first.id then call missing(of wgt:);
if intnx('month',diag_date,-4,'sameday') <= dateoftest <= intnx('month',diag_date,-2,'sameday');
if wgt1=. then wgt1=weight;
else wgt2=coalesce(weight,wgt1);
* if intck('month',nxt_date,diag_date,'c')>2 **See editted note below**;
if nmiss(wgt1,wgt2)=0 then wgt_change_ratio= wgt2/wgt1-1;
if wgt_change_ratio ^=. and wgt_change_ratio <= -0.1 then flag = 1;
else flag = 0;
run;

That was super helpful. Thank you so much for your help @mkeintz.

Best,
Sandyzman1

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away