Solved: longitudinal data: counting observations after a date

lone0708 · Posted 09-21-2020 03:47 AM

Hi experts,

i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?

My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates).

I need to know how many observations participants have before and after the diagnoses (if any).

And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint.

I hope, you can help me.

Kurt_Bremser · Posted 09-23-2020 11:13 AM

I have revised the code:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_before_dx2 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
  count_before_dx1 = .;
  count_after_dx1 = .;
end;
if dx2 = .
then do;
  count_before_dx2 = .;
  count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;

Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

ChrisNZ · Posted 09-21-2020 04:17 AM

This is unclear. If you have 6 observations, you have 12 diagnosis dates.

Please provide simple sample data and desired output.

Same for the plot you'd like. Show us.

High-Performance SAS Coding - Third Edition

lone0708 · Posted 09-21-2020 07:16 AM

Hi ChrisNZ,

Does this help?

The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date.

A table with a summary of the total numbers of measurements after the dates, will also be helpfull.

Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed.

data have;

input patientid visit_date measure d_dx1 dx2;

datalines;

1 010118 3.4 010118 100118

1 050518 2.3 010118 100118

2 100217 7.3 050217 .

2 170217 7.2 050217 .

2 200217 7.1 050217 .

3 210217 5.4 210217 250217

4 020218 3.4 010118 020218

4 040218 3.2 010118 020218

5 040518 5.6 . 030518

5 050518 5.0 . 030518

;

run;

ballardw · Posted 09-21-2020 10:41 AM

One of the first things to do when asking about "specific date" is to make sure that your variables are actually dates.

The code you show has a simple numeric. And from the values given it really isn't possible to tell if the values are in a ddmmyy or yymmdd layout.

In case no one has mentioned this: 2 digit years are double-plus ungood.

If your values are day month year then "greater than" is really going to need a SAS date value because the current values you show such as 050518 would not be less than 010119.

See if the formatted dates in this match your expectation. This is one example of how to read dates.

data have;
   input patientid visit_date :ddmmyy6. measure d_dx1 :ddmmyy6. dx2 :ddmmyy6.;
   format  visit_date d_dx1 dx2 date9.;
datalines;
1  010118  3.4 010118 100118 
1  050518  2.3 010118 100118
2  100217  7.3 050217 .
2  170217  7.2 050217 .
2  200217  7.1 050217 .
3  210217  5.4 210217 250217
4  020218  3.4 010118 020218
4  040218  3.2 010118 020218
5  040518  5.6 .      030518
5  050518  5.0 .      030518
;
run;

Then tell use which value we are supposed to be comparing for "after a date". AS in by variable name and the rules concerned.

For the type of plot, with a marker at diagnosis, you would want to have a single value for the diagnosis date per patient and likely the measure corresponding to that date, assuming by spaghetti-plot you were plotting the measure value per visit date.

lone0708 · Posted 09-22-2020 03:56 AM

Dear ballardw,

i am sorry that i am not so familiar in working with SAS.
The format you added for the dates are the right ones - thank you.

I do not need to compare values, i need to get a count of how many have measurements (how many observations per patientid) before and on/after the two diagnosis dates (dx1 and dx2) respectively.

First as an overview of all participants with a line for each patient forexample:

patientid count before dx1 count after dx1 count after dx2

1 0 1 1

And then an overview of all (as in proc freq) (i just added some random numbers below):

n (patients) count before dx1 count after dx1 count after dx2

5 3 3 3

Regarding the plot: sounds like a good idea with your suggestion. Can you tell me how to make a corresponding measure variable via a SAS code?

Thanks for your help

lone0708 · Posted 09-21-2020 09:51 AM

Hi experts,

i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?

My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates).

I need to know how many observations participants have before and after the diagnoses (if any).

And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint.

The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date.

A table with a summary of the total numbers of measurements after the dates, will also be helpfull.

Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed.

data have;

input patientid visit_date measure d_dx1 dx2;

datalines;

1 010118 3.4 010118 100118

1 050518 2.3 010118 100118

2 100217 7.3 050217 .

2 170217 7.2 050217 .

2 200217 7.1 050217 .

3 210217 5.4 210217 250217

4 020218 3.4 010118 020218

4 040218 3.2 010118 020218

5 040518 5.6 . 030518

5 050518 5.0 . 030518

;

run;

I hope, you can help me.

ChrisNZ · Posted 09-21-2020 11:01 PM

>I hope, you can help me.

You need to help yourself first.

As mentioned these dates does not look like dates. Are they just a number to be interpreted as a date?

More importantly we still have no idea what result you want.

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 09-22-2020 05:15 AM

First, let's read your data in unambiguous manner, and create two datasets to avoid wasting space for redundant information:

data patients;
input patientid $ (dx1 dx2) (:yymmdd8.);
format dx1 dx2 yymmdd10.;
datalines;
1 20180101 20180101
2 20170205 .
3 20170221 20170225
4 20180101 20180202
5 . 20180503
;

data visits;
input patientid $ visit_date :yymmdd8. measure;
format visit_date yymmdd10.;
datalines;
1  20180101  3.4
1  20180505  2.3
2  20170210  7.3
2  20170217  7.2
2  20170220  7.1
3  20170221  5.4
4  20180202  3.4
4  20180204  3.2
5  20180504  5.6
5  20180505  5.0
;

Then you use BY processing in a DATA step to create the counts:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if dx1 ne .
then do;
  if visit_date lt dx1
  then count_before_dx1 + 1;
  if visit_date gt dx1
  then count_after_dx1 + 1;
end;
if visit_date gt dx2 and dx2 ne .
then count_after_dx2 + 1;
if last.patientid;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2;
run;

You may need to tweak the conditions to get the counts you want (le/ge instead of lt/gt, or omit the checks for missing).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

lone0708 · Posted 09-22-2020 09:17 AM

Amazing! Thank you Kurtbremser, it works!
okay - well, the only adjustment i need to do now, is if a person is missing a diagnosis. Right now the count occur as "0", but the diagnosis were never there, so how do i correct that to just "." without ruining the true "0"s ?

Kurt_Bremser · Posted 09-22-2020 09:34 AM

After the subsetting IF, you can add logic for missing dx1/dx2.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

lone0708 · Posted 09-23-2020 02:39 AM

Thanks for your quick replies,
I can't make it work with the logic for missing dx1/dx2.
Is it possible for you to write it in the code, please?

Kurt_Bremser · Posted 09-23-2020 02:42 AM

A simple IF-THEN:

if dx1 = . then count_before_dx1 = .;

Likewise for the other counts.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

lone0708 · Posted 09-23-2020 03:15 AM

Thank you very much for your patience 🙂 This saved me hours!

lone0708 · Posted 09-23-2020 04:37 AM

okay , now I am trying to also integrate count_before_dx2, but I can't find out where to put it in the code.
The output showed me "0" I the cells even though my code looks like this:

data want;
merge
patients
visits
;
by patientid;
if first.patientid
then do;
count_before_dx1 = 0;
count_before_dx2 = 0;
count_after_dx1 = 0;
count_after_dx2 = 0;
end;
if dx1 ne .
then do;
if visit_date lt dx1
then count_before_dx1 + 1;
if visit_date lt dx2
then count_before_dx2 + 1;
if visit_date ge dx1
then count_after_dx1 + 1;
end;
if visit_date lt dx2
then count_before_dx2 + 1;
if visit_date gt dx2 and dx2 ne .
then count_after_dx2 + 1;
if dx1 = . then count_before_dx1 = .;
if dx2 = . then count_before_dx2 = .;
if dx1 = . then count_after_dx1 = .;
if dx2 = . then count_after_dx2 = .;
if last.patientid;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;

Kurt_Bremser · Posted 09-23-2020 11:13 AM

I have revised the code:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_before_dx2 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
  count_before_dx1 = .;
  count_after_dx1 = .;
end;
if dx2 = .
then do;
  count_before_dx2 = .;
  count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;

Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

counting observations after certain date, longitudinal measurements

Re: counting observations after certain date, longitudinal measurements

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Re: longitudinal data: counting observations after a date

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!