Hi experts,
i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?
My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates).
I need to know how many observations participants have before and after the diagnoses (if any).
And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint.
I hope, you can help me.
I have revised the code:
data want;
merge
patients
visits
;
by patientid;
if first.patientid
then do;
count_before_dx1 = 0;
count_before_dx2 = 0;
count_after_dx1 = 0;
count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
count_before_dx1 = .;
count_after_dx1 = .;
end;
if dx2 = .
then do;
count_before_dx2 = .;
count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;
Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.
This is unclear. If you have 6 observations, you have 12 diagnosis dates.
Please provide simple sample data and desired output.
Same for the plot you'd like. Show us.
Hi ChrisNZ,
Does this help?
The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date.
A table with a summary of the total numbers of measurements after the dates, will also be helpfull.
Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed.
data have;
input patientid visit_date measure d_dx1 dx2;
datalines;
1 010118 3.4 010118 100118
1 050518 2.3 010118 100118
2 100217 7.3 050217 .
2 170217 7.2 050217 .
2 200217 7.1 050217 .
3 210217 5.4 210217 250217
4 020218 3.4 010118 020218
4 040218 3.2 010118 020218
5 040518 5.6 . 030518
5 050518 5.0 . 030518
;
run;
One of the first things to do when asking about "specific date" is to make sure that your variables are actually dates.
The code you show has a simple numeric. And from the values given it really isn't possible to tell if the values are in a ddmmyy or yymmdd layout.
In case no one has mentioned this: 2 digit years are double-plus ungood.
If your values are day month year then "greater than" is really going to need a SAS date value because the current values you show such as 050518 would not be less than 010119.
See if the formatted dates in this match your expectation. This is one example of how to read dates.
data have; input patientid visit_date :ddmmyy6. measure d_dx1 :ddmmyy6. dx2 :ddmmyy6.; format visit_date d_dx1 dx2 date9.; datalines; 1 010118 3.4 010118 100118 1 050518 2.3 010118 100118 2 100217 7.3 050217 . 2 170217 7.2 050217 . 2 200217 7.1 050217 . 3 210217 5.4 210217 250217 4 020218 3.4 010118 020218 4 040218 3.2 010118 020218 5 040518 5.6 . 030518 5 050518 5.0 . 030518 ; run;
Then tell use which value we are supposed to be comparing for "after a date". AS in by variable name and the rules concerned.
For the type of plot, with a marker at diagnosis, you would want to have a single value for the diagnosis date per patient and likely the measure corresponding to that date, assuming by spaghetti-plot you were plotting the measure value per visit date.
Dear ballardw,
i am sorry that i am not so familiar in working with SAS.
The format you added for the dates are the right ones - thank you.
I do not need to compare values, i need to get a count of how many have measurements (how many observations per patientid) before and on/after the two diagnosis dates (dx1 and dx2) respectively.
First as an overview of all participants with a line for each patient forexample:
patientid count before dx1 count after dx1 count after dx2
1 0 1 1
And then an overview of all (as in proc freq) (i just added some random numbers below):
n (patients) count before dx1 count after dx1 count after dx2
5 3 3 3
Regarding the plot: sounds like a good idea with your suggestion. Can you tell me how to make a corresponding measure variable via a SAS code?
Thanks for your help
Hi experts,
i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?
My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates).
I need to know how many observations participants have before and after the diagnoses (if any).
And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint.
The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date.
A table with a summary of the total numbers of measurements after the dates, will also be helpfull.
Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed.
data have;
input patientid visit_date measure d_dx1 dx2;
datalines;
1 010118 3.4 010118 100118
1 050518 2.3 010118 100118
2 100217 7.3 050217 .
2 170217 7.2 050217 .
2 200217 7.1 050217 .
3 210217 5.4 210217 250217
4 020218 3.4 010118 020218
4 040218 3.2 010118 020218
5 040518 5.6 . 030518
5 050518 5.0 . 030518
;
run;
I hope, you can help me.
>I hope, you can help me.
You need to help yourself first.
As mentioned these dates does not look like dates. Are they just a number to be interpreted as a date?
More importantly we still have no idea what result you want.
First, let's read your data in unambiguous manner, and create two datasets to avoid wasting space for redundant information:
data patients;
input patientid $ (dx1 dx2) (:yymmdd8.);
format dx1 dx2 yymmdd10.;
datalines;
1 20180101 20180101
2 20170205 .
3 20170221 20170225
4 20180101 20180202
5 . 20180503
;
data visits;
input patientid $ visit_date :yymmdd8. measure;
format visit_date yymmdd10.;
datalines;
1 20180101 3.4
1 20180505 2.3
2 20170210 7.3
2 20170217 7.2
2 20170220 7.1
3 20170221 5.4
4 20180202 3.4
4 20180204 3.2
5 20180504 5.6
5 20180505 5.0
;
Then you use BY processing in a DATA step to create the counts:
data want;
merge
patients
visits
;
by patientid;
if first.patientid
then do;
count_before_dx1 = 0;
count_after_dx1 = 0;
count_after_dx2 = 0;
end;
if dx1 ne .
then do;
if visit_date lt dx1
then count_before_dx1 + 1;
if visit_date gt dx1
then count_after_dx1 + 1;
end;
if visit_date gt dx2 and dx2 ne .
then count_after_dx2 + 1;
if last.patientid;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2;
run;
You may need to tweak the conditions to get the counts you want (le/ge instead of lt/gt, or omit the checks for missing).
After the subsetting IF, you can add logic for missing dx1/dx2.
A simple IF-THEN:
if dx1 = . then count_before_dx1 = .;
Likewise for the other counts.
I have revised the code:
data want;
merge
patients
visits
;
by patientid;
if first.patientid
then do;
count_before_dx1 = 0;
count_before_dx2 = 0;
count_after_dx1 = 0;
count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
count_before_dx1 = .;
count_after_dx1 = .;
end;
if dx2 = .
then do;
count_before_dx2 = .;
count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;
Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.