BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lone0708
Fluorite | Level 6

Hi experts, 

i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?

My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates). 

I need to know how many observations participants have before and after the diagnoses (if any). 

And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint. 

I hope, you can help me. 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I have revised the code:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_before_dx2 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
  count_before_dx1 = .;
  count_after_dx1 = .;
end;
if dx2 = .
then do;
  count_before_dx2 = .;
  count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;

Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.

View solution in original post

14 REPLIES 14
ChrisNZ
Tourmaline | Level 20

This is unclear. If you have 6 observations, you have 12 diagnosis dates.

Please provide simple sample data and desired output.

Same for the plot you'd like. Show us.

 

lone0708
Fluorite | Level 6

Hi ChrisNZ,

Does this help?

The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date. 

A table with a summary of the total numbers of measurements after the dates, will also be helpfull. 

 

Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed. 


data
 have;

input patientid visit_date measure d_dx1 dx2;

datalines;

1  010118  3.4 010118 100118 

1  050518  2.3 010118 100118

2  100217  7.3 050217 .

2  170217  7.2 050217 .

2  200217  7.1 050217 .

3  210217  5.4 210217 250217

4  020218  3.4 010118 020218

4  040218  3.2 010118 020218

5  040518  5.6 .      030518

5  050518  5.0 .      030518

;

run;

ballardw
Super User

One of the first things to do when asking about "specific date" is to make sure that your variables are actually dates.

The code you show has a simple numeric. And from the values given it really isn't possible to tell if the values are in a ddmmyy or yymmdd layout.

 

In case no one has mentioned this: 2 digit years are double-plus ungood.

If your values are day month year then "greater than" is really going to need a SAS date value because the current values you show such as 050518 would not be less than 010119.

See if the formatted dates in this match your expectation. This is one example of how to read dates.

data have;
   input patientid visit_date :ddmmyy6. measure d_dx1 :ddmmyy6. dx2 :ddmmyy6.;
   format  visit_date d_dx1 dx2 date9.;
datalines;
1  010118  3.4 010118 100118 
1  050518  2.3 010118 100118
2  100217  7.3 050217 .
2  170217  7.2 050217 .
2  200217  7.1 050217 .
3  210217  5.4 210217 250217
4  020218  3.4 010118 020218
4  040218  3.2 010118 020218
5  040518  5.6 .      030518
5  050518  5.0 .      030518
;
run;

Then tell use which value we are supposed to be comparing for "after a date". AS in by variable name and the rules concerned.

For the type of plot, with a marker at diagnosis, you would want to have a single value for the diagnosis date per patient and likely the measure corresponding to that date, assuming by spaghetti-plot you were plotting the measure value per visit date.

 

lone0708
Fluorite | Level 6

Dear ballardw,

i am sorry that i am not so familiar in working with SAS.
The format you added for the dates are the right ones - thank you.

 

I do not need to compare values, i need to get a count of how many have measurements (how many observations per patientid) before and on/after the two diagnosis dates (dx1 and dx2) respectively.

First as an overview of all participants with a line for each patient forexample:

patientid     count before dx1  count after dx1 count after dx2

1                         0                            1                     1

 

And then an overview of all (as in proc freq) (i just added some random numbers below):

n (patients)   count before dx1  count after dx1   count after dx2

      5                      3                             3                          3

 

 

Regarding the plot: sounds like a good idea with your suggestion. Can you tell me how to make a corresponding measure variable via a SAS code?

Thanks for your help

 

lone0708
Fluorite | Level 6

Hi experts, 

i am not comfortable uploading any dataset, but does any of you have a piece of code to count observations after a certain date?

My dataset contains of patient ids (with 1-6 observations), a measurement parameter and dates of diagnosis 1 and 2 (two variables with dates). 

I need to know how many observations participants have before and after the diagnoses (if any). 

And further, can this be illustrated by some plot? I have made spaghetti-plots for each participant to illustrate when in time they have the measurements made, but I don't know how to integrate the dates, so I can "mark" the timepoint. 

 

The desired output is a table, where I can get an overview of the participants and how many measurements, they have after each date. 

A table with a summary of the total numbers of measurements after the dates, will also be helpfull. 

 

Regarding the plot, I am actually not sure how to get it illustrated, but I figured a spaghetti-plot for peach patient with maybe a change in colour or mark of when the date of diagnosis has passed. 


data
 have;

input patientid visit_date measure d_dx1 dx2;

datalines;

1  010118  3.4 010118 100118 

1  050518  2.3 010118 100118

2  100217  7.3 050217 .

2  170217  7.2 050217 .

2  200217  7.1 050217 .

3  210217  5.4 210217 250217

4  020218  3.4 010118 020218

4  040218  3.2 010118 020218

5  040518  5.6 .      030518

5  050518  5.0 .      030518

;

run;


I hope, you can help me. 

ChrisNZ
Tourmaline | Level 20

>I hope, you can help me. 

You need to help yourself first.

 

As mentioned these dates does not look like dates. Are they just a number to be interpreted as a date?

 

More importantly we still have no idea what result you want.

Kurt_Bremser
Super User

First, let's read your data in unambiguous manner, and create two datasets to avoid wasting space for redundant information:

data patients;
input patientid $ (dx1 dx2) (:yymmdd8.);
format dx1 dx2 yymmdd10.;
datalines;
1 20180101 20180101
2 20170205 .
3 20170221 20170225
4 20180101 20180202
5 . 20180503
;

data visits;
input patientid $ visit_date :yymmdd8. measure;
format visit_date yymmdd10.;
datalines;
1  20180101  3.4
1  20180505  2.3
2  20170210  7.3
2  20170217  7.2
2  20170220  7.1
3  20170221  5.4
4  20180202  3.4
4  20180204  3.2
5  20180504  5.6
5  20180505  5.0
;

Then you use BY processing in a DATA step to create the counts:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if dx1 ne .
then do;
  if visit_date lt dx1
  then count_before_dx1 + 1;
  if visit_date gt dx1
  then count_after_dx1 + 1;
end;
if visit_date gt dx2 and dx2 ne .
then count_after_dx2 + 1;
if last.patientid;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2;
run;

You may need to tweak the conditions to get the counts you want (le/ge instead of lt/gt, or omit the checks for missing).

lone0708
Fluorite | Level 6
Amazing! Thank you Kurtbremser, it works!
okay - well, the only adjustment i need to do now, is if a person is missing a diagnosis. Right now the count occur as "0", but the diagnosis were never there, so how do i correct that to just "." without ruining the true "0"s ?
lone0708
Fluorite | Level 6
Thanks for your quick replies,
I can't make it work with the logic for missing dx1/dx2.
Is it possible for you to write it in the code, please?
lone0708
Fluorite | Level 6
Thank you very much for your patience 🙂 This saved me hours!
lone0708
Fluorite | Level 6
okay , now I am trying to also integrate count_before_dx2, but I can't find out where to put it in the code.
The output showed me "0" I the cells even though my code looks like this:

data want;
merge
patients
visits
;
by patientid;
if first.patientid
then do;
count_before_dx1 = 0;
count_before_dx2 = 0;
count_after_dx1 = 0;
count_after_dx2 = 0;
end;
if dx1 ne .
then do;
if visit_date lt dx1
then count_before_dx1 + 1;
if visit_date lt dx2
then count_before_dx2 + 1;
if visit_date ge dx1
then count_after_dx1 + 1;
end;
if visit_date lt dx2
then count_before_dx2 + 1;
if visit_date gt dx2 and dx2 ne .
then count_after_dx2 + 1;
if dx1 = . then count_before_dx1 = .;
if dx2 = . then count_before_dx2 = .;
if dx1 = . then count_after_dx1 = .;
if dx2 = . then count_after_dx2 = .;
if last.patientid;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;
Kurt_Bremser
Super User

I have revised the code:

data want;
merge
  patients
  visits
;
by patientid;
if first.patientid
then do;
  count_before_dx1 = 0;
  count_before_dx2 = 0;
  count_after_dx1 = 0;
  count_after_dx2 = 0;
end;
if visit_date lt dx1 then count_before_dx1 + 1;
if visit_date lt dx2 then count_before_dx2 + 1;
if visit_date ge dx1 then count_after_dx1 + 1;
if visit_date ge dx2 then count_after_dx2 + 1;
if last.patientid;
if dx1 = .
then do;
  count_before_dx1 = .;
  count_after_dx1 = .;
end;
if dx2 = .
then do;
  count_before_dx2 = .;
  count_after_dx2 = .;
end;
keep patientid count_before_dx1 count_after_dx1 count_after_dx2 count_before_dx2;
run;

Please use the "little running man" button to post the code, otherwise the main posting window will destroy the formatting.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1127 views
  • 0 likes
  • 4 in conversation