Hi
I would like to summarize measurements over time for each person using certain criteria with respect to timing. I am interested in two different scenarios.
I am attaching a hypothetical dataset sample and also how my output datasets (in each scenario) should look like.
Thank you!
First, many of us refuse to download Excel (or other Microsoft Office) documents as they can be security risks. Some of us (like me) refuse to download any documents at all. Data should be provided as working SAS data step code, which you can type in yourself (make sure the code works) or you can follow these instructions.
Next, I think your explanation of the problem is very general and not specific enough to write code for. You don't explain what the "two different scenarios" are and you don't explain what "certain criteria" means. Please explain and please be specific.
Your best bet would be to use `PROC MEANS`, assuming your measurements are numeric.
https://support.sas.com/resources/papers/proceedings10/135-2010.pdf
Here is what was included previously in the attachment!
Hypothetical dataset:
ID Date Diagnosis
1 9/1/2000 A
1 9/15/2000 A
1 11/3/2000 B
1 3/5/2001 A
1 4/5/2005 C
1 5/20/2005 B
2 6/6/2003 A
2 1/3/2005 A
2 2/7/2005 C
2 8/10/2012 C
Scenario 1: For each person, group together diagnoses within 1 year (365 days)
Output dataset should look like:
ID First date Last date Diagnosis
1 9/1/2000 3/5/2001 A,A,B,A
1 4/5/2005 C
2 6/6/2003 A
2 1/3/2005 2/7/2005 A,C
2 8/10/2012 C
Secnario 2: For each person, group together diagnoses allowing for a washout period of at least 5 years between consecutive dates
Output dataset should look like:
ID First date Last date Diagnosis
1 9/1/2000 5/20/2005 A,A,B,A,C,B
2 6/6/2003 2/7/2005 A,A,C
2 8/10/2012 C
Maybe not obvious but have to clarify, by "year" do you mean a calendar year or "365 days from the first date"?
Why is the desired output for the first scenario
1 4/5/2005 C
instead of
1 4/5/2005 5/20/2005 C,B
Just exactly how do you expect to USE these result data sets? Almost any use of a variable with multiple values in it is difficult at best and often impossible to use at all.
@Natalie111 wrote:
Here is what was included previously in the attachment!
Hypothetical dataset:
ID Date Diagnosis
1 9/1/2000 A
1 9/15/2000 A
1 11/3/2000 B
1 3/5/2001 A
1 4/5/2005 C
1 5/20/2005 B
2 6/6/2003 A
2 1/3/2005 A
2 2/7/2005 C
2 8/10/2012 C
Scenario 1: For each person, group together diagnoses within 1 year (365 days)
Output dataset should look like:
ID First date Last date Diagnosis
1 9/1/2000 3/5/2001 A,A,B,A
1 4/5/2005 C
2 6/6/2003 A
2 1/3/2005 2/7/2005 A,C
2 8/10/2012 C
Secnario 2: For each person, group together diagnoses allowing for a washout period of at least 5 years between consecutive dates
Output dataset should look like:
ID First date Last date Diagnosis
1 9/1/2000 5/20/2005 A,A,B,A,C,B
2 6/6/2003 2/7/2005 A,A,C
2 8/10/2012 C
So you are saying that no output record should span more than 1 year (i.e. 365 or 366 days).
Then (untested in absence of a working sample data step):
data want (keep=id first_date last_date diagnosis_list);
do until (last.id=1 or nxt_date>=intnx('year',first_date,1,'same'));
set have (keep=id);
by id;
merge have
have (firstobs=2 keep=date rename=(date=nxt_date));
format first_date last_date date9. ;
if first_date=. then first_date=date;
length diagnosis_list $12 /*Long enough to accomodate the longest list of diagnoses*/;
diagnosis_list=catx(',',diagnosis_list,diagnosis);
last_date=date;
end;
run;
Thank you, I will check if it is working for scenario I. I will also try to modify it to facilitate scenario II but any ideas there are welcome too.
Thanks again!
I tested the code and I get the following error message:
'Variable first_date is uninitialized.'
Can you maybe help me with that?
@Natalie111 wrote:
I tested the code and I get the following error message:
'Variable first_date is uninitialized.'
Can you maybe help me with that?
SHOW the LOG of the actual code you run. Without YOUR log we have no idea what you ran.
@Natalie111 wrote:
I can understand the challenge in using data summarized like this but my question relates to how doing so.
This does not answer the question about HOW you will use these results. The reason I bring this up (and probably the reason @ballardw asked in the first place) is that this re-arranging of data is often un-necessary and makes your future coding to use this data more difficult. And so I (and maybe @ballardw too) think doing this re-arranging is a mistake, and we can suggest alternative approaches that will be easier, if only we knew what the next steps you want to do.
Suppose I modify the data for ID=2 to this:
2 6/6/2003 A
2 1/3/2005 A
2 2/7/2005 C
2 1/8/2006 A
2 8/10/2012 C
Which observations get combined? 1/8/2006 is within 1 year of 2/7/2005, and 2/7/2005 is within 1 year of 1/3/2005, do they all get combined? Or not? Why? What is the desired output in this case?
Also, please give a clear answer to @ballardw 's question, which I repeat here: "Just exactly how do you expect to USE these result data sets? Almost any use of a variable with multiple values in it is difficult at best and often impossible to use at all." I too would like to know the answer, as I think performing this re-arrangement of data is not a good idea that makes the next step more difficult.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.