BookmarkSubscribeRSS Feed
Natalie111
Fluorite | Level 6

Hi

I would like to summarize measurements over time for each person using certain criteria with respect to timing. I am interested in two different scenarios. 

 

I am attaching a hypothetical dataset sample and also how my output datasets (in each scenario) should look like.

 

Thank you! 

11 REPLIES 11
PaigeMiller
Diamond | Level 26

First, many of us refuse to download Excel (or other Microsoft Office) documents as they can be security risks. Some of us (like me) refuse to download any documents at all. Data should be provided as working SAS data step code, which you can type in yourself (make sure the code works) or you can follow these instructions.

 

Next, I think your explanation of the problem is very general and not specific enough to write code for. You don't explain what the "two different scenarios" are and you don't explain what "certain criteria" means. Please explain and please be specific.

--
Paige Miller
maguiremq
SAS Super FREQ

Your best bet would be to use `PROC MEANS`, assuming your measurements are numeric. 

 

https://support.sas.com/resources/papers/proceedings10/135-2010.pdf

Natalie111
Fluorite | Level 6

Here is what was included previously in the attachment!

 

Hypothetical dataset:                                    

ID            Date      Diagnosis            

1              9/1/2000              A            

1              9/15/2000            A            

1              11/3/2000            B            

1              3/5/2001              A            

1              4/5/2005              C            

1              5/20/2005            B            

2              6/6/2003              A            

2              1/3/2005              A            

2              2/7/2005              C            

2              8/10/2012            C            

                                               

                                               

Scenario 1:          For each person, group together diagnoses within 1 year (365 days)                        

Output dataset should look like:                               

ID            First date             Last date              Diagnosis

1              9/1/2000              3/5/2001              A,A,B,A

1              4/5/2005                              C

2              6/6/2003                              A

2              1/3/2005              2/7/2005              A,C

2              8/10/2012                            C

                                               

                                               

Secnario 2:          For each person, group together diagnoses allowing for a washout period of at least 5 years between consecutive dates                           

Output dataset should look like:                               

ID            First date             Last date              Diagnosis

1              9/1/2000              5/20/2005            A,A,B,A,C,B

2              6/6/2003              2/7/2005              A,A,C

2              8/10/2012                            C

ballardw
Super User

Maybe not obvious but have to clarify, by "year" do you mean a calendar year or "365 days from the first date"?

 

Why is the desired output for the first scenario

1 4/5/2005 C

instead of

1 4/5/2005  5/20/2005 C,B

 

Just exactly how do you expect to USE these result data sets? Almost any use of a variable with multiple values in it is difficult at best and often impossible to use at all.

 


@Natalie111 wrote:

Here is what was included previously in the attachment!

 

Hypothetical dataset:                                    

ID            Date      Diagnosis            

1              9/1/2000              A            

1              9/15/2000            A            

1              11/3/2000            B            

1              3/5/2001              A            

1              4/5/2005              C            

1              5/20/2005            B            

2              6/6/2003              A            

2              1/3/2005              A            

2              2/7/2005              C            

2              8/10/2012            C            

                                               

                                               

Scenario 1:          For each person, group together diagnoses within 1 year (365 days)                        

Output dataset should look like:                               

ID            First date             Last date              Diagnosis

1              9/1/2000              3/5/2001              A,A,B,A

1              4/5/2005                              C

2              6/6/2003                              A

2              1/3/2005              2/7/2005              A,C

2              8/10/2012                            C

                                               

                                               

Secnario 2:          For each person, group together diagnoses allowing for a washout period of at least 5 years between consecutive dates                           

Output dataset should look like:                               

ID            First date             Last date              Diagnosis

1              9/1/2000              5/20/2005            A,A,B,A,C,B

2              6/6/2003              2/7/2005              A,A,C

2              8/10/2012                            C


 

Natalie111
Fluorite | Level 6
Regarding your first question, I mean 365 days after the first date (and not calendar year).

Yes, you are right it should be summarized as 1 4/5/2005 5/20/2005 C,B.

I can understand the challenge in using data summarized like this but my question relates to how doing so.
mkeintz
PROC Star

So you are saying that no output record should span more than 1 year (i.e. 365 or 366 days). 

 

Then (untested in absence of a working sample data step):

 

data want (keep=id first_date last_date diagnosis_list);
  do until (last.id=1 or nxt_date>=intnx('year',first_date,1,'same'));
    set have (keep=id);
    by id;
    merge have
          have (firstobs=2 keep=date rename=(date=nxt_date));
    
    format first_date last_date date9. ;
    if first_date=. then first_date=date;

    length diagnosis_list $12  /*Long enough to accomodate the longest list of diagnoses*/;
    diagnosis_list=catx(',',diagnosis_list,diagnosis);

    last_date=date;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Natalie111
Fluorite | Level 6

Thank you, I will check if it is working for scenario I. I will also try to modify it to facilitate scenario II but any ideas there are welcome too.

 

Thanks again!

Natalie111
Fluorite | Level 6

I tested the code and I get the following error message:

'Variable first_date is uninitialized.'

Can you maybe help me with that?

 

ballardw
Super User

@Natalie111 wrote:

I tested the code and I get the following error message:

'Variable first_date is uninitialized.'

Can you maybe help me with that?

 


SHOW the LOG of the actual code you run. Without YOUR log we have no idea what you ran.

PaigeMiller
Diamond | Level 26

@Natalie111 wrote:

I can understand the challenge in using data summarized like this but my question relates to how doing so.

This does not answer the question about HOW you will use these results. The reason I bring this up (and probably the reason @ballardw asked in the first place) is that this re-arranging of data is often un-necessary and makes your future coding to use this data more difficult. And so I (and maybe @ballardw too) think doing this re-arranging is a mistake, and we can suggest alternative approaches that will be easier, if only we knew what the next steps you want to do.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Suppose I modify the data for ID=2 to this:

2              6/6/2003              A            
2              1/3/2005              A            
2              2/7/2005              C            
2              1/8/2006              A
2              8/10/2012            C   

Which observations get combined? 1/8/2006 is within 1 year of 2/7/2005, and 2/7/2005 is within 1 year of 1/3/2005, do they all get combined? Or not? Why? What is the desired output in this case?

 

Also, please give a clear answer to @ballardw 's question, which I repeat here: "Just exactly how do you expect to USE these result data sets? Almost any use of a variable with multiple values in it is difficult at best and often impossible to use at all." I too would like to know the answer, as I think performing this re-arrangement of data is not a good idea that makes the next step more difficult.

 

--
Paige Miller

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2052 views
  • 0 likes
  • 5 in conversation