BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ad30
Calcite | Level 5

Hello I  am a SAS newbie.

I have a dataset with patients with Unique IDs

For each Year in the Case_year column I want to calculate:

The count of distinct IDs where the Case_year is before or equal to that year - the count of distinct IDs where the Death_year is before or equal to that year

 

How do I do that?

Thanks

 

IDCase_yearDeath_Year
120102019
220112013
320102016
42013 
520142010
62011 
720122012
82013 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You did not answer the second part about how long you are following these cases that have not died.

 

To recreate your expected output you want to stop counting in 2014.

 

First it is actually easier to just paste in the code to create the data than go to the trouble of making a table.

data have;
  input ID Case_year Death_Year Age_at_case Gender :$1.;
cards;
1  2010  2019  40  M
2  2011  2013  42  F
3  2010  2016  50  M
4  2013     .  52  M
5  2014  2014  61  . 
6  2011     .  40  M
7  2012  2012  43  M
8  2013     .  56  F
;

You just need to expand the data to one record per year you want to count.

%let cutoff=2015;
data years ;
  set have;
  do year=case_year to min(death_year,&cutoff)-1;
    output;
  end;
run;

Results:

Table of year by Gender

year      Gender

Frequency|
Percent  |
Row Pct  |
Col Pct  |F       |M       |  Total
---------+--------+--------+
    2010 |      0 |      2 |      2
         |   0.00 |  10.00 |  10.00
         |   0.00 | 100.00 |
         |   0.00 |  12.50 |
---------+--------+--------+
    2011 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2012 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2013 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
    2014 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
Total           4       16       20
            20.00    80.00   100.00

 

View solution in original post

9 REPLIES 9
ballardw
Super User

How do you want the output to look? There are lots of ways to count and which one is "correct" depends on the desired result.

 

Missing values are less than anything. So you need to clarify for your problem if you mean "death_year less than or equal to case_year" or "death_year less than or equal to case_year and not missing death_year".

 

Note: if the ID value is actually unique, then you don't need to count ID, just records with the condition.

Ad30
Calcite | Level 5
Thank you for your reply
I want the output to look like this
year Prevalence
2010 2
2011 2
2012 3
2013 4
2014 5

And I want - "death_year less than or equal to case_year and NOT missing death_year"
Ad30
Calcite | Level 5
yearPrevalence
20102
20112
20123
20134
20145

This is what I want the output to look like

ballardw
Super User

@Ad30 wrote:
year Prevalence
2010 2
2011 2
2012 3
2013 4
2014 5

This is what I want the output to look like


Okay, now  you have to either:

1) explain where that 5 comes from for 2014 as your example only shows one case_year value of 2014,

2) provide output that matches the example input

3) provide input that will allow making that example output.

Ad30
Calcite | Level 5

 

Sorry about that.

Okay so if this if the input

IDCase_yearDeath_YearAge_at_caseGender
12010201940M
22011201342F
32010201650M
42013 52M
52014201461 
62011 40M
72012201243M
82013 56F

 

The output would be

YearPrevalence
20102
20114
20124
20135
20145

 

Also would it be possible to then get the ouput by year but also by gender and age?

YearGenderPrevalence
2010F0
2010M2
2011F 
2011M 
2012F 
2012M 
Tom
Super User Tom
Super User

Not sure I follow.  To have 2014 count as 5 then you must be treating the cases without DEATH_YEAR as ongoing. In that case what is the upper bound for those cases?  Is it this year?  last year?  Do you have some type of last follow up date instead that might be different for different cases?

 

 

Ad30
Calcite | Level 5

Yes cases without DEATH_YEAR are ongoing. They are still prevalent cases.

 

So for 2014 we would be counting 

the 8 (Cases highlighted in red in the Case_Year  column because they all occured in 2014 or earlier) -The 3 (Cases highlighted in blue in the Death_Year column because those patients died  in 2014 or before 2014 so they cannot be counted towards the cases still active in 2014). The result would be 8-3= 5

If a patient does not have a death year they are still alive so they still count towards the prevalent cases in the year of interest.

 

IDCase_yearDeath_YearAge_at_caseGender
12010201940M
22011201342F
32010201650M
42013 52M
52014201461 
62011 40M
72012201243M
82013 56F

 

Tom
Super User Tom
Super User

You did not answer the second part about how long you are following these cases that have not died.

 

To recreate your expected output you want to stop counting in 2014.

 

First it is actually easier to just paste in the code to create the data than go to the trouble of making a table.

data have;
  input ID Case_year Death_Year Age_at_case Gender :$1.;
cards;
1  2010  2019  40  M
2  2011  2013  42  F
3  2010  2016  50  M
4  2013     .  52  M
5  2014  2014  61  . 
6  2011     .  40  M
7  2012  2012  43  M
8  2013     .  56  F
;

You just need to expand the data to one record per year you want to count.

%let cutoff=2015;
data years ;
  set have;
  do year=case_year to min(death_year,&cutoff)-1;
    output;
  end;
run;

Results:

Table of year by Gender

year      Gender

Frequency|
Percent  |
Row Pct  |
Col Pct  |F       |M       |  Total
---------+--------+--------+
    2010 |      0 |      2 |      2
         |   0.00 |  10.00 |  10.00
         |   0.00 | 100.00 |
         |   0.00 |  12.50 |
---------+--------+--------+
    2011 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2012 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2013 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
    2014 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
Total           4       16       20
            20.00    80.00   100.00

 

Ad30
Calcite | Level 5

Thanks so much, that worked

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1971 views
  • 1 like
  • 3 in conversation