Solved: SAS loop a calculation over the year values in a column

Ad30 · Posted 06-17-2022 01:40 AM

Hello, I am a SAS newbie.

I have a dataset with patients with unique IDs.

For each Year in the Case_year column I want to calculate:

(the count of distinct IDs where Case_year is before or equal to that year) minus (the count of distinct IDs where Death_Year is before or equal to that year)

How do I do that?

Thanks

ID	Case_year	Death_Year
1	2010	2019
2	2011	2013
3	2010	2016
4	2013
5	2014	2010
6	2011
7	2012	2012
8	2013

ballardw · Posted 06-17-2022 02:07 AM

How do you want the output to look? There are lots of ways to count and which one is "correct" depends on the desired result.

In SAS, a missing numeric value compares as less than any number. So you need to clarify for your problem if you mean "death_year less than or equal to case_year" or "death_year less than or equal to case_year and not missing death_year".

Note: if the ID value is actually unique, then you don't need to count ID, just records with the condition.
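
To see why the distinction matters, here is a minimal sketch of how a missing year behaves in a comparison. The variable names `cutoff`, `naive` and `guarded` are made up for illustration and are not from the thread.

```sas
/* A numeric missing value (.) sorts below every number,
   so an unguarded <= test treats it as "before" the cutoff. */
data _null_;
  death_year = .;
  cutoff = 2014;
  naive   = (death_year <= cutoff);                               /* true: missing counts as a death */
  guarded = (not missing(death_year) and death_year <= cutoff);   /* false: missing is excluded */
  put naive= guarded=;   /* log shows: naive=1 guarded=0 */
run;
```

With the unguarded test, every patient with no death year would be counted as having died by 2014.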

Ad30 · Posted 06-17-2022 02:36 AM

Thank you for your reply.
I want the output to look like this:
year Prevalence
2010 2
2011 2
2012 3
2013 4
2014 5

And I want - "death_year less than or equal to case_year and NOT missing death_year"

Ad30 · Posted 06-17-2022 02:36 AM

year	Prevalence
2010	2
2011	2
2012	3
2013	4
2014	5

This is what I want the output to look like

ballardw · Posted 06-17-2022 10:51 AM

@Ad30 wrote:

year Prevalence
2010 2
2011 2
2012 3
2013 4
2014 5

This is what I want the output to look like

Okay, now you have to either:

1) explain where that 5 comes from for 2014 as your example only shows one case_year value of 2014,

2) provide output that matches the example input

3) provide input that will allow making that example output.

Ad30 · Posted 06-21-2022 06:32 PM

Sorry about that.

Okay, so if this is the input

ID	Case_year	Death_Year	Age_at_case	Gender
1	2010	2019	40	M
2	2011	2013	42	F
3	2010	2016	50	M
4	2013		52	M
5	2014	2014	61
6	2011		40	M
7	2012	2012	43	M
8	2013		56	F

The output would be

Year	Prevalence
2010	2
2011	4
2012	4
2013	5
2014	5

Also, would it be possible to then get the output by year but also by gender and age?

Year	Gender	Prevalence
2010	F	0
2010	M	2
2011	F
2011	M
2012	F
2012	M

Tom · Posted 06-21-2022 06:45 PM

Not sure I follow. To have 2014 count as 5 you must be treating the cases without DEATH_YEAR as ongoing. In that case what is the upper bound for those cases? Is it this year? Last year? Do you have some type of last follow-up date instead that might be different for different cases?

Ad30 · Posted 06-21-2022 08:30 PM

Yes cases without DEATH_YEAR are ongoing. They are still prevalent cases.

So for 2014 we would be counting:

the 8 cases with a Case_Year of 2014 or earlier (highlighted in red in the Case_Year column), minus the 3 cases with a Death_Year of 2014 or earlier (highlighted in blue in the Death_Year column), because those patients have died and cannot be counted towards the cases still active in 2014. The result would be 8 - 3 = 5.

If a patient does not have a death year they are still alive so they still count towards the prevalent cases in the year of interest.

ID	Case_year	Death_Year	Age_at_case	Gender
1	2010	2019	40	M
2	2011	2013	42	F
3	2010	2016	50	M
4	2013		52	M
5	2014	2014	61
6	2011		40	M
7	2012	2012	43	M
8	2013		56	F
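
The subtraction described above can be written directly in PROC SQL. This is only a sketch under stated assumptions: the dataset names `cases` and `prev_check` are made up for illustration, and the years evaluated are taken to be the distinct Case_year values, as in the original question.

```sas
/* Hypothetical dataset name CASES; values copied from the table above */
data cases;
  input ID Case_year Death_Year;
  datalines;
1 2010 2019
2 2011 2013
3 2010 2016
4 2013 .
5 2014 2014
6 2011 .
7 2012 2012
8 2013 .
;

proc sql;
  create table prev_check as
  select y.year,
         /* IDs whose case started in or before this year */
         count(distinct case when c.Case_year <= y.year then c.ID end)
         /* minus IDs that died in or before this year;
            a missing Death_Year means the patient is still alive */
       - count(distinct case when not missing(c.Death_Year)
                              and c.Death_Year <= y.year then c.ID end)
           as Prevalence
  from (select distinct Case_year as year from cases) as y,
       cases as c               /* every year paired with every case */
  group by y.year
  order by y.year;
quit;
/* Prevalence for 2010 to 2014: 2, 4, 4, 5, 5 */
```

The log will carry notes about the Cartesian join and the CASE expressions having no ELSE clause. Both are expected here, because COUNT ignores the missing values that the CASE expressions return.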

Tom · Posted 06-21-2022 10:57 PM

You did not answer the second part about how long you are following these cases that have not died.

To recreate your expected output you want to stop counting in 2014.

First it is actually easier to just paste in the code to create the data than go to the trouble of making a table.

data have;
  input ID Case_year Death_Year Age_at_case Gender :$1.;
cards;
1  2010  2019  40  M
2  2011  2013  42  F
3  2010  2016  50  M
4  2013     .  52  M
5  2014  2014  61  . 
6  2011     .  40  M
7  2012  2012  43  M
8  2013     .  56  F
;

You just need to expand the data to one record per year you want to count.

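%let cutoff=2015;  /* first year NOT counted, so counting stops at 2014 */
data years ;
  set have;
  /* One row per year the case is prevalent. MIN() ignores a missing
     death_year, so ongoing cases run through &cutoff-1. The -1 drops
     the death year itself: a patient is not counted in the year they die. */
  do year=case_year to min(death_year,&cutoff)-1;
    output;
  end;
run;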
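
The post does not show the step that produced the table under Results. Its layout (Frequency, Percent, Row Pct, Col Pct) is that of a PROC FREQ two-way table, so presumably it was something like the following. This is an assumption, not code quoted from the post.

```sas
/* Cross-tabulate the expanded rows: one cell per year and gender */
proc freq data=years;
  tables year*Gender;
run;
```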

Results:

Table of year by Gender

year      Gender

Frequency|
Percent  |
Row Pct  |
Col Pct  |F       |M       |  Total
---------+--------+--------+
    2010 |      0 |      2 |      2
         |   0.00 |  10.00 |  10.00
         |   0.00 | 100.00 |
         |   0.00 |  12.50 |
---------+--------+--------+
    2011 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2012 |      1 |      3 |      4
         |   5.00 |  15.00 |  20.00
         |  25.00 |  75.00 |
         |  25.00 |  18.75 |
---------+--------+--------+
    2013 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
    2014 |      1 |      4 |      5
         |   5.00 |  20.00 |  25.00
         |  20.00 |  80.00 |
         |  25.00 |  25.00 |
---------+--------+--------+
Total           4       16       20
            20.00    80.00   100.00
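
To get the counts as datasets in the Year / Prevalence layout requested earlier in the thread, rather than as a printed cross-tab, one option is to have PROC FREQ write them out. This is a sketch that assumes the YEARS dataset from the step above; the output names `prev_by_year` and `prev_by_year_gender` are made up for illustration.

```sas
/* Assumes YEARS holds one row per ID per prevalent year, so counting
   rows is the same as counting distinct IDs when IDs are unique. */
proc freq data=years noprint;
  /* One row per year */
  tables year / out=prev_by_year(drop=percent rename=(count=Prevalence));
  /* One row per year and gender; SPARSE keeps zero cells such as 2010 / F */
  tables year*Gender / sparse
         out=prev_by_year_gender(drop=percent rename=(count=Prevalence));
run;
```

The counts should match the Frequency values in the table above, including a Prevalence of 0 for F in 2010. Age could be handled the same way once age bands are defined, which the thread does not specify.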

Ad30 · Posted 06-24-2022 01:31 AM

Thanks so much, that worked
