Hello I am a SAS newbie.
I have a dataset with patients with Unique IDs
For each Year in the Case_year column I want to calculate:
The count of distinct IDs where the Case_year is before or equal to that year - the count of distinct IDs where the Death_year is before or equal to that year
How do I do that?
Thanks
ID | Case_year | Death_Year |
1 | 2010 | 2019 |
2 | 2011 | 2013 |
3 | 2010 | 2016 |
4 | 2013 | |
5 | 2014 | 2010 |
6 | 2011 | |
7 | 2012 | 2012 |
8 | 2013 |
You did not answer the second part about how long you are following these cases that have not died.
To recreate your expected output you want to stop counting in 2014.
First it is actually easier to just paste in the code to create the data than go to the trouble of making a table.
data have;
input ID Case_year Death_Year Age_at_case Gender :$1.;
cards;
1 2010 2019 40 M
2 2011 2013 42 F
3 2010 2016 50 M
4 2013 . 52 M
5 2014 2014 61 .
6 2011 . 40 M
7 2012 2012 43 M
8 2013 . 56 F
;
You just need to expand the data to one record per year you want to count.
%let cutoff=2015;
data years ;
set have;
do year=case_year to min(death_year,&cutoff)-1;
output;
end;
run;
Results:
Table of year by Gender year Gender Frequency| Percent | Row Pct | Col Pct |F |M | Total ---------+--------+--------+ 2010 | 0 | 2 | 2 | 0.00 | 10.00 | 10.00 | 0.00 | 100.00 | | 0.00 | 12.50 | ---------+--------+--------+ 2011 | 1 | 3 | 4 | 5.00 | 15.00 | 20.00 | 25.00 | 75.00 | | 25.00 | 18.75 | ---------+--------+--------+ 2012 | 1 | 3 | 4 | 5.00 | 15.00 | 20.00 | 25.00 | 75.00 | | 25.00 | 18.75 | ---------+--------+--------+ 2013 | 1 | 4 | 5 | 5.00 | 20.00 | 25.00 | 20.00 | 80.00 | | 25.00 | 25.00 | ---------+--------+--------+ 2014 | 1 | 4 | 5 | 5.00 | 20.00 | 25.00 | 20.00 | 80.00 | | 25.00 | 25.00 | ---------+--------+--------+ Total 4 16 20 20.00 80.00 100.00
How do you want the output to look? There are lots of ways to count and which one is "correct" depends on the desired result.
Missing values are less than anything. So you need to clarify for your problem if you mean "death_year less than or equal to case_year" or "death_year less than or equal to case_year and not missing death_year".
Note: if the ID value is actually unique, then you don't need to count ID, just records with the condition.
year | Prevalence |
2010 | 2 |
2011 | 2 |
2012 | 3 |
2013 | 4 |
2014 | 5 |
This is what I want the output to look like
@Ad30 wrote:
year Prevalence 2010 2 2011 2 2012 3 2013 4 2014 5 This is what I want the output to look like
Okay, now you have to either:
1) explain where that 5 comes from for 2014 as your example only shows one case_year value of 2014,
2) provide output that matches the example input
3) provide input that will allow making that example output.
Sorry about that.
Okay so if this if the input
ID | Case_year | Death_Year | Age_at_case | Gender |
1 | 2010 | 2019 | 40 | M |
2 | 2011 | 2013 | 42 | F |
3 | 2010 | 2016 | 50 | M |
4 | 2013 | 52 | M | |
5 | 2014 | 2014 | 61 | |
6 | 2011 | 40 | M | |
7 | 2012 | 2012 | 43 | M |
8 | 2013 | 56 | F |
The output would be
Year | Prevalence |
2010 | 2 |
2011 | 4 |
2012 | 4 |
2013 | 5 |
2014 | 5 |
Also would it be possible to then get the ouput by year but also by gender and age?
Year | Gender | Prevalence |
2010 | F | 0 |
2010 | M | 2 |
2011 | F | |
2011 | M | |
2012 | F | |
2012 | M |
Not sure I follow. To have 2014 count as 5 then you must be treating the cases without DEATH_YEAR as ongoing. In that case what is the upper bound for those cases? Is it this year? last year? Do you have some type of last follow up date instead that might be different for different cases?
Yes cases without DEATH_YEAR are ongoing. They are still prevalent cases.
So for 2014 we would be counting
the 8 (Cases highlighted in red in the Case_Year column because they all occured in 2014 or earlier) -The 3 (Cases highlighted in blue in the Death_Year column because those patients died in 2014 or before 2014 so they cannot be counted towards the cases still active in 2014). The result would be 8-3= 5
If a patient does not have a death year they are still alive so they still count towards the prevalent cases in the year of interest.
ID | Case_year | Death_Year | Age_at_case | Gender |
1 | 2010 | 2019 | 40 | M |
2 | 2011 | 2013 | 42 | F |
3 | 2010 | 2016 | 50 | M |
4 | 2013 | 52 | M | |
5 | 2014 | 2014 | 61 | |
6 | 2011 | 40 | M | |
7 | 2012 | 2012 | 43 | M |
8 | 2013 | 56 | F |
You did not answer the second part about how long you are following these cases that have not died.
To recreate your expected output you want to stop counting in 2014.
First it is actually easier to just paste in the code to create the data than go to the trouble of making a table.
data have;
input ID Case_year Death_Year Age_at_case Gender :$1.;
cards;
1 2010 2019 40 M
2 2011 2013 42 F
3 2010 2016 50 M
4 2013 . 52 M
5 2014 2014 61 .
6 2011 . 40 M
7 2012 2012 43 M
8 2013 . 56 F
;
You just need to expand the data to one record per year you want to count.
%let cutoff=2015;
data years ;
set have;
do year=case_year to min(death_year,&cutoff)-1;
output;
end;
run;
Results:
Table of year by Gender year Gender Frequency| Percent | Row Pct | Col Pct |F |M | Total ---------+--------+--------+ 2010 | 0 | 2 | 2 | 0.00 | 10.00 | 10.00 | 0.00 | 100.00 | | 0.00 | 12.50 | ---------+--------+--------+ 2011 | 1 | 3 | 4 | 5.00 | 15.00 | 20.00 | 25.00 | 75.00 | | 25.00 | 18.75 | ---------+--------+--------+ 2012 | 1 | 3 | 4 | 5.00 | 15.00 | 20.00 | 25.00 | 75.00 | | 25.00 | 18.75 | ---------+--------+--------+ 2013 | 1 | 4 | 5 | 5.00 | 20.00 | 25.00 | 20.00 | 80.00 | | 25.00 | 25.00 | ---------+--------+--------+ 2014 | 1 | 4 | 5 | 5.00 | 20.00 | 25.00 | 20.00 | 80.00 | | 25.00 | 25.00 | ---------+--------+--------+ Total 4 16 20 20.00 80.00 100.00
Thanks so much, that worked
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.