Hey all,
I'm trying to create a flag variable that denotes whether a patient has been seen once per year over every year of interest (2012-2017). My dataset contains multiple observations per ID per year with the amount of obs dependent on how often they visited the hospital.
Ideally, I want the result to look something like this:
ID discharge_year years_visited
1 2015 1
1 2015 1
1 2016 2
1 2016 2
1 2017 3
2 2012 1
2 2013 2
2 2015 3
3 2012 1
3 2013 2
3 2014 3
3 2015 4
The goal is to exclude anybody whose years_visited is < 6. As of now I'm making dummy variables for each discharge year and trying to add them together but i'm clearly not telling SAS correctly that it will need to skip across observations within the same person to find these 6 possible years.
Any advice greatly appreciated.
Something fancy
data have;
input id genotype $ dsch_date :date9. discharge_diagnosis :$20. dsch_year;
format dsch_date date9.;
cards;
131 SS 31-Aug-15 . 2015
131 SS 5-Oct-15 . 2015
131 SS 14-Dec-15 . 2015
131 SS 29-Feb-16 . 2016
131 SS 16-Apr-16 . 2016
131 SS 2-May-16 . 2016
131 SS 6-Jun-16 . 2016
131 SS 8-Aug-16 . 2016
131 SS 10-Oct-16 . 2016
131 SS 16-Nov-16 . 2016
131 SS 12-Dec-16 . 2016
131 SS 13-Feb-17 . 2017
131 SS 17-Apr-17 . 2017
131 SS 23-Apr-17 FEVER/INFECTION 2017
131 SS 26-Apr-17 . 2017
131 SS 19-Jun-17 . 2017
131 SS 21-Aug-17 . 2017
131 SS 16-Oct-17 . 2017
131 SS 18-Dec-17 . 2017
295 SC 20-Aug-15 . 2015
670 SS 14-Aug-15 . 2015
670 SS 28-Sep-15 . 2015
670 SS 13-Oct-15 . 2015
670 SS 4-Dec-15 . 2015
670 SS 18-Jan-16 . 2016
670 SS 26-Apr-16 . 2016
;
data want;
do _n_=1 by 1 until(last.id);
do until(last.dsch_year);
set have;
by id dsch_year;
years=1;
years_visited=_n_;
if not first.dsch_year then call missing(years,years_visited);
output;
end;
end;
run;
You want the result to look as shown — got it. But you also have to show us the original data, the starting point, so we can help you create code that goes from what you have to what you want.
Apologies, I'm not quite understanding what you mean by that. This sample data is pretty similar to what I'm messing with currently except that the ID's are hospital records numbers (which I can't provide) and there are many more observations per ID since some people are in the hospital very often.
I think i'm completely missing the ball by trying to use a do loop. Here is some code I just tried to use instead. It appears to correctly assign a '1' to each new year but it isn't adding together all the 1's by corp_id and is instead just showing a 1 for years_visited wherever years=1.
data utilization2;
set utilization1;
by corp_id;
years=.;
years_visited=.;
if first.dsch_year then years=1;
if dsch_year gt lag(dsch_year) then years+1;
years_visited=sum(years);
run;
We want to see the original data. You have shown us the end result that you want.
Here's a portion of the data for one individual so ID would=1 for all rows. reading left column to right the variables are discharge_year, years, years_visited
2015 1 1
2015 . .
2015 . .
2016 1 1
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2017 1 1
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .
Ideally, the far right column (years_visited) changes from 1 -> 2 -> 3 as each new year is reached.
I think everyone wants to see utilization1 dataset (or even potentially earlier to see what the actual data looks like. Maybe a sample of some of this data would work too.
E.g.
data have;
input id discharge_year visit_year;
datalines;
1 2015 1
1 2015 1
1 2017 1
2 2015 1
2 2016 1
;
run;
Based off your code above however you might be able to try this?
data utilization2;
set utilization1;
by corp_id;
years_visited=.;
retain years;
if first.dsch_year then years=1;
else years+1;
run;
HI @amail94 To clarify what @PaigeMiller asks is
1. A sample of the dataset you have (INPUT)
2. The expected OUTPUT sample for the dataset input
3. An explanation of the business logic to derive the OUTPUT, make it relevant
Write briefly the above 3 in neat points and community would give you the solution you need. Once you get the solution, you could copy/paste and replicate to your original(official) data at your place of work/college etc. That's all it is. If you need clarification, of course the discussion can go further. I hope that helps?
HTH
Got it. Thanks for the clarification.
Here's the input (original data):
id | genotype | dsch_date | discharge_diagnosis | dsch_year |
131 | SS | 31Aug2015 | 2015 | |
131 | SS | 05Oct2015 | 2015 | |
131 | SS | 14Dec2015 | 2015 | |
131 | SS | 29Feb2016 | 2016 | |
131 | SS | 16Apr2016 | 2016 | |
131 | SS | 02May2016 | 2016 | |
131 | SS | 06Jun2016 | 2016 | |
131 | SS | 08Aug2016 | 2016 | |
131 | SS | 10Oct2016 | 2016 | |
131 | SS | 16Nov2016 | 2016 | |
131 | SS | 12Dec2016 | 2016 | |
131 | SS | 13Feb2017 | 2017 | |
131 | SS | 17Apr2017 | 2017 | |
131 | SS | 23Apr2017 | FEVER/INFECTION | 2017 |
131 | SS | 26Apr2017 | 2017 | |
131 | SS | 19Jun2017 | 2017 | |
131 | SS | 21Aug2017 | 2017 | |
131 | SS | 16Oct2017 | 2017 | |
131 | SS | 18Dec2017 | 2017 | |
295 | SC | 20Aug2015 | 2015 | |
670 | SS | 14Aug2015 | 2015 | |
670 | SS | 28Sep2015 | 2015 | |
670 | SS | 13Oct2015 | 2015 | |
670 | SS | 04Dec2015 | 2015 | |
670 | SS | 18Jan2016 | 2016 | |
670 | SS | 26Apr2016 | 2016 |
Here's the expected:
id | genotype | dsch_date | discharge_diagnosis | dsch_year | years | years_visited |
131 | SS | 31Aug2015 | 2015 | 1 | 1 | |
131 | SS | 05Oct2015 | 2015 | . | . | |
131 | SS | 14Dec2015 | 2015 | . | . | |
131 | SS | 29Feb2016 | 2016 | 1 | 2 | |
131 | SS | 16Apr2016 | 2016 | . | . | |
131 | SS | 02May2016 | 2016 | . | . | |
131 | SS | 06Jun2016 | 2016 | . | . | |
131 | SS | 08Aug2016 | 2016 | . | . | |
131 | SS | 10Oct2016 | 2016 | . | . | |
131 | SS | 16Nov2016 | 2016 | . | . | |
131 | SS | 12Dec2016 | 2016 | . | . | |
131 | SS | 13Feb2017 | 2017 | 1 | 3 | |
131 | SS | 17Apr2017 | 2017 | . | . | |
131 | SS | 23Apr2017 | FEVER/INFECTION | 2017 | . | . |
131 | SS | 26Apr2017 | 2017 | . | . | |
131 | SS | 19Jun2017 | 2017 | . | . | |
131 | SS | 21Aug2017 | 2017 | . | . | |
131 | SS | 16Oct2017 | 2017 | . | . | |
131 | SS | 18Dec2017 | 2017 | . | . | |
295 | SC | 20Aug2015 | 2015 | 1 | 1 | |
670 | SS | 14Aug2015 | 2015 | 1 | 1 | |
670 | SS | 28Sep2015 | 2015 | . | . | |
670 | SS | 13Oct2015 | 2015 | . | . | |
670 | SS | 04Dec2015 | 2015 | . | . | |
670 | SS | 18Jan2016 | 2016 | 1 | 2 | |
670 | SS | 26Apr2016 | 2016 | . | . |
data utilization2;
set utilization1;
by corp_id;
years=.;
years_visited=.;
if first.dsch_year then years=1;
if dsch_year gt lag(dsch_year) then years+1;
run;
Here is another variation of the code I tried to use. I guess the thinking is that I am telling SAS to assign a 1 to the first dsch_year of that ID then count each subsequent year, within that same ID, and sum them so that I see how many individual years a person was in the hospital.
data have;
input id genotype $ dsch_date :date9. discharge_diagnosis :$20. dsch_year;
format dsch_date date9.;
cards;
131 SS 31-Aug-15 . 2015
131 SS 5-Oct-15 . 2015
131 SS 14-Dec-15 . 2015
131 SS 29-Feb-16 . 2016
131 SS 16-Apr-16 . 2016
131 SS 2-May-16 . 2016
131 SS 6-Jun-16 . 2016
131 SS 8-Aug-16 . 2016
131 SS 10-Oct-16 . 2016
131 SS 16-Nov-16 . 2016
131 SS 12-Dec-16 . 2016
131 SS 13-Feb-17 . 2017
131 SS 17-Apr-17 . 2017
131 SS 23-Apr-17 FEVER/INFECTION 2017
131 SS 26-Apr-17 . 2017
131 SS 19-Jun-17 . 2017
131 SS 21-Aug-17 . 2017
131 SS 16-Oct-17 . 2017
131 SS 18-Dec-17 . 2017
295 SC 20-Aug-15 . 2015
670 SS 14-Aug-15 . 2015
670 SS 28-Sep-15 . 2015
670 SS 13-Oct-15 . 2015
670 SS 4-Dec-15 . 2015
670 SS 18-Jan-16 . 2016
670 SS 26-Apr-16 . 2016
;
data want;
set have;
by id dsch_year;
if first.id then do;
years=1;
years_visited=1;
end;
else if first.dsch_year then do;years=1; years_visited+1;end;
run;
Something fancy
data have;
input id genotype $ dsch_date :date9. discharge_diagnosis :$20. dsch_year;
format dsch_date date9.;
cards;
131 SS 31-Aug-15 . 2015
131 SS 5-Oct-15 . 2015
131 SS 14-Dec-15 . 2015
131 SS 29-Feb-16 . 2016
131 SS 16-Apr-16 . 2016
131 SS 2-May-16 . 2016
131 SS 6-Jun-16 . 2016
131 SS 8-Aug-16 . 2016
131 SS 10-Oct-16 . 2016
131 SS 16-Nov-16 . 2016
131 SS 12-Dec-16 . 2016
131 SS 13-Feb-17 . 2017
131 SS 17-Apr-17 . 2017
131 SS 23-Apr-17 FEVER/INFECTION 2017
131 SS 26-Apr-17 . 2017
131 SS 19-Jun-17 . 2017
131 SS 21-Aug-17 . 2017
131 SS 16-Oct-17 . 2017
131 SS 18-Dec-17 . 2017
295 SC 20-Aug-15 . 2015
670 SS 14-Aug-15 . 2015
670 SS 28-Sep-15 . 2015
670 SS 13-Oct-15 . 2015
670 SS 4-Dec-15 . 2015
670 SS 18-Jan-16 . 2016
670 SS 26-Apr-16 . 2016
;
data want;
do _n_=1 by 1 until(last.id);
do until(last.dsch_year);
set have;
by id dsch_year;
years=1;
years_visited=_n_;
if not first.dsch_year then call missing(years,years_visited);
output;
end;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.