Hey all,
I'm trying to create a flag variable that denotes whether a patient has been seen once per year over every year of interest (2012-2017). My dataset contains multiple observations per ID per year with the amount of obs dependent on how often they visited the hospital.
Ideally, I want the result to look something like this:
ID discharge_year years_visited
1 2015 1
1 2015 1
1 2016 2
1 2016 2
1 2017 3
2 2012 1
2 2013 2
2 2015 3
3 2012 1
3 2013 2
3 2014 3
3 2015 4
The goal is to exclude anybody whose years_visited is < 6. As of now I'm making dummy variables for each discharge year and trying to add them together but i'm clearly not telling SAS correctly that it will need to skip across observations within the same person to find these 6 possible years.
Any advice greatly appreciated.
Something fancy
data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;
data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;You want the result to look as shown — got it. But you also have to show us the original data, the starting point, so we can help you create code that goes from what you have to what you want.
Apologies, I'm not quite understanding what you mean by that. This sample data is pretty similar to what I'm messing with currently except that the ID's are hospital records numbers (which I can't provide) and there are many more observations per ID since some people are in the hospital very often.
I think i'm completely missing the ball by trying to use a do loop. Here is some code I just tried to use instead. It appears to correctly assign a '1' to each new year but it isn't adding together all the 1's by corp_id and is instead just showing a 1 for years_visited wherever years=1.
data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
	years_visited=sum(years);
run;
We want to see the original data. You have shown us the end result that you want.
Here's a portion of the data for one individual so ID would=1 for all rows. reading left column to right the variables are discharge_year, years, years_visited
 2015 1 1
 2015 . .
 2015 . .
 2016 1 1
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2017 1 1
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .
Ideally, the far right column (years_visited) changes from 1 -> 2 -> 3 as each new year is reached.
I think everyone wants to see utilization1 dataset (or even potentially earlier to see what the actual data looks like. Maybe a sample of some of this data would work too.
E.g.
data have;
	input id discharge_year visit_year;
	datalines; 
	1 2015 1
	1 2015 1
	1 2017 1
	2 2015 1
	2 2016 1
	;
run;Based off your code above however you might be able to try this?
data utilization2;
	set utilization1;
	by corp_id;
	years_visited=.;
        retain years;
	if first.dsch_year then years=1;
        else years+1;
run;HI @amail94 To clarify what @PaigeMiller asks is
1. A sample of the dataset you have (INPUT)
2. The expected OUTPUT sample for the dataset input
3. An explanation of the business logic to derive the OUTPUT, make it relevant
Write briefly the above 3 in neat points and community would give you the solution you need. Once you get the solution, you could copy/paste and replicate to your original(official) data at your place of work/college etc. That's all it is. If you need clarification, of course the discussion can go further. I hope that helps?
HTH
Got it. Thanks for the clarification.
Here's the input (original data):
| id | genotype | dsch_date | discharge_diagnosis | dsch_year | 
| 131 | SS | 31Aug2015 | 2015 | |
| 131 | SS | 05Oct2015 | 2015 | |
| 131 | SS | 14Dec2015 | 2015 | |
| 131 | SS | 29Feb2016 | 2016 | |
| 131 | SS | 16Apr2016 | 2016 | |
| 131 | SS | 02May2016 | 2016 | |
| 131 | SS | 06Jun2016 | 2016 | |
| 131 | SS | 08Aug2016 | 2016 | |
| 131 | SS | 10Oct2016 | 2016 | |
| 131 | SS | 16Nov2016 | 2016 | |
| 131 | SS | 12Dec2016 | 2016 | |
| 131 | SS | 13Feb2017 | 2017 | |
| 131 | SS | 17Apr2017 | 2017 | |
| 131 | SS | 23Apr2017 | FEVER/INFECTION | 2017 | 
| 131 | SS | 26Apr2017 | 2017 | |
| 131 | SS | 19Jun2017 | 2017 | |
| 131 | SS | 21Aug2017 | 2017 | |
| 131 | SS | 16Oct2017 | 2017 | |
| 131 | SS | 18Dec2017 | 2017 | |
| 295 | SC | 20Aug2015 | 2015 | |
| 670 | SS | 14Aug2015 | 2015 | |
| 670 | SS | 28Sep2015 | 2015 | |
| 670 | SS | 13Oct2015 | 2015 | |
| 670 | SS | 04Dec2015 | 2015 | |
| 670 | SS | 18Jan2016 | 2016 | |
| 670 | SS | 26Apr2016 | 2016 | 
Here's the expected:
| id | genotype | dsch_date | discharge_diagnosis | dsch_year | years | years_visited | 
| 131 | SS | 31Aug2015 | 2015 | 1 | 1 | |
| 131 | SS | 05Oct2015 | 2015 | . | . | |
| 131 | SS | 14Dec2015 | 2015 | . | . | |
| 131 | SS | 29Feb2016 | 2016 | 1 | 2 | |
| 131 | SS | 16Apr2016 | 2016 | . | . | |
| 131 | SS | 02May2016 | 2016 | . | . | |
| 131 | SS | 06Jun2016 | 2016 | . | . | |
| 131 | SS | 08Aug2016 | 2016 | . | . | |
| 131 | SS | 10Oct2016 | 2016 | . | . | |
| 131 | SS | 16Nov2016 | 2016 | . | . | |
| 131 | SS | 12Dec2016 | 2016 | . | . | |
| 131 | SS | 13Feb2017 | 2017 | 1 | 3 | |
| 131 | SS | 17Apr2017 | 2017 | . | . | |
| 131 | SS | 23Apr2017 | FEVER/INFECTION | 2017 | . | . | 
| 131 | SS | 26Apr2017 | 2017 | . | . | |
| 131 | SS | 19Jun2017 | 2017 | . | . | |
| 131 | SS | 21Aug2017 | 2017 | . | . | |
| 131 | SS | 16Oct2017 | 2017 | . | . | |
| 131 | SS | 18Dec2017 | 2017 | . | . | |
| 295 | SC | 20Aug2015 | 2015 | 1 | 1 | |
| 670 | SS | 14Aug2015 | 2015 | 1 | 1 | |
| 670 | SS | 28Sep2015 | 2015 | . | . | |
| 670 | SS | 13Oct2015 | 2015 | . | . | |
| 670 | SS | 04Dec2015 | 2015 | . | . | |
| 670 | SS | 18Jan2016 | 2016 | 1 | 2 | |
| 670 | SS | 26Apr2016 | 2016 | . | . | 
data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
run;Here is another variation of the code I tried to use. I guess the thinking is that I am telling SAS to assign a 1 to the first dsch_year of that ID then count each subsequent year, within that same ID, and sum them so that I see how many individual years a person was in the hospital.
data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;
data want;
 set have;
 by id dsch_year;
 if first.id  then do;
  years=1;
  years_visited=1;
 end;
 else if first.dsch_year then do;years=1; years_visited+1;end;
run;Something fancy
data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;
data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
