BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
amail94
Fluorite | Level 6

Hey all,

 

I'm trying to create a flag variable that denotes whether a patient has been seen once per year over every year of interest (2012-2017). My dataset contains multiple observations per ID per year with the amount of obs dependent on how often they visited the hospital.

 

Ideally, I want the result to look something like this:

 

ID            discharge_year       years_visited

1              2015                        1

1              2015                        1 

1              2016                        2

1              2016                        2

1              2017                        3

2              2012                        1

2              2013                        2

2              2015                        3

3              2012                        1

3              2013                        2

3              2014                        3

3              2015                        4 

 

The goal is to exclude anybody whose years_visited is < 6. As of now I'm making dummy variables for each discharge year and trying to add them together but i'm clearly not telling SAS correctly that it will need to skip across observations within the same person to find these 6 possible years.

 

Any advice greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Something fancy

 


data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

You want the result to look as shown — got it. But you also have to show us the original data, the starting point, so we can help you create code that goes from what you have to what you want.

--
Paige Miller
amail94
Fluorite | Level 6

Apologies, I'm not quite understanding what you mean by that. This sample data is pretty similar to what I'm messing with currently except that the ID's are hospital records numbers (which I can't provide) and there are many more observations per ID since some people are in the hospital very often.

 

I think i'm completely missing the ball by trying to use a do loop. Here is some code I just tried to use instead. It appears to correctly assign a '1' to each new year but it isn't adding together all the 1's by corp_id and is instead just showing a 1 for years_visited wherever years=1.

 

data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
	years_visited=sum(years);
run;

 

PaigeMiller
Diamond | Level 26

We want to see the original data. You have shown us the end result that you want.

--
Paige Miller
amail94
Fluorite | Level 6

Here's a portion of the data for one individual so ID would=1 for all rows. reading left column to right the variables are discharge_year, years, years_visited

 

 2015 1 1
 2015 . .
 2015 . .
 2016 1 1
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2016 . .
 2017 1 1
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .
 2017 . .

 

Ideally, the far right column (years_visited) changes from 1 -> 2 -> 3 as each new year is reached.

Krueger
Pyrite | Level 9

I think everyone wants to see utilization1 dataset (or even potentially earlier to see what the actual data looks like. Maybe a sample of some of this data would work too.

 

E.g. 

 

data have;
	input id discharge_year visit_year;
	datalines; 
	1 2015 1
	1 2015 1
	1 2017 1
	2 2015 1
	2 2016 1
	;
run;

Based off your code above however you might be able to try this?

 

data utilization2;
	set utilization1;
	by corp_id;
	years_visited=.;
retain years; if first.dsch_year then years=1;
else years+1; run;
novinosrin
Tourmaline | Level 20

HI @amail94   To clarify what @PaigeMiller  asks is

 

1. A sample of the dataset you have (INPUT)

2. The expected OUTPUT sample for the dataset input

3. An explanation of the business logic to derive the OUTPUT, make it relevant

 

Write briefly the above 3 in neat points and community would give you the solution you need.  Once you get the solution, you could copy/paste and replicate to your original(official) data at your place of work/college etc. That's all it is. If you need clarification, of course the discussion can go further. I hope that helps?

 

HTH 

amail94
Fluorite | Level 6

Got it. Thanks for the clarification.

 

Here's the input (original data):

idgenotypedsch_datedischarge_diagnosisdsch_year
131SS31Aug2015 2015
131SS05Oct2015 2015
131SS14Dec2015 2015
131SS29Feb2016 2016
131SS16Apr2016 2016
131SS02May2016 2016
131SS06Jun2016 2016
131SS08Aug2016 2016
131SS10Oct2016 2016
131SS16Nov2016 2016
131SS12Dec2016 2016
131SS13Feb2017 2017
131SS17Apr2017 2017
131SS23Apr2017FEVER/INFECTION2017
131SS26Apr2017 2017
131SS19Jun2017 2017
131SS21Aug2017 2017
131SS16Oct2017 2017
131SS18Dec2017 2017
295SC20Aug2015 2015
670SS14Aug2015 2015
670SS28Sep2015 2015
670SS13Oct2015 2015
670SS04Dec2015 2015
670SS18Jan2016 2016
670SS26Apr2016 2016

 

Here's the expected:

idgenotypedsch_datedischarge_diagnosisdsch_yearyearsyears_visited
131SS31Aug2015 201511
131SS05Oct2015 2015..
131SS14Dec2015 2015..
131SS29Feb2016 201612
131SS16Apr2016 2016..
131SS02May2016 2016..
131SS06Jun2016 2016..
131SS08Aug2016 2016..
131SS10Oct2016 2016..
131SS16Nov2016 2016..
131SS12Dec2016 2016..
131SS13Feb2017 201713
131SS17Apr2017 2017..
131SS23Apr2017FEVER/INFECTION2017..
131SS26Apr2017 2017..
131SS19Jun2017 2017..
131SS21Aug2017 2017..
131SS16Oct2017 2017..
131SS18Dec2017 2017..
295SC20Aug2015 201511
670SS14Aug2015 201511
670SS28Sep2015 2015..
670SS13Oct2015 2015..
670SS04Dec2015 2015..
670SS18Jan2016 201612
670SS26Apr2016 2016..

 

data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
run;

Here is another variation of the code I tried to use. I guess the thinking is that I am telling SAS to assign a 1 to the first dsch_year of that ID then count each subsequent year, within that same ID, and sum them so that I see how many individual years a person was in the hospital.

novinosrin
Tourmaline | Level 20
data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 set have;
 by id dsch_year;
 if first.id  then do;
  years=1;
  years_visited=1;
 end;
 else if first.dsch_year then do;years=1; years_visited+1;end;
run;
novinosrin
Tourmaline | Level 20

Something fancy

 


data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1264 views
  • 3 likes
  • 4 in conversation