Solved: Re: Do Loop Help, Can't get SAS to read through observations

amail94 · Posted 12-18-2019 02:11 PM

Hey all,

I'm trying to create a flag variable that denotes whether a patient has been seen once per year over every year of interest (2012-2017). My dataset contains multiple observations per ID per year with the amount of obs dependent on how often they visited the hospital.

Ideally, I want the result to look something like this:

ID discharge_year years_visited

1 2015 1

1 2016 2

1 2017 3

2 2012 1

2 2013 2

2 2015 3

3 2012 1

3 2013 2

3 2014 3

3 2015 4

The goal is to exclude anybody whose years_visited is < 6. As of now I'm making dummy variables for each discharge year and trying to add them together but i'm clearly not telling SAS correctly that it will need to skip across observations within the same person to find these 6 possible years.

Any advice greatly appreciated.

novinosrin · Posted 12-18-2019 04:00 PM

Something fancy


data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;

View solution in original post

PaigeMiller · Posted 12-18-2019 02:20 PM

You want the result to look as shown — got it. But you also have to show us the original data, the starting point, so we can help you create code that goes from what you have to what you want.

--
Paige Miller

amail94 · Posted 12-18-2019 02:32 PM

Apologies, I'm not quite understanding what you mean by that. This sample data is pretty similar to what I'm messing with currently except that the ID's are hospital records numbers (which I can't provide) and there are many more observations per ID since some people are in the hospital very often.

I think i'm completely missing the ball by trying to use a do loop. Here is some code I just tried to use instead. It appears to correctly assign a '1' to each new year but it isn't adding together all the 1's by corp_id and is instead just showing a 1 for years_visited wherever years=1.

data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
	years_visited=sum(years);
run;

PaigeMiller · Posted 12-18-2019 02:34 PM

We want to see the original data. You have shown us the end result that you want.

--
Paige Miller

amail94 · Posted 12-18-2019 02:40 PM

Here's a portion of the data for one individual so ID would=1 for all rows. reading left column to right the variables are discharge_year, years, years_visited

2015 1 1
2015 . .
2015 . .
2016 1 1
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2016 . .
2017 1 1
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .
2017 . .

Ideally, the far right column (years_visited) changes from 1 -> 2 -> 3 as each new year is reached.

Krueger · Posted 12-18-2019 02:54 PM

I think everyone wants to see utilization1 dataset (or even potentially earlier to see what the actual data looks like. Maybe a sample of some of this data would work too.

E.g.

data have;
	input id discharge_year visit_year;
	datalines; 
	1 2015 1
	1 2015 1
	1 2017 1
	2 2015 1
	2 2016 1
	;
run;

Based off your code above however you might be able to try this?

data utilization2;
	set utilization1;
	by corp_id;
	years_visited=.;
        retain years;
	if first.dsch_year then years=1;
        else years+1;
run;

novinosrin · Posted 12-18-2019 03:09 PM

HI @amail94 To clarify what @PaigeMiller asks is

1. A sample of the dataset you have (INPUT)

2. The expected OUTPUT sample for the dataset input

3. An explanation of the business logic to derive the OUTPUT, make it relevant

Write briefly the above 3 in neat points and community would give you the solution you need. Once you get the solution, you could copy/paste and replicate to your original(official) data at your place of work/college etc. That's all it is. If you need clarification, of course the discussion can go further. I hope that helps?

HTH

amail94 · Posted 12-18-2019 03:23 PM

Got it. Thanks for the clarification.

Here's the input (original data):

id	genotype	dsch_date	discharge_diagnosis	dsch_year
131	SS	31Aug2015		2015
131	SS	05Oct2015		2015
131	SS	14Dec2015		2015
131	SS	29Feb2016		2016
131	SS	16Apr2016		2016
131	SS	02May2016		2016
131	SS	06Jun2016		2016
131	SS	08Aug2016		2016
131	SS	10Oct2016		2016
131	SS	16Nov2016		2016
131	SS	12Dec2016		2016
131	SS	13Feb2017		2017
131	SS	17Apr2017		2017
131	SS	23Apr2017	FEVER/INFECTION	2017
131	SS	26Apr2017		2017
131	SS	19Jun2017		2017
131	SS	21Aug2017		2017
131	SS	16Oct2017		2017
131	SS	18Dec2017		2017
295	SC	20Aug2015		2015
670	SS	14Aug2015		2015
670	SS	28Sep2015		2015
670	SS	13Oct2015		2015
670	SS	04Dec2015		2015
670	SS	18Jan2016		2016
670	SS	26Apr2016		2016

Here's the expected:

id	genotype	dsch_date	discharge_diagnosis	dsch_year	years	years_visited
131	SS	31Aug2015		2015	1	1
131	SS	05Oct2015		2015	.	.
131	SS	14Dec2015		2015	.	.
131	SS	29Feb2016		2016	1	2
131	SS	16Apr2016		2016	.	.
131	SS	02May2016		2016	.	.
131	SS	06Jun2016		2016	.	.
131	SS	08Aug2016		2016	.	.
131	SS	10Oct2016		2016	.	.
131	SS	16Nov2016		2016	.	.
131	SS	12Dec2016		2016	.	.
131	SS	13Feb2017		2017	1	3
131	SS	17Apr2017		2017	.	.
131	SS	23Apr2017	FEVER/INFECTION	2017	.	.
131	SS	26Apr2017		2017	.	.
131	SS	19Jun2017		2017	.	.
131	SS	21Aug2017		2017	.	.
131	SS	16Oct2017		2017	.	.
131	SS	18Dec2017		2017	.	.
295	SC	20Aug2015		2015	1	1
670	SS	14Aug2015		2015	1	1
670	SS	28Sep2015		2015	.	.
670	SS	13Oct2015		2015	.	.
670	SS	04Dec2015		2015	.	.
670	SS	18Jan2016		2016	1	2
670	SS	26Apr2016		2016	.	.

data utilization2;
	set utilization1;
	by corp_id;
	years=.;
	years_visited=.;
	if first.dsch_year then years=1;
	if dsch_year gt lag(dsch_year) then years+1;
run;

Here is another variation of the code I tried to use. I guess the thinking is that I am telling SAS to assign a 1 to the first dsch_year of that ID then count each subsequent year, within that same ID, and sum them so that I see how many individual years a person was in the hospital.

novinosrin · Posted 12-18-2019 03:46 PM

data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 set have;
 by id dsch_year;
 if first.id  then do;
  years=1;
  years_visited=1;
 end;
 else if first.dsch_year then do;years=1; years_visited+1;end;
run;

novinosrin · Posted 12-18-2019 04:00 PM

Something fancy


data have;
input id	genotype $	dsch_date :date9.	discharge_diagnosis :$20.	dsch_year;
format dsch_date date9.;
cards;
131	SS	31-Aug-15	.	2015
131	SS	5-Oct-15	.	2015
131	SS	14-Dec-15	.	2015
131	SS	29-Feb-16	.	2016
131	SS	16-Apr-16	.	2016
131	SS	2-May-16	.	2016
131	SS	6-Jun-16	.	2016
131	SS	8-Aug-16	.	2016
131	SS	10-Oct-16	.	2016
131	SS	16-Nov-16	.	2016
131	SS	12-Dec-16	.	2016
131	SS	13-Feb-17	.	2017
131	SS	17-Apr-17	.	2017
131	SS	23-Apr-17	FEVER/INFECTION	2017
131	SS	26-Apr-17	.	2017
131	SS	19-Jun-17	.	2017
131	SS	21-Aug-17	.	2017
131	SS	16-Oct-17	.	2017
131	SS	18-Dec-17	.	2017
295	SC	20-Aug-15	.	2015
670	SS	14-Aug-15	.	2015
670	SS	28-Sep-15	.	2015
670	SS	13-Oct-15	.	2015
670	SS	4-Dec-15	.	2015
670	SS	18-Jan-16	.	2016
670	SS	26-Apr-16	.	2016
;

data want;
 do _n_=1 by 1 until(last.id);
  do until(last.dsch_year);
   set have;
   by id dsch_year;
   years=1;
   years_visited=_n_;
   if not first.dsch_year then call missing(years,years_visited);
   output;
  end;
 end;
run;

SAS Training: Just a Click Away