DATA Step, Macro, Functions and more

Need to call a SQL query in a data step

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Need to call a SQL query in a data step

[ Edited ]

Hi,

In order to do some data preparation, I need to run a program like the following, where I can use a SQL SELECT statement within a data step. I am not sure whether this is something doable is SAS or I need to use a higher-level language (e.g., Java) for that.

I appreciate if you could help me out.

 

//update: I want to assign different values to each hospital visit of a patient based on the time it has passed between their last discharge and the current visit. If they visit much earlier than the hitherto data set average, they will get a larger number showing they are riskier.

I have attached a sample data with desired output.

 

Thanks for your great support! 

 

data in;
   set out;
   by patient_sk;

   if first.patient_sk then prob = (select avg(readmission_time) from out where discharged_dt_tm <= date1); // date1 is the value of the date_var in the current record
   else prob = (admitted_dt_tm - lag(discharged_dt_tm)) / (select avg(readmission_time) from out where discharged_dt_tm <= date1);
run;

 


Accepted Solutions
Solution
‎08-25-2017 03:38 PM
Super User
Super User
Posts: 8,289

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

Let's change the names of the dataset to HAVE and WANT since your example code is using IN and OUT in a confusing way.

You should be able to just calculate the mean value BEFORE starting the data step.  If you really want to compare the value of Z to value of Z on the previous record then you need have a variable that orders the rows within the X groups.  Let's assume that you want to order them by DATE_VAR.(and that DATE_VAR is unique within values of X).

 

proc sql ;
create table middle as
  select a.*,(select mean(b.z) from have b where b.date_var <= a.date_var) as avg_z
  from have a
  order by x, date_var
;
quit;

Now you can use a data step to calculate your new Y variable.   Note that you should NOT use the LAG() function in conditionally executed code as it messes up how it keeps track of the previous values.

data want ;
   set middle;
   by x;
   y = divide(z - lag(z),avg_z );
   if first.x then y = avg_z ;
run;

View solution in original post


All Replies
Super User
Posts: 6,935

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

My SQL isn't strong enough to do this, but I think I can give you an approach, based on the strengths of SQL vs. a DATA step.

 

First, append the average z value (respecting the date conditions) to every observation.  PROC SQL should be able to do that.

 

Then go through a DATA step to compute y.

Occasional Contributor
Posts: 6

Re: Need to call a SQL query in a data step

Posted in reply to Astounding

Hi,

 

Thanks for helping. I like your idea, but I think that approach will still need a combination of Proc and data steps. The select statement is not one statement; it varies based on where in the data set the current processing is.

Super User
Posts: 24,026

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

Post sample data and expected output please. This is straightforward in multiple steps, but I assume your purpose here is to do it in one step?

Occasional Contributor
Posts: 6

Re: Need to call a SQL query in a data step

Thank you Reeza. I just updated the question with a sample.

 

I don't care in how many steps the job is done. If I can create the final data set, I will be happy!

Esteemed Advisor
Posts: 5,626

Re: Need to call a SQL query in a data step

[ Edited ]
Posted in reply to hamed_majidi_gmail_com

Is the average of z for date_ var <= date1 supposed to be taken over the whole datsaset or only within the current x group?

 

and are x groups sorted by date_var ?

PG
Occasional Contributor
Posts: 6

Re: Need to call a SQL query in a data step

Average of z should be taken over the whole data set (in fact a subset of the whole data set that matches with the WHERE clause).

 

Yes, x groups are sorted by date_var (in ascending order).

Solution
‎08-25-2017 03:38 PM
Super User
Super User
Posts: 8,289

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

Let's change the names of the dataset to HAVE and WANT since your example code is using IN and OUT in a confusing way.

You should be able to just calculate the mean value BEFORE starting the data step.  If you really want to compare the value of Z to value of Z on the previous record then you need have a variable that orders the rows within the X groups.  Let's assume that you want to order them by DATE_VAR.(and that DATE_VAR is unique within values of X).

 

proc sql ;
create table middle as
  select a.*,(select mean(b.z) from have b where b.date_var <= a.date_var) as avg_z
  from have a
  order by x, date_var
;
quit;

Now you can use a data step to calculate your new Y variable.   Note that you should NOT use the LAG() function in conditionally executed code as it messes up how it keeps track of the previous values.

data want ;
   set middle;
   by x;
   y = divide(z - lag(z),avg_z );
   if first.x then y = avg_z ;
run;
Occasional Contributor
Posts: 6

Re: Need to call a SQL query in a data step

Awesome! Thanks for sharing your knowledge.

Esteemed Advisor
Posts: 5,626

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

Here is the sample data for those who prefer csv data

 

ENCOUNTER_ID,PATIENT_SK,DISCHARGED_DT_TM,ADMITTED_DT_TM,readmission_time,prob(desired output)
410338226,10000174,2015-04-18T20:30:00,2015-04-13T13:42:00,,avg() of whole data set before 2015-04-18
410730615,10000174,2015-07-16T15:49:00,2015-07-13T15:43:00,85.80069444,85 / avg() before 2015-07-16
205494260,10000214,2012-07-28T18:17:00,2012-07-26T19:43:00,,avg() of whole data set before 2012-07-28
51023730,10000388,2007-02-17T19:18:00,2007-01-29T11:44:00,,avg() of whole data set before 2007-02-17
50974455,10000388,2007-03-07T19:35:00,2007-03-04T16:11:00,14.87013889,14.87 / avg() before 2007-03-07
51216578,10000388,2007-04-19T16:31:00,2007-04-09T06:12:00,32.44236111,32.44 / avg() before 2007-04-19
PG
Esteemed Advisor
Posts: 5,626

Re: Need to call a SQL query in a data step

Posted in reply to hamed_majidi_gmail_com

No simple way to do this in a single step. My closest approximation is to define a view:

 

data out;
infile datalines dsd;
informat DISCHARGED_DT_TM ADMITTED_DT_TM E8601DT.;
format DISCHARGED_DT_TM ADMITTED_DT_TM E8601DT19.;
input ENCOUNTER_ID PATIENT_SK DISCHARGED_DT_TM ADMITTED_DT_TM readmission_time;
z = 10*_n_; /* Some value */
datalines;
410338226,10000174,2015-04-18T20:30:00,2015-04-13T13:42:00,,avg() of whole data set before 2015-04-18
410730615,10000174,2015-07-16T15:49:00,2015-07-13T15:43:00,85.80069444,85 / avg() before 2015-07-16
205494260,10000214,2012-07-28T18:17:00,2012-07-26T19:43:00,,avg() of whole data set before 2012-07-28
51023730,10000388,2007-02-17T19:18:00,2007-01-29T11:44:00,,avg() of whole data set before 2007-02-17
50974455,10000388,2007-03-07T19:35:00,2007-03-04T16:11:00,14.87013889,14.87 / avg() before 2007-03-07
51216578,10000388,2007-04-19T16:31:00,2007-04-09T06:12:00,32.44236111,32.44 / avg() before 2007-04-19
;

proc sql;
create view outView as
select 
	a.PATIENT_SK,
	a.ADMITTED_DT_TM,
	mean(b.z) as avgz
from 
	out as a inner join
	out as b on b.ADMITTED_DT_TM <= a.ADMITTED_DT_TM
group by a.PATIENT_SK, a.ADMITTED_DT_TM;
quit;

data in;
merge out outview;
by PATIENT_SK ADMITTED_DT_TM;
lagz = lag(z);
if first.PATIENT_SK 
	then y = avgz;
	else y = (z-lagz) / avgz;
drop lagz avgz;
run;

proc sql; drop view outView; quit; 
PG
Occasional Contributor
Posts: 6

Re: Need to call a SQL query in a data step

Thank you so much PGStats. I really appreciate your help and sharing of your knowledge.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 228 views
  • 1 like
  • 5 in conversation