How to survive Survival Analysis in SAS University Edition

by Regular Contributor on ‎02-19-2016 09:14 AM - edited on ‎02-19-2016 03:28 PM by Community Manager (3,197 Views)


Survival Analysis (also known as time-to-event analysis, and usually visualised with a Kaplan-Meier curve) is one of my favourite forms of analysis; it can be used for most data that has a time-based component.  In the context of patients at a hospital, it is called Survival Analysis; in manufacturing, utilities, and anywhere else there is a start / end time, it is known as Time to Event analysis. 

 

Get the Data

I have wanted to do an article on Survival Analysis for a while, but I was unable to find a dataset that was ideal for what I wanted to cover.  Most datasets that are in healthcare are aggregated to protect the patients’ privacy, which makes it difficult to do this analysis. 

 

I did end up finding a perfect dataset – patient-level, but de-identified.  I must admit the dataset was far too large for me to process in SAS University Edition (it’s over 2 million rows), so I had to truncate it down to a more manageable size.  I kept only the first site in the list, Albany Medical Center Hospital, which still amounted to 33,000 patients, giving us more than enough data to play with.  The 2012 data can be downloaded from here (you can also get the 2011 data). 

 

How to go about getting SAS University Edition

If you don’t already have University Edition, get it here and follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials. Additional resources are available in this article.

 

Getting the data ready

After I’ve imported the data, I see that the Length of Stay column is a text column rather than numeric; it turns out that the data has 0-119 days and then “120 +” – so I have to remove the plus sign in the raw data and then reimport it to SAS, and everything is fine.  I then do some preliminary exploration (I absolutely love playing with this type of dataset) and decide on a couple of variables I want to highlight here.
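If you would rather not edit the raw file by hand, the same clean-up can be sketched in a DATA step. This is only an illustration: the dataset names work.los_raw and work.los_clean, the new variables los_days and capped, and the three toy rows are all made up for the example; only the “120 +” value comes from the data described above.

```sas
/* Toy rows standing in for the imported text column (made-up values) */
data work.los_raw;
   input length_of_stay $char5.;
   datalines;
3
45
120 +
;
run;

data work.los_clean;
   set work.los_raw;
   /* Drop the plus sign and blanks, then read the remaining digits as a number */
   los_days = input(compress(length_of_stay, '+ '), 32.);
   /* 1 when the stay was recorded as "120 +" (120 days or more), 0 otherwise */
   capped = (index(length_of_stay, '+') > 0);
run;
```

Keeping a flag such as capped alongside the numeric value means the information that “120 +” really means “120 or more” is not lost once the plus sign is gone.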

 

The Results

So let’s get to it – what you’ll notice is that this first example is very simple – 3 lines of code, nothing difficult at all!

 

Image1.png
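The code itself is only available as a screenshot, so here is a minimal sketch of what a three-line call of this kind could look like. The dataset name work.albany and the variable name length_of_stay are assumptions, not taken from the original image.

```sas
/* Kaplan-Meier (product-limit) estimate of length of stay for all patients */
/* work.albany and length_of_stay are assumed names                         */
proc lifetest data=work.albany;
   time length_of_stay;   /* no censoring variable: every stay is treated as complete */
run;
```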

 

Here’s the graph that is produced (note: there are a couple of tables that are also included, but I’ve held those back for another post). 

 

Image2.png

 

This graph starts at 1.0, or 100% of the patient population, at Day 0, and every time a patient is discharged (their length of stay ends) there is a step down in the graph as the remaining number of patients gets smaller.  This continues until 120 days, which is the maximum number of days in the dataset. 
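For reference, the stepped curve described above is the standard Kaplan-Meier (product-limit) estimate. The symbols are introduced here for illustration and do not appear in the SAS output: t_i is a day on which at least one patient is discharged, d_i is the number of discharges on that day, and n_i is the number of patients still in hospital just before it.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
```

When no stays are censored, this reduces to the fraction of patients whose stay is longer than t days.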

 

This first graph is good but it’s not really informative; we need to split our data into groups.  In PROC LIFETEST, the Group option will create one graph per group; I prefer having all the groups on one image to make comparisons easier.  The first group I look at is the most logical, Gender.  The initial step is to sort the data; the grouping won’t work if SAS doesn’t know precisely where the different levels start / end. 

 

Here’s the code:

 

Image3.png

 

You’ll note that I’ve put NOTABLE in the PROC LIFETEST statement – this is to suppress those tables I mentioned earlier.  The next key point is that I’ve added the STRATA statement (on line 8), which defines our groups. 
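As the screenshot is not reproduced here, the steps just described can be sketched roughly as follows. The dataset names work.albany and work.albany_sorted and the variable names length_of_stay and gender are assumptions, and the line numbers will not match the original image. As far as I know, the STRATA statement does not itself need sorted input (a BY statement does), so the sort is kept mainly to follow the steps in the article.

```sas
/* Sort by the grouping variable first, as described above */
/* (dataset and variable names are assumed)                */
proc sort data=work.albany out=work.albany_sorted;
   by gender;
run;

/* NOTABLE suppresses the product-limit tables; STRATA overlays one curve per group */
proc lifetest data=work.albany_sorted notable;
   time length_of_stay;
   strata gender;
run;
```

Swapping gender for another grouping variable, such as type_of_admission, gives the other comparisons in this article.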

 

Image4.png

 

SAS automatically assigns the colour and the strata are sorted alphabetically; the Males and Females, at least in this truncated dataset, have no clear difference. 

 

The next stratum I wanted to use is the type_of_admission variable; I’ve updated the code accordingly.

 

Image5.png

 

The output is a little more complex as we now have 4 levels:

 

Image6.png

 

Digging a little more into the data, there’s a variable that indicates the severity of the illness – when I update my code and plot the graph, I get a very interesting output:

Image8.png

 

It’s very clear that the “Extreme” cases have significantly longer stays in hospital than the other three groups, which makes sense.  The one aspect of the graph that I feel has been missing is the number of patients in each group, and I’ve updated the code from above to add this:

 

Image9.png

 

The plots=(survival(atrisk)) option requests the survival plot (which is the default, and the same one we’ve seen above).  The (atrisk) sub-option adds the table of patients at risk to the plot, shown here:

 

image10.png
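A call along the lines described above might look like the sketch below. The dataset name work.albany and the variable names length_of_stay and severity are assumptions, since the original code is only available as a screenshot.

```sas
/* Survival plot with the number of patients at risk shown for each group */
/* work.albany, length_of_stay and severity are assumed names             */
proc lifetest data=work.albany notable plots=(survival(atrisk));
   time length_of_stay;
   strata severity;
run;
```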

 

You can note a couple of aspects right away.  First, all patients with a Minor incident are discharged somewhere between 25 and 50 days.  Second, the Moderate group is the largest group of patients, with Extreme being the smallest.  The groups are ordered alphabetically rather than by severity (Extreme, Major, Moderate, Minor would make more sense); this can be done and I’ll show you how in a later post.  The other aspect of these graphs that I’ve not mentioned is the title: “Product-limit survival estimates” is not really understandable, and this can be changed using the TITLE option that will be shown next week.

 

Now it’s your turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Need data for learning?

 

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

 

Image11.png

 

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

 

IMAGE12.png

 

Click Analytics U, then select "Subscribe" from the Options menu.

 

Happy Learning!

 

Comments
by SAS Employee jvdongen
on ‎03-06-2016 05:15 PM

Great article and a nice way to figure out how proc lifetest works and how it differs from R's survfit :-) I wonder though why you state "I must admit the dataset was far too large for me to process in SAS University Edition (it’s over 2 million rows), so I had to truncate it down to a more manageable size". SAS UE can handle datasets like this (1.5 GB, 2,5Mln observations), as long as you don't try to import it using the 'Upload' function which is indeed limited to 10MB. 

by Regular Contributor
on ‎03-06-2016 07:40 PM

Thanks for the comments!!  I am running SAS UE on a new Macbook with 250Gb free storage, 1.2Ghz Intel Core M, 8 Gb 1600 Mhz DDR3, and I was finding that letting SAS run for 45 minutes wasn't importing the data.  Having said that I was using the IMPORT DATA task, and to your comment wonder if that has different memory usage than another method.  Admittedly I don't have a lot of experience importing data that big into SAS UE, so would welcome further suggestions.

 

Thanks for reading and hope you have a great evening!

Chris

by SAS Employee jvdongen
on ‎03-07-2016 07:39 AM

Hi Chris,

 

Nice computer that new Macbook! Well, to tell you the truth: I cheated a bit... My machine has a lot more horsepower (i7 quad core w 32GB Ram and 2*512GB SSD), plus I used EG 7.1 to import the dataset, then created a library in SAS UE to point to the directory where I put it. But, when I look at the code that Import Data in UE creates vs the very specific instructions that EG generates there are quite some differences in how the data is handled. I'll definitely have a look later with one of my more experienced colleagues bcs I'm intrigued as well now. 

 

Jos

 

 

by Regular Contributor
on ‎03-07-2016 07:44 AM

Hey Jos!  Thanks for posting back.  That's definitely one of the "challenges" with SAS UE - most people are using it as a standalone application, so don't have the ability to have work arounds like this :-)  I use SAS full-time in my day-to-day job and if the first attempt doesn't work, there's always an alternative way to get the data in.  With UE, you're obviously more limited.  I'll play around as well if I have time today - looking forward to your results!!

 

(Are you going to SGF? Would love to meet and chat further about this, maybe do some side-by-side testing!)

Chris

by SAS Employee jvdongen
on ‎03-08-2016 03:59 PM

Hi Chris,

nope, no SGF for me this year. I did spend some time today though to do a UE only import of large data sets, and was surprised by the results. The complete data set you used took less than 15 seconds to process! (written out to disk, not the WORK lib). There are a few things to keep in mind though: UE doesn't always get the line delimiter correct so you'll have to know what it is (linefeed in this case). Notepad++ will show you the correct character btw. Secondly, right clicking a file in UE's folder browser and selecting 'Import' on a large file kills my Chrome instance. But, using 'Import' from the Utilities or selecting 'New->Import' via a right mouse click and then dragging the file on the screen works flawlessly. This is however still the 'quick & dirty' way since UE will set 'length_of_stay' as a numeric field (which it isn't so you'll end up with 2109 missing values) so to get the data in properly you need to first use the explicit (in)formatting within PROC IMPORT and then use a Data step to convert the 120+ values and make it a numeric column. Or use another tool of choice to do that of course :-) 

 

Hope this helps! 

by Regular Contributor
on ‎03-09-2016 07:16 AM

Oh cool - so it must've been my computer or the way I was trying to import the data.  I'll definitely give it a try and see what I come up with; I'll also try it out on the desktop at home.  You do of course realise this means I have more data now to play with <evil grin>!

 

Hope you have a great day and don't work too hard!

Cheers

Chris

by Frequent Contributor
on ‎04-25-2016 11:48 AM

Hi Chris.

 

Just in case students and teachers who are interested in your survival analysis are using SAS On Demand for Academics Studio instead of SAS U Studio: here is a way that they can load the large data set into Studio, which is way more than the Studio server files and folders upload size limit of 10 MB, as you pointed out. I'm interested in ways to load large teaching data sets to the SAS On Demand cloud as I'm interested in teaching data mining with real big data. Thanks to Chris Hemedinger for the code template. Note that the CSV file is served from an HTTP server. I downloaded the data file from the HTTPS link you gave above and loaded it onto one of my own HTTP servers. This is because both SAS U and SAS On Demand for Academics Studio can only access remote files via the HTTP method and not HTTPS, FTP, SFTP or WEBDAV methods. I set up this HTTP server using my gigabit fibre broadband modem's HTTP port forwarding and my MSI U-100 notebook running Lubuntu 15.10 with Apache 2.4.2. Note also the extreme guessingrows setting on the proc import - all 2544731 rows for the 2012 data. This insures against proc import errors but trades off import execution time. You can see in the log that it does take nearly 40 minutes using SAS On Demand cloud resources. Once a registered course instructor creates the data set in an On Demand for Academics course folder then all students registered for that course can explore survival analysis on this big data set. 

 

The SAS On Demand for Academics Studio program:

 

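/* detect proper delim for UNIX vs. Windows */
%let delim=%sysfunc(ifc(%eval(&sysscp. = WIN),\,/));

/* create a name for our downloaded file (a CSV, despite the .zip extension) */
%let ziploc = %sysfunc(getoption(work))&delim.datafile.zip;
filename download "&ziploc";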
 
/* Download the data file from the Internet*/
proc http
 method='GET'
 url="http://121.99.204.89/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2012.csv"
 out=download;
run;

*FILENAME CSV "/folders/myfolders/sasuser.v94/teaching/nyhospitals/data/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2012.csv" TERMSTR=LF;

libname nyhsptls '/home/damien.mather/sasuser.v94/teaching/mart212/data';
run;

PROC IMPORT
 DATAFILE=download
 OUT=nyhsptls.discharges
 DBMS=CSV
 REPLACE
;
 guessingrows=2544731;
RUN;

/** Unassign the file reference.  **/

FILENAME CSV;

 

The SAS On Demand for Academics Studio log:

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 /* detect proper delim for UNIX vs. Windows */
57 %let delim=%sysfunc(ifc(%eval(&sysscp. = WIN),\,/));
58
59 /* create a name for our downloaded ZIP */
60 %let ziploc = %sysfunc(getoption(work))&delim.datafile.zip;
61 filename download "&ziploc";
62
63 /* Download the data file from the Internet*/
64 proc http
65 method='GET'
67 out=download;
68 run;
 
NOTE: PROCEDURE HTTP used (Total process time):
real time 2:18.24
user cpu time 7.37 seconds
system cpu time 1.11 seconds
memory 185.96k
OS Memory 33708.00k
Timestamp 04/25/2016 02:03:07 PM
Step Count 66 Switch Count 34
Page Faults 0
Page Reclaims 27
Page Swaps 0
Voluntary Context Switches 531766
Involuntary Context Switches 3
Block Input Operations 8
Block Output Operations 1967152
 
 
69
70 *FILENAME CSV
70 ! "/folders/myfolders/sasuser.v94/teaching/nyhospitals/data/Hospital_Inpatient_Discharges__SPARCS_De-Identified___2012.csv"
70 ! TERMSTR=LF;
71
72 libname nyhsptls '/home/damien.mather/sasuser.v94/teaching/mart212/data';
NOTE: Libref NYHSPTLS was successfully assigned as follows:
Engine: V9
Physical Name: /home/damien.mather/sasuser.v94/teaching/mart212/data
73 run;
74
75 PROC IMPORT
76 DATAFILE=download
77 OUT=nyhsptls.discharges
78 DBMS=CSV
79 REPLACE
80 ;
81 guessingrows=2544731;
82 RUN;
 
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Name APR Severity of Illness Description truncated to APR Severity of Illness Descript.
Name Attending Provider License Number truncated to Attending Provider License Numbe.
Name Operating Provider License Number truncated to Operating Provider License Numbe.
Problems were detected with provided names. See LOG.
83 /**********************************************************************
84 * PRODUCT: SAS
85 * VERSION: 9.4
86 * CREATOR: External File Interface
87 * DATE: 25APR16
88 * DESC: Generated SAS Datastep Code
89 * TEMPLATE SOURCE: (None Specified.)
90 ***********************************************************************/
91 data NYHSPTLS.DISCHARGES ;
92 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
93 infile DOWNLOAD delimiter = ',' MISSOVER DSD firstobs=2 ;
94 informat "Health Service Area"N $14. ;
95 informat "Hospital County"N $11. ;
96 informat "Operating Certificate Number"N best32. ;
97 informat "Facility Id"N best32. ;
98 informat "Facility Name"N $70. ;
99 informat "Age Group"N $11. ;
100 informat "Zip Code - 3 digits"N $3. ;
101 informat Gender $1. ;
102 informat Race $22. ;
103 informat Ethnicity $17. ;
104 informat "Length of Stay"N $5. ;
105 informat "Admit Day of Week"N $3. ;
106 informat "Type of Admission"N $13. ;
107 informat "Patient Disposition"N $37. ;
108 informat "Discharge Year"N best32. ;
109 informat "Discharge Day of Week"N $3. ;
110 informat "CCS Diagnosis Code"N best32. ;
111 informat "CCS Diagnosis Description"N $23. ;
112 informat "CCS Procedure Code"N best32. ;
113 informat "CCS Procedure Description"N $25. ;
114 informat "APR DRG Code"N best32. ;
115 informat "APR DRG Description"N $91. ;
116 informat "APR MDC Code"N best32. ;
117 informat "APR MDC Description"N $102. ;
118 informat "APR Severity of Illness Code"N best32. ;
119 informat "APR Severity of Illness Descript"N $8. ;
120 informat "APR Risk of Mortality"N $8. ;
121 informat "APR Medical Surgical Description"N $14. ;
122 informat "Payment Typology 1"N $27. ;
123 informat "Payment Typology 2"N $27. ;
124 informat "Payment Typology 3"N $27. ;
125 informat "Attending Provider License Numbe"N best32. ;
126 informat "Operating Provider License Numbe"N best32. ;
127 informat "Other Provider License Number"N best32. ;
128 informat "Birth Weight"N best32. ;
129 informat "Abortion Edit Indicator"N $1. ;
130 informat "Emergency Department Indicator"N $1. ;
131 informat "Total Charges"N nlnum32. ;
132 informat "Total Costs"N nlnum32. ;
133 format "Health Service Area"N $14. ;
134 format "Hospital County"N $11. ;
135 format "Operating Certificate Number"N best12. ;
136 format "Facility Id"N best12. ;
137 format "Facility Name"N $70. ;
138 format "Age Group"N $11. ;
139 format "Zip Code - 3 digits"N $3. ;
140 format Gender $1. ;
141 format Race $22. ;
142 format Ethnicity $17. ;
143 format "Length of Stay"N $5. ;
144 format "Admit Day of Week"N $3. ;
145 format "Type of Admission"N $13. ;
146 format "Patient Disposition"N $37. ;
147 format "Discharge Year"N best12. ;
148 format "Discharge Day of Week"N $3. ;
149 format "CCS Diagnosis Code"N best12. ;
150 format "CCS Diagnosis Description"N $23. ;
151 format "CCS Procedure Code"N best12. ;
152 format "CCS Procedure Description"N $25. ;
153 format "APR DRG Code"N best12. ;
154 format "APR DRG Description"N $91. ;
155 format "APR MDC Code"N best12. ;
156 format "APR MDC Description"N $102. ;
157 format "APR Severity of Illness Code"N best12. ;
158 format "APR Severity of Illness Descript"N $8. ;
159 format "APR Risk of Mortality"N $8. ;
160 format "APR Medical Surgical Description"N $14. ;
161 format "Payment Typology 1"N $27. ;
162 format "Payment Typology 2"N $27. ;
163 format "Payment Typology 3"N $27. ;
164 format "Attending Provider License Numbe"N best12. ;
165 format "Operating Provider License Numbe"N best12. ;
166 format "Other Provider License Number"N best12. ;
167 format "Birth Weight"N best12. ;
168 format "Abortion Edit Indicator"N $1. ;
169 format "Emergency Department Indicator"N $1. ;
170 format "Total Charges"N nlnum12. ;
171 format "Total Costs"N nlnum12. ;
172 input
173 "Health Service Area"N $
174 "Hospital County"N $
175 "Operating Certificate Number"N
176 "Facility Id"N
177 "Facility Name"N $
178 "Age Group"N $
179 "Zip Code - 3 digits"N $
180 Gender $
181 Race $
182 Ethnicity $
183 "Length of Stay"N $
184 "Admit Day of Week"N $
185 "Type of Admission"N $
186 "Patient Disposition"N $
187 "Discharge Year"N
188 "Discharge Day of Week"N $
189 "CCS Diagnosis Code"N
190 "CCS Diagnosis Description"N $
191 "CCS Procedure Code"N
192 "CCS Procedure Description"N $
193 "APR DRG Code"N
194 "APR DRG Description"N $
195 "APR MDC Code"N
196 "APR MDC Description"N $
197 "APR Severity of Illness Code"N
198 "APR Severity of Illness Descript"N $
199 "APR Risk of Mortality"N $
200 "APR Medical Surgical Description"N $
201 "Payment Typology 1"N $
202 "Payment Typology 2"N $
203 "Payment Typology 3"N $
204 "Attending Provider License Numbe"N
205 "Operating Provider License Numbe"N
206 "Other Provider License Number"N
207 "Birth Weight"N
208 "Abortion Edit Indicator"N $
209 "Emergency Department Indicator"N $
210 "Total Charges"N
211 "Total Costs"N
212 ;
213 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
214 run;
 
NOTE: The infile DOWNLOAD is:
Filename=/saswork/SAS_work64030001163D_odaws02-prod-sg/SAS_workD6510001163D_odaws02-prod-sg/datafile.zip,
Owner Name=damien.mather,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=26Apr2016:02:03:03,
File Size (bytes)=1007175449
 
NOTE: 2544731 records were read from the infile DOWNLOAD.
The minimum record length was 251.
The maximum record length was 569.
NOTE: The data set NYHSPTLS.DISCHARGES has 2544731 observations and 39 variables.
NOTE: DATA statement used (Total process time):
real time 31.31 seconds
user cpu time 8.55 seconds
system cpu time 1.44 seconds
memory 16865.87k
OS Memory 47412.00k
Timestamp 04/25/2016 02:39:26 PM
Step Count 67 Switch Count 295
Page Faults 0
Page Reclaims 636
Page Swaps 0
Voluntary Context Switches 44972
Involuntary Context Switches 52831
Block Input Operations 0
Block Output Operations 3393296
 
 
2544731 rows created in NYHSPTLS.DISCHARGES from DOWNLOAD.
 
 
 
NOTE: NYHSPTLS.DISCHARGES data set was successfully created.
NOTE: The data set NYHSPTLS.DISCHARGES has 2544731 observations and 39 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 36:19.56
user cpu time 35:52.43
system cpu time 5.73 seconds
memory 16865.87k
OS Memory 47412.00k
Timestamp 04/25/2016 02:39:26 PM
Step Count 67 Switch Count 85
Page Faults 0
Page Reclaims 836635
Page Swaps 0
Voluntary Context Switches 45569
Involuntary Context Switches 55627
Block Input Operations 288
Block Output Operations 3393376
 
 
215
216 /** Unassign the file reference. **/
217
218 FILENAME CSV;
NOTE: Fileref CSV has been deassigned.
219
220
221 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
233
  User: damien.mather
150 format "CCS Diagnosis Description"N $23. ;
151 format "CCS Procedure Code"N best12. ;
152 format "CCS Procedure Description"N $25. ;
153 format "APR DRG Code"N best12. ;
154 format "APR DRG Description"N $91. ;
155 format "APR MDC Code"N best12. ;
156 format "APR MDC Description"N $102. ;
157 format "APR Severity of Illness Code"N best12. ;
158 format "APR Severity of Illness Descript"N $8. ;
159 format "APR Risk of Mortality"N $8. ;
160 format "APR Medical Surgical Description"N $14. ;
161 format "Payment Typology 1"N $27. ;
162 format "Payment Typology 2"N $27. ;
163 format "Payment Typology 3"N $27. ;
164 format "Attending Provider License Numbe"N best12. ;
165 format "Operating Provider License Numbe"N best12. ;
166 format "Other Provider License Number"N best12. ;
167 format "Birth Weight"N best12. ;
168 format "Abortion Edit Indicator"N $1. ;
169 format "Emergency Department Indicator"N $1. ;
170 format "Total Charges"N nlnum12. ;
171 format "Total Costs"N nlnum12. ;
172 input
173 "Health Service Area"N $
174 "Hospital County"N $
175 "Operating Certificate Number"N
176 "Facility Id"N
177 "Facility Name"N $
178 "Age Group"N $
179 "Zip Code - 3 digits"N $
180 Gender $
181 Race $
182 Ethnicity $
183 "Length of Stay"N $
184 "Admit Day of Week"N $
185 "Type of Admission"N $
186 "Patient Disposition"N $
187 "Discharge Year"N
188 "Discharge Day of Week"N $
189 "CCS Diagnosis Code"N
190 "CCS Diagnosis Description"N $
191 "CCS Procedure Code"N
192 "CCS Procedure Description"N $
193 "APR DRG Code"N
194 "APR DRG Description"N $
195 "APR MDC Code"N
196 "APR MDC Description"N $
197 "APR Severity of Illness Code"N
198 "APR Severity of Illness Descript"N $
199 "APR Risk of Mortality"N $
200 "APR Medical Surgical Description"N $
201 "Payment Typology 1"N $
202 "Payment Typology 2"N $
203 "Payment Typology 3"N $
204 "Attending Provider License Numbe"N
205 "Operating Provider License Numbe"N
206 "Other Provider License Number"N
207 "Birth Weight"N
208 "Abortion Edit Indicator"N $
209 "Emergency Department Indicator"N $
210 "Total Charges"N
211 "Total Costs"N
212 ;
213 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
214 run;
 
NOTE: The infile DOWNLOAD is:
Filename=/saswork/SAS_work64030001163D_odaws02-prod-sg/SAS_workD6510001163D_odaws02-prod-sg/datafile.zip,
Owner Name=damien.mather,Group Name=oda,
Access Permission=-rw-r--r--,
Last Modified=26Apr2016:02:03:03,
File Size (bytes)=1007175449
 
NOTE: 2544731 records were read from the infile DOWNLOAD.
The minimum record length was 251.
The maximum record length was 569.
NOTE: The data set NYHSPTLS.DISCHARGES has 2544731 observations and 39 variables.
NOTE: DATA statement used (Total process time):
real time 31.31 seconds
user cpu time 8.55 seconds
system cpu time 1.44 seconds
memory 16865.87k
OS Memory 47412.00k
Timestamp 04/25/2016 02:39:26 PM
Step Count 67 Switch Count 295
Page Faults 0
Page Reclaims 636
Page Swaps 0
Voluntary Context Switches 44972
Involuntary Context Switches 52831
Block Input Operations 0
Block Output Operations 3393296
 
 
2544731 rows created in NYHSPTLS.DISCHARGES from DOWNLOAD.
 
 
 
NOTE: NYHSPTLS.DISCHARGES data set was successfully created.
NOTE: The data set NYHSPTLS.DISCHARGES has 2544731 observations and 39 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 36:19.56
user cpu time 35:52.43
system cpu time 5.73 seconds
memory 16865.87k
OS Memory 47412.00k
Timestamp 04/25/2016 02:39:26 PM
Step Count 67 Switch Count 85
Page Faults 0
Page Reclaims 836635
Page Swaps 0
Voluntary Context Switches 45569
Involuntary Context Switches 55627
Block Input Operations 288
Block Output Operations 3393376
 
 
215
216 /** Unassign the file reference. **/
217
218 FILENAME CSV;
NOTE: Fileref CSV has been deassigned.
219
220
221 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
233
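
The log above shows "Length of Stay" coming in as a character column ($5.), presumably because of the "120 +" value mentioned in the article. As a rough sketch only, and not something taken from the log itself, the column could be converted inside SAS instead of editing the raw CSV. The output data set name work.discharges_los and the variables los_days and los_topcoded are hypothetical names chosen for illustration; the libref and input data set are the ones from the log.

```sas
/* Sketch: derive a numeric length of stay from the character column.      */
/* Assumes NYHSPTLS.DISCHARGES exists as created in the log above.         */
/* work.discharges_los, los_days and los_topcoded are hypothetical names.  */
options validvarname=any;   /* allow name literals such as "Length of Stay"N */

data work.discharges_los;
    set nyhsptls.discharges;

    /* "120 +" means 120 days or more; flag those rows before dropping the plus sign */
    los_topcoded = (index("Length of Stay"N, '+') > 0);

    /* keep only the digits, then read the result as a number */
    los_days = input(compress("Length of Stay"N, , 'kd'), best32.);
run;
```

Keeping the flag alongside the numeric value means the "120 +" rows can still be told apart from true 120-day stays later on.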
 

 

 

by Regular Contributor
on ‎04-25-2016 12:47 PM

Hey Damien! This is fantastic - I've not played with the Cloud version, so thank you for this information. It's always great that people such as yourself are willing to go out and see what else is possible; your time is appreciated!

 

Have a great day :-)

Chris
