Solved: Need to create fewer patient records from many records; is it correct ...

jcorlette · Posted 11-13-2019 11:33 AM

Hello, I have been a SAS user (part-time) for many years, but now am using it more often with larger data sets and complex programming. I am using SAS 9.4 TS Level 1M5 on X64_10PRO platform on Windows. I think this is a good place to post this, but if I should contact tech support or other help, please advise.

I have previously used arrays to change my data set from long to wide by MRN, now I need to have multiple rows by MRN and Date. A colleague helped me with the previous project and I am trying to understand if arrays will work with my current data or if I need to use different programming.

The data has more than 200 patients, with multiple visit dates which can have multiple values for any variable. The sample attached has 2 patients and contains all the variables for my data.

My desired outcome is to create a data set that has fewer rows of data organized by MRN then Date and having each row contain all associated information for that date. For each date, a patient may have multiple diagnosis codes from different providers, different patient classes, etc.

When I run the code below, I received these errors (same for each variable):

ERROR: The array aEnc_Code has been defined with zero elements.
ERROR: Too many array subscripts specified for array aEnc_Code.

I imported my data and then created data set Z and continued with the code below.
I used the former coding which gave me a single row of data for each MRN, but now I need multiple rows per MRN. I have been using seq_id to count how many dates a patient was seen for, but I don't know if it is still needed. So my code below already has a couple issues, but I was unsure where to change things so I am posting what I currently have.
Many thanks for any help that can be provided.

proc sort data= Z;
	by MRN Date;
run;

data trial;
	set Z;
	by MRN Date;
	if first.MRN then seq_id=0;
	seq_id+1;
run;

data Y (drop=Enc_Complete); /* don't need this all all records are COMPLETE */
	set trial;
	by MRN Date seq_id;
/*Not sure how to keep all variables I create since I don't know how many there will be, 
previously I had 1-25 after variables starting with Enc_Code */
	keep MRN Enc_Code Enc_Type Enc_Complete Dx_Code Dx_Name Date Prov_Name Dept_ID Dept_Name Hosp_Adm_Code
		Hosp_Adm_Name Pt_Class_Code Pt_Class_Name Surg_Loc_Name Discharge_Date;
/* Wanting to array both numeric and character variables */
		array aEnc_Code(*) ; 
		array aEnc_Type(*) $;
		array aDx_Code(*) ;
		array aDx_Name(*) $;
		array aDate(*) ;
		array aProv_Name(*) $;
		array aDept_ID(*) ;
		array aDept_Name(*) $;
		array aHosp_Adm_Code(*) ;
		array aHosp_Adm_Name (*) $;
		array aPt_Class_Code(*) ;
		array aPt_Class_Name(*) $;
		array aSurg_Loc_Name(*) $;
		array aDischarge_Date(*) ;
		if first.MRN then  /*previously tried first.Date but that didn't work either */
			do;
				do i = 1 to dim(aEnc_Code);
					aEnc_Code(i)= .; end;
				do i = 1 to dim(aEnc_Type);
					aEnc_Type(i) = ' '; end;
				do i = 1 to dim(aDx_Code);
					aDx_Code(i) = .; end;
				do i = 1 to dim(aDx_Name);
					aDx_Name(i) = ' '; end;
				do i = 1 to dim(aDate);
					aDate(i) = .; end;
				do i = 1 to dim(aProv_Name);
					aProv_Name(i) = ' '; end;
				do i = 1 to dim(aDept_ID);
					aDept_ID(i) = .; end;
				do i = 1 to dim(aDept_Name);
					aDept_Name(i) = ' '; end;
				do i = 1 to dim(aHosp_Adm_Code);
					aHosp_Adm_Code(i) = ; end;
				do i = 1 to dim(Hosp_Adm_Name);
					aHosp_Adm_Name(i) = ' ';end;
				do i = 1 to dim(aPt_Class_Code);
					aPt_Class_Code(i) = .; end;
				do i = 1 to dim(aPt_Class_Name);
					aPt_Class_Name(i) = ' '; end;
				do i = 1  to dim(aSurg_Loc_Name);
					aSurg_Loc_Name(i) = ' '; end;
				do i = 1 to dim(aDischarge_Date);
					aDischarge_Date(i) = .; end;
			end;
		end;
			aEnc_Code(seq_id) = Enc_Code;										
			aDx_Code(seq_id) = Dx_Code;
			aDate(seq_id) = Date;
			aDept_ID(seq_id) = Dept_ID;
			aHosp_Adm_Code(seq_id) = Hosp_Adm_Code;
			aPt_Class_Code(seq_id) = Pt_Class_Code;
			aDischarge_Date(seq_id) = Discharge_Date;
		if last.MRN then output;
	run;

art297 · Posted 11-14-2019 11:10 AM

Jill,

The code, below, imports your test data and then shows how to use the transpose macro (as I originally posted) to make your long file wide, as well as how to use the untranspose macro to make your wide file long (and, at the same time, add the seq_ID variable you mentioned in your initial post).

You can find the actual macros, papers, powerpoints and tip sheets at:

https://github.com/art297/transpose

and

https://github.com/art297/untranspose

Here is the code:

filename reffile '/folders/myfolders/Partial Dataset C.csv';

proc import datafile=reffile
	DBMS=CSV
	OUT=work.have;
	GETNAMES=YES;
RUN;


filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=have, out=want_wide, by=MRN Date, Guessingrows=1000,
  autovars=all, sort=yes)


filename ut url 'https://raw.githubusercontent.com/art297/untranspose/master/untranspose.sas';
%include ut ;

%untranspose(data=want_wide, out=want_long, by=MRN Date,
  id=seq_ID,
  var=Enc_Code Enc_Type Enc_Complete Dx_Code Dx_Name Prov_Name Dept_ID Dept_Name
  Hosp_Adm_Code Hosp_Adm_Name Pt_Class_Code Pt_Class_Name Surg_Loc_Name
  Discharge_Date)

Art, CEO, AnalystFinder.com

View solution in original post

art297 · Posted 11-13-2019 02:16 PM

Since no one else has responded, here are my initial thoughts.

Yes, you can accomplish such a task with arrays, but you have to know how arrays work and what you need to do in order to get them to work. The code you supplied, as you discovered, won't work.

My suggestion would be to use the transpose macro to make your file wide, and the untranspose macro to made your wide files long. Neither requires a knowledge of how to work with arrays.

It will be an hour or so before I will have time to show you how both will work, but here is a brief taster of how to make your long file wide:

FILENAME REFFILE '/folders/myfolders/Partial Dataset C.csv';

PROC IMPORT DATAFILE=REFFILE
	DBMS=CSV
	OUT=WORK.have;
	GETNAMES=YES;
RUN;


filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=have, out=want, by=MRN Date, Guessingrows=1000,
  autovars=all, sort=yes)

Test that and see if it does what you were trying to do. If it does, let me/us know and I'll provide a similar example for the untranspose macro.

Art, CEO, AnalystFinder.com

jcorlette · Posted 11-14-2019 10:27 AM

Art,

Thank you so much, this code works fantastically for my full data set. I really appreciate your assistance. If you have time to send the untranspose code, that would be helpful.

Jill

art297 · Posted 11-14-2019 11:10 AM

Jill,

The code, below, imports your test data and then shows how to use the transpose macro (as I originally posted) to make your long file wide, as well as how to use the untranspose macro to make your wide file long (and, at the same time, add the seq_ID variable you mentioned in your initial post).

You can find the actual macros, papers, powerpoints and tip sheets at:

https://github.com/art297/transpose

and

https://github.com/art297/untranspose

Here is the code:

filename reffile '/folders/myfolders/Partial Dataset C.csv';

proc import datafile=reffile
	DBMS=CSV
	OUT=work.have;
	GETNAMES=YES;
RUN;


filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=have, out=want_wide, by=MRN Date, Guessingrows=1000,
  autovars=all, sort=yes)


filename ut url 'https://raw.githubusercontent.com/art297/untranspose/master/untranspose.sas';
%include ut ;

%untranspose(data=want_wide, out=want_long, by=MRN Date,
  id=seq_ID,
  var=Enc_Code Enc_Type Enc_Complete Dx_Code Dx_Name Prov_Name Dept_ID Dept_Name
  Hosp_Adm_Code Hosp_Adm_Name Pt_Class_Code Pt_Class_Name Surg_Loc_Name
  Discharge_Date)

Art, CEO, AnalystFinder.com

jcorlette · Posted 11-15-2019 10:05 AM

Art,

Thank you for the help and the links for more information. This coding has helped immensely and I can move forward with my project.

Jill

ballardw · Posted 11-13-2019 04:10 PM

@jcorlette wrote:

My desired outcome is to create a data set that has fewer rows of data organized by MRN then Date and having each row contain all associated information for that date. For each date, a patient may have multiple diagnosis codes from different providers, different patient classes, etc.

What exactly are your next steps that requires this step?

You will never know how many of anything are associated with any given record because every record will have to have as many variables with mostly missing data.

And a process that handles 10 such may very well fail as soon as there is someone with 20 sets of these values because you couldn't make the code flexible to handle every increasingly wide records.

Reducing records in this manner really seems oriented towards a person reading it and not analysis or modeling. So perhaps you want to look at report procedures such as Proc Report or Tabulate which have different tools for displaying values in groups.

jcorlette · Posted 11-14-2019 10:31 AM

Ballardw,

Thanks for your input. I was able to use Art's code and it worked great. For this data, I am not trying to reduce it per se, but to reorganize it so I can merge with additional data. A report isn't what I need as I am creating a larger data set for our statistician to run. I pull patient data and merge with various study data sets and send on for analysis.

Jill

Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Re: Need to create fewer patient records from many records; is it correct to use ARRAY?

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!