01-20-2015 03:18 PM
Hi, I am a beginner of SAS now.
Now I have a data which contains extremely long character variables. Which means the length of these variables are greater than 8. Meanwhile, there are blank and comma within this varable. What's worse, within the same column, there are different format of date.
I wonder how I should do. Actually I have an idea to use macro variables to track length of each variable and then use @ to locate where I should start to read for next variable. However once I have defined the length of several variables, I do not know the order that SAS read them.
I really appreciate your help.
Thank you very much.
01-20-2015 03:34 PM
My advice is not to jump on SAS Macro variables as the solution to your problem. I am not even sure I understand what your input data looks like. While your verbal description is OK, it is not clear. You said that you have "a data which contains extremely long character variables" but that doesn't convey whether you have a SAS data set, a raw ASCII text file of data or some other form of data (RDBMS, Excel, Microsoft Access), that you are trying to read/import.
Usually when you ask as question about reading data, it is useful to post a sample of the data and the code that you have tried to use. You say that you are importing -- how are you doing the import? With code? What code? Proc IMPORT? Data step program, Enterprise Guide task?
It would help everybody help you if you could post some of your data and provide information about where the data is coming from. Also provide an example of the code that you've tried to use. And, let us know the ultimate goal (you want a SAS data set so you can forecast something or you want a SAS dataset so you can run PROC GLM or you want a SAS dataset for a frequency report).
01-21-2015 09:56 AM
Thank you SOOOOOO much for your advice. I am posting the data now.
Sorry to confuse you, this is my first question in the community.:smileyblush:
01-20-2015 05:26 PM
Consider that there is nothing unusual (SAS input processing wise) about very long SAS CHARACTER variables - it's all about the LENGTH statement / variable definitions.
And with stated condition for different formatted dates in one column, one time-honored approach then is to use DATA / INFILE / INPUT processing to read each column as $CHARn. and then scrub the date-related / date-time-related columns using assignment statement(s) and the INPUT(....) function, as well as possibly "flexible" INFORMATs for "ANY" date / datetime format handling, though use those with care and with validation logic.
01-21-2015 10:13 AM
I am so sorry that the data is confidential. I could only keep the form of the data. I am not sure whether you could accept it.
Actually I found a way to solve my problem yesterday to some extend. That means I use excel to change the format of character variables. For example let "Diagnosis" become "Diagnosis". Then I will know when to read this variable.
However, when testing my code, I found although there is "N:" before "Complication" variable of the 5th observation, sas does not read it into sas data set.
Below is my code:
***Reading from Text file;
length Diagnosi $ 100
/*Procedur $ 10000*/
Anesthet $ 20
Complica $ 100
Notes $ 50
infile 'F:\Patient Info.txt' delimiter = '09'x MISSOVER truncover DSD lrecl=32767 firstobs=2 obs=10;
@'D:' Diagnosi $
/*@'P:' Procedur $*/
@'A:' Anesthet $
@'C:' Complica $
@'N:' Notes $
proc print data=PatientInfo;
01-21-2015 12:28 PM
To fix the length just set the variables as long as you need them to be. Set them way too big to begin with and you can adjust them later.
The DSD option should handle the embedded delimiters. Your sample file looks to have TAB as the delimiter.
You might have an issue with the embedded new lines. See if using TERMSTR=CRLF fixes that. If not you can search this site and find examples of programs to pre-process the file to remove the embedded new lines so that file can be read as one record per observation.
infile 'F:\Patient Info.txt' dsd delimiter = '09'x truncover lrecl=32767
firstobs=2 TERMSTR=CRLF obs=5
informat DOB DOS anydtdte.;
format DOB DOS YYMMDD10.;
input PatientID -- Notes ;
proc sql noprint ;
01-21-2015 01:18 PM
Oh you solved me another question. Since I only set some of variables in length statement, the order of variables in the output dataset is not the same as the original one.
01-21-2015 01:56 PM
Why is the SAS-internal variable order of importance? Also, the DSD parameter purposefully addresses the condition where you have consecutive delimiters with intention of assigning a blank/null value to the corresponding SAS variable.