BookmarkSubscribeRSS Feed
Anovalyy
Calcite | Level 5

Hi, I am a beginner of SAS now.

Now I have a data which contains extremely long character variables. Which means the length of these variables are greater than 8. Meanwhile, there are blank and comma within this varable. What's worse, within the same column, there are different format of date.

I wonder how I should do. Actually I have an idea to use macro variables to track length of each variable and then use @ to locate where I should start to read for next variable. However once I have defined the length of several variables, I do not know the order that SAS read them.

I really appreciate your help.

Thank you very much.

8 REPLIES 8
Cynthia_sas
SAS Super FREQ

Hi:

  My advice is not to jump on SAS Macro variables as the solution to your problem. I am not even sure I understand what your input data looks like. While your verbal description is OK, it is not clear. You said that you have "a data which contains extremely long character variables" but that doesn't convey whether you have a SAS data set, a raw ASCII text file of data or some other form of data (RDBMS, Excel, Microsoft Access), that you are trying to read/import.

  Usually when you ask as question about reading data, it is useful to post a sample of the data and the code that you have tried to use. You say that you are importing -- how are you doing the import? With code? What code? Proc IMPORT? Data step program, Enterprise Guide task?

  It would help everybody help you if you could post some of your data and provide information about where the data is coming from. Also provide an example of the code that you've tried to use. And, let us know the ultimate goal (you want a SAS data set so you can forecast something or you want a SAS dataset so you can run PROC GLM or you want a SAS dataset for a frequency report).

cynthia

Anovalyy
Calcite | Level 5

Thank you SOOOOOO much for your advice. I am posting the data now.

Sorry to confuse you, this is my first question in the community.:smileyblush:

sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

Consider that there is nothing unusual (SAS input processing wise) about very long SAS CHARACTER variables - it's all about the LENGTH statement / variable definitions.

And with stated condition for different formatted dates in one column, one time-honored approach then is to use DATA / INFILE / INPUT processing to read each column as $CHARn.  and then scrub the date-related / date-time-related columns using assignment statement(s) and the INPUT(....) function, as well as possibly "flexible" INFORMATs for "ANY" date / datetime format handling, though use those with care and with validation logic.

Scott Barry

SBBWorks, Inc.

Anovalyy
Calcite | Level 5

Thank you I will try.

Anovalyy
Calcite | Level 5

I am so sorry that the data is confidential. I could only keep the form of the data. I am not sure whether you could accept it.

Actually I found a way to solve my problem yesterday to some extend. That means I use excel to change the format of character variables. For example let "Diagnosis" become "D:Diagnosis". Then I will know when to read this variable.

However, when testing my code, I found although there is "N:" before "Complication" variable of the 5th observation, sas does not read it into sas data set.

Below is my code:

***Reading from Text file;

data PatientInfo;

      length Diagnosi $ 100

          /*Procedur $ 10000*/

          Anesthet $ 20

  Complica $ 100

  Notes $ 50

  ;

      infile 'F:\Patient Info.txt' delimiter = '09'x MISSOVER truncover DSD lrecl=32767 firstobs=2 obs=10;

      input

                PatienID

                DOB 

         DOS

                @'D:' Diagnosi $

  /*@'P:' Procedur $*/

                @'A:' Anesthet $

                PACUIn

                PACUOut

                Discharg

                @'C:' Complica $

                @'N:' Notes $

     ;

run;

proc print data=PatientInfo;

run;

Tom
Super User Tom
Super User

To fix the length just set the variables as long as you need them to be. Set them way too big to begin with and you can adjust them later.

The DSD option should handle the embedded delimiters.  Your sample file looks to have TAB as the delimiter.

You might have an issue with the embedded new lines.  See if using TERMSTR=CRLF fixes that.  If not you can search this site and find examples of programs to pre-process the file to remove the embedded new lines so that file can be read as one record per observation.

options compress=yes;

data PatientInfo;

length

    PatientID 8

    DOB  8

    DOS 8

    Diagnosis $1000

    Procedure $1000

    Anesthetic $1000

    Complication $1000

    Notes $1000

  ;

  infile 'F:\Patient Info.txt' dsd delimiter = '09'x truncover lrecl=32767

    firstobs=2 TERMSTR=CRLF obs=5

  ;

  informat DOB DOS anydtdte.;

  format DOB DOS YYMMDD10.;

  input PatientID -- Notes ;

run;

proc sql noprint ;

  select max(length(diagnosis))

       , max(length(procedure))

       , max(length(anesthetic))

       , max(length(complication))

       , max(length(notes))

    into :l_diagnosis

       , :l_procedure

       , :l_anesthetic

       , :l_complication

       , :l_notes

    from PatientInfo

   ;

quit;

Anovalyy
Calcite | Level 5

Oh you solved me another question. Since I only set some of variables in length statement, the order of variables in the output dataset is not the same as the original one.

sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10

Why is the SAS-internal variable order of importance?  Also, the DSD parameter purposefully addresses the condition where you have consecutive delimiters with intention of assigning a blank/null value to the corresponding SAS variable.

Scott Barry

SBBWorks, Inc.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1218 views
  • 3 likes
  • 4 in conversation