BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi folks,

 

I am importing a data file using proc import. Although the log says the new data set has been created successfully, it also give a long message on the log window as follows;

 


1 proc import datafile = 'I:\cYorkshire Project\Supported self-care
1 ! intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv'
2 out = demo
3 dbms = csv;
4 run;

5 /**********************************************************************
6 * PRODUCT: SAS
7 * VERSION: 9.4
8 * CREATOR: External File Interface
9 * DATE: 26JUN19
10 * DESC: Generated SAS Datastep Code
11 * TEMPLATE SOURCE: (None Specified.)
12 ***********************************************************************/
13 data WORK.DEMO ;
14 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
15 infile 'I:\cYorkshire Project\Supported self-care
15 ! intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv' delimiter = ',' MISSOVER DSD
15 ! lrecl=32767 firstobs=2 ;
16 informat pkid best32. ;
17 informat Practice $24. ;
18 informat Age best32. ;
19 informat Status $8. ;
20 informat Ethnicity $62. ;
21 informat Marital_status $29. ;
22 informat Religion $22. ;
23 informat Sex $6. ;
24 informat PartPcode $4. ;
25 informat IMDScore best32. ;
26 informat FirstEFICalc best32. ;
27 informat LastEFICalc best32. ;
28 informat TotalDeficitsAtStart best32. ;
29 informat TotalDeficitsAtEnd best32. ;
30 informat FirstEfiScore $3. ;
31 informat LastEfiScore $8. ;
32 format pkid best12. ;
33 format Practice $24. ;
34 format Age best12. ;
35 format Status $8. ;
36 format Ethnicity $62. ;
37 format Marital_status $29. ;
38 format Religion $22. ;
39 format Sex $6. ;
40 format PartPcode $4. ;
41 format IMDScore best12. ;
42 format FirstEFICalc best12. ;
43 format LastEFICalc best12. ;
44 format TotalDeficitsAtStart best12. ;
45 format TotalDeficitsAtEnd best12. ;
46 format FirstEfiScore $3. ;
47 format LastEfiScore $8. ;
48 input
49 pkid
50 Practice $
51 Age
52 Status $
53 Ethnicity $
54 Marital_status $
55 Religion $
56 Sex $
57 PartPcode $
58 IMDScore
59 FirstEFICalc
60 LastEFICalc
61 TotalDeficitsAtStart
62 TotalDeficitsAtEnd
63 FirstEfiScore $
64 LastEfiScore $
65 ;
66 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
67 run;

NOTE: The infile 'I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv' is:

Filename=I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv,
RECFM=V,LRECL=32767,File Size (bytes)=67644,
Last Modified=17 June 2019 11:49:29,
Create Time=17 June 2019 11:49:30

NOTE: 409 records were read from the infile 'I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv'.
The minimum record length was 92.
The maximum record length was 219.
NOTE: The data set WORK.DEMO has 409 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.20 seconds
cpu time 0.06 seconds


409 rows created in WORK.DEMO from I:\cYorkshire Project\Supported self-care
intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv.

 

NOTE: WORK.DEMO data set was successfully created.
NOTE: The data set WORK.DEMO has 409 observations and 16 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.82 seconds
cpu time 0.14 seconds

 

 

 

I am not sure if there is something wrong my import procedure or is this something I can ignore please?

 

Appreciate your help and contribution towards this forum.

 

S

 

8 REPLIES 8
Amir
PROC Star

Unless I'm missing anything, all the "NOTE:" messages seem quite normal to me.

 

Was there a specific message in the whole log that you're asking about? If yes, then please specify the particular message.

 

Otherwise I don't see anything obviously wrong.

 

Regards,

Amir.

sks521
Quartz | Level 8

Nothing specific but I haven't come across this kind of log before that's why thought of consulting the forum.

 

Thanks for your prompt response.

Kurt_Bremser
Super User

It does not indicate any problem, instead it is a very valuable service the import procedure provides. When importing text files, import analyzes the file (according to the dbms= option) and creates a data step that actually reads the file, and this is what creates the log. This lets you copy the data step from the log and refine/correct it as needed.

proc import is not suited well for creating consistent, repeatable results, as it always makes guesses about the structure of the input, but it can be used as a starting point by those who are at the early stages of building their data step coding skills.

sks521
Quartz | Level 8

Hi Kurt,

 

What do you suggest as a better alternative to Proc Import please?

 

Thanks

Kurt_Bremser
Super User

@sks521 wrote:

Hi Kurt,

 

What do you suggest as a better alternative to Proc Import please?

 

Thanks


Write your own data step.

In this particular case, you can take the data step code from the log (holding down the Alt key while marking the text lets you exclude the line numbers), and adapt it according to the documentation you received with the csv file (if you did not get such a documentation, insist on getting one; documentation is the difference between a process and a pile of ****).

You may find that import misinterpreted codes as numbers (codes are better kept as characters), or made character columns shorter than they should be (causing problems later on when datasets are combined), or did not import numbers as such because of some local format used.

sks521
Quartz | Level 8

That's really helpful Kurt! I appreciate.

ballardw
Super User

@sks521 wrote:

That's really helpful Kurt! I appreciate.


Also since you did not use a guessingrows option with Proc Import SAS only examined a few rows of data. Any field that is character may be suspect for truncated values. So either compare the length with supplied documentation as the expected length of variables or examine the values closely for truncated values. Increase the length of the associated informats to expected/needed length.

 

Another advantage with the data step is set "better" variable names and labels for better descriptions of variables.

If have some data where the proc import variable name came in as

CLIENT_EDUCATION_1_HS_GED_LAST_G

(after truncation to fit the 32 character limit). Since the variable basically held an indicator for High School Graduation or having a GED certificate I could use search an replace in the editor to change the variable name to HS_GED, which is much shorted for when I needed to use it in code and then add

label HS_GED = 'Education of High School diploma or GED';

for nicer labels in reports and to document just what that variable was intended for.

Another is some of the stupider sources of data I have worked with will have the, in addition to long text for column heading, exact same characters for the first 32 (or 40 or 50 or 60) characters. The rules with proc import can't deal with that well so you get a variable name like VAR23 which just indicates it was the 23rd column as a name based on the column heading can't be built. So you get to set your own variable name. I have had other data sources that repeated have a column labeled TOTAL or similar for different groups of values. Proc Import will end up with Total Total2 Total3 or similar to differentiate between the different columns. The data step would let you set more meaningful variable names like TotalPurchased, TotalSold, or TotalOrders so you can write better code. Remember you still have the option of labels to give more complete descriptions.

 

Lastly a data step can provide initial data checking using tools that Proc Import can't provide. Quite often some variables should only have a limited number of values. You could use a custom informat build with INVALUE to check on those values. Additionally that custom informat could change codes used to indicate not collected or missing values to missing such as NA for variables that should be numeric. Proc import might well create those values as character because of the presence of the NA or N/A or what have you. And data that uses as special numeric code to indicate missing, such as the ever popular 999999 or -99999 could be read directly to SAS missing.

Tom
Super User Tom
Super User

Write your own data step.  It is very easy to write code that is simpler and clearer than what PROC IMPORT generates.

data demo ;
  infile 'I:\cYorkshire Project\Supported self-care intervention\Sarwat\Data\SarwatSCP18052019\Demographics.csv'
    dsd truncover firstobs=2 
  ;
  length pkid 8 Practice $24 Age 8 Status $8 Ethnicity $62 Marital_status $29
         Religion $22 Sex $6 PartPcode $4 IMDScore 8 FirstEFICalc 8 LastEFICalc 8
         TotalDeficitsAtStart 8 TotalDeficitsAtEnd 8 FirstEfiScore $3 LastEfiScore $8
  ;
  input pkid -- LastEfiScore ;
run;

Plus then you can fix things like are the two EfiScore variables really character strings? 

And if so why does one have a max length of 3 bytes and the other 8 bytes? 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 754 views
  • 8 likes
  • 5 in conversation