DATA Step, Macro, Functions and more

Data not importing correctly from proc import command

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Data not importing correctly from proc import command

 

Hello all,

 

I'm a graduate student learning SAS and I have an issue regarding the proc import command.  I used the following command to load my data from a csv file: 

 

 

proc import out=sasuser.Load_Data
		datafile = "d:\ECE_592_DA\Homework1\Historical_Load_Data.csv"
		dbms=csv replace;
	getnames=No;
	datarow=1;
run;

The file has no header names so my data starts at row 1.  The first column includes names like Segment:G, Segment:GF, Segment:R, Segment:RL, Segment:RLH, and so on while the remaining columns contain load data and dates.  It's this first column that I am having issue with the above import procedure.  The log tells me the import was completed successfully and I can verify the data in the table.  Upon verifying column 1, I noticed that my data that is attached to Segment:GF is list as Segment:G, Segments RL, RLH all show up as Segment:R, not Segment:RL or Segment:RLH.  I've included my csv file so hopefully you will be able to duplicate what I am seeing. I am using SAS v9.4 Desktop.  Am I doing something wrong or what could be causing this issue?  


Accepted Solutions
Solution
‎01-25-2018 10:37 PM
Super User
Posts: 24,028

Re: Data not importing correctly from proc import command

PROC IMPORT is a guessing procedure, it has to guess at lengths and types and doesn't always do so correctly. 

 

There's an option called GUESSINGROWS that you can use to force PROC IMPORT to scan the entire column before determining the length, but it slows down the process for obvious reasons.  Other options are to use the code generated from PROC IMPORT - check your log after the import - and modify the code according to your needs, such as manually specifying names rather than the generic names you'll get after specifying getnames=NO. 

 

proc import out=sasuser.Load_Data
		datafile = "d:\ECE_592_DA\Homework1\Historical_Load_Data.csv"
		dbms=csv replace;
	getnames=No;
	datarow=1;
     guessingrows=max;
run;

atsprink wrote:

 

Hello all,

 

I'm a graduate student learning SAS and I have an issue regarding the proc import command.  I used the following command to load my data from a csv file: 

 

 

proc import out=sasuser.Load_Data
		datafile = "d:\ECE_592_DA\Homework1\Historical_Load_Data.csv"
		dbms=csv replace;
	getnames=No;
	datarow=1;
run;

The file has no header names so my data starts at row 1.  The first column includes names like Segment:G, Segment:GF, Segment:R, Segment:RL, Segment:RLH, and so on while the remaining columns contain load data and dates.  It's this first column that I am having issue with the above import procedure.  The log tells me the import was completed successfully and I can verify the data in the table.  Upon verifying column 1, I noticed that my data that is attached to Segment:GF is list as Segment:G, Segments RL, RLH all show up as Segment:R, not Segment:RL or Segment:RLH.  I've included my csv file so hopefully you will be able to duplicate what I am seeing. I am using SAS v9.4 Desktop.  Am I doing something wrong or what could be causing this issue?  


 

View solution in original post


All Replies
Solution
‎01-25-2018 10:37 PM
Super User
Posts: 24,028

Re: Data not importing correctly from proc import command

PROC IMPORT is a guessing procedure, it has to guess at lengths and types and doesn't always do so correctly. 

 

There's an option called GUESSINGROWS that you can use to force PROC IMPORT to scan the entire column before determining the length, but it slows down the process for obvious reasons.  Other options are to use the code generated from PROC IMPORT - check your log after the import - and modify the code according to your needs, such as manually specifying names rather than the generic names you'll get after specifying getnames=NO. 

 

proc import out=sasuser.Load_Data
		datafile = "d:\ECE_592_DA\Homework1\Historical_Load_Data.csv"
		dbms=csv replace;
	getnames=No;
	datarow=1;
     guessingrows=max;
run;

atsprink wrote:

 

Hello all,

 

I'm a graduate student learning SAS and I have an issue regarding the proc import command.  I used the following command to load my data from a csv file: 

 

 

proc import out=sasuser.Load_Data
		datafile = "d:\ECE_592_DA\Homework1\Historical_Load_Data.csv"
		dbms=csv replace;
	getnames=No;
	datarow=1;
run;

The file has no header names so my data starts at row 1.  The first column includes names like Segment:G, Segment:GF, Segment:R, Segment:RL, Segment:RLH, and so on while the remaining columns contain load data and dates.  It's this first column that I am having issue with the above import procedure.  The log tells me the import was completed successfully and I can verify the data in the table.  Upon verifying column 1, I noticed that my data that is attached to Segment:GF is list as Segment:G, Segments RL, RLH all show up as Segment:R, not Segment:RL or Segment:RLH.  I've included my csv file so hopefully you will be able to duplicate what I am seeing. I am using SAS v9.4 Desktop.  Am I doing something wrong or what could be causing this issue?  


 

Occasional Contributor
Posts: 7

Re: Data not importing correctly from proc import command

Thank you for this information.

Super User
Super User
Posts: 8,290

Re: Data not importing correctly from proc import command

Why would you use PROC IMPORT to read a file with essentially 3 columns?

Segment:G,1/1/2010,3.1262,3.0345,2.9819,2.968,3.0524,3.0172,3.0267,2.9349,2.7995,2.8859,2.9866,3.0194,3.0495,3.0582,3.0595,3.0283,3.1619,3.4842,3.5372,3.4599,3.5386,3.561,3.6349,3.4937
Segment:G,1/2/2010,3.3205,3.3003,3.2801,3.3764,3.4237,3.5017,3.5658,3.8261,3.8804,3.8389,4.1683,4.194,4.2089,4.0638,4.0448,4.0661,4.2173,4.5358,4.6468,4.5617,4.3966,4.2813,4.2437,4.193
Segment:G,1/3/2010,4.0464,4.0058,4.0333,4.0137,4.0445,4.0875,4.2227,4.4313,4.2739,4.1714,4.2434,4.2165,4.1589,3.9059,3.994,3.8639,4.0756,4.4202,4.3971,4.2478,4.1731,4.1198,4.0652,4.021
Segment:G,1/4/2010,3.8524,3.8817,3.8392,3.7621,3.896,4.0639,4.4892,4.7545,4.808,4.8887,4.9319,5.0466,4.9217,4.8207,4.8316,4.7003,4.6439,4.9401,4.6117,4.5236,4.3715,4.1594,3.9413,3.7188
Segment:G,1/5/2010,3.5521,3.4182,3.5102,3.4707,3.5777,3.6358,4.1973,4.6463,4.4113,4.5972,4.6898,4.7021,4.5553,4.4737,4.3487,4.5134,4.6464,4.7055,4.5561,4.4435,4.1417,4.0116,3.7921,3.5819
Segment:G,1/6/2010,3.4821,3.5018,3.5402,3.5301,3.5521,3.6039,3.9982,4.3366,4.1013,4.3492,4.5461,4.5474,4.5736,4.5179,4.5858,4.5057,4.5546,4.7998,4.4487,4.2267,4.1166,3.8399,3.6421,3.4553
Segment:G,1/7/2010,3.3963,3.3656,3.376,3.4465,3.4987,3.5828,4.0469,4.3599,4.323,4.4299,4.4774,4.1881,4.2088,3.9108,3.8289,3.7038,3.7239,3.9413,3.8928,3.5921,3.4761,3.3163,3.0708,2.9516
Segment:G,1/8/2010,3.0063,3.0158,3.0766,3.1083,3.1306,3.2829,3.6874,3.9473,3.843,4.0256,4.3515,4.4409,4.5379,4.296,4.3988,4.3083,4.2862,4.4261,4.4644,4.0014,3.9737,3.749,3.6013,3.5413
Segment:G,1/9/2010,3.4144,3.3907,3.3674,3.36,3.3245,3.3714,3.6477,3.7275,3.7499,3.7127,3.98,3.9556,3.7546,3.5818,3.5387,3.4196,3.3368,3.8862,3.8471,3.7017,3.726,3.6138,3.4975,3.5038

You have some type of ID variable, a date field and series of numbers.

data want ;
  length id $20 date 8 column value 8 ;
  infile 'd:\ECE_592_DA\Homework1\Historical_Load_Data.csv' dsd truncover ;
  informat date mmddyy.;
  format date yymmdd10. ;
  input id date @ ;
  do column=1 by 1 until (missing(value));
     input value @;
     if not missing(value) then output;
  end;
run;

So you get a nice dataset like this:

Obs       id              date    column     value

  1    Segment:G    2010-01-01       1      3.1262
  2    Segment:G    2010-01-01       2      3.0345
  3    Segment:G    2010-01-01       3      2.9819
  4    Segment:G    2010-01-01       4      2.9680
  5    Segment:G    2010-01-01       5      3.0524
  6    Segment:G    2010-01-01       6      3.0172
  7    Segment:G    2010-01-01       7      3.0267
  8    Segment:G    2010-01-01       8      2.9349
  9    Segment:G    2010-01-01       9      2.7995
 10    Segment:G    2010-01-01      10      2.8859
Occasional Contributor
Posts: 7

Re: Data not importing correctly from proc import command

Hi Tom,

To be honest I understand maybe two lines of your code.  I only used the proc import command because I saw it on a basic video for importing SAS that our professor who works for SAS showed us in class.  I guess I could have simply used the File->Import feature as well.  I am used to  MATLAB  as an engineer so learning SAS is definitely a struggle as the documentation doesn't read as easy as MATLAB's.  I'm sure that's because I have way more experience with it versus SAS though. 

 

Super User
Super User
Posts: 8,290

Re: Data not importing correctly from proc import command

[ Edited ]

atsprink wrote:

Hi Tom,

To be honest I understand maybe two lines of your code.  I only used the proc import command because I saw it on a basic video for importing SAS that our professor who works for SAS showed us in class.  I guess I could have simply used the File->Import feature as well.  I am used to  MATLAB  as an engineer so learning SAS is definitely a struggle as the documentation doesn't read as easy as MATLAB's.  I'm sure that's because I have way more experience with it versus SAS though. 

 


Start by reading the basics of SAS concepts.  https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001331149.htm

So that you understand what a dataset is and what a variable is.  

 

Learn how to read in data. Here is an introductory page. The second example code block is for data similar to yours.

https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003209907.htm

 

Here is a more basic data step, but one that creates a dataset with many columns.  It will look more like what your CSV file would look like if you opened it in Excel, but that structure with multiple measures stored in multiple variables will be much harder to work with.

data want ;
  length id $20 date 8 col1-col24 8 ;
  infile 'd:\ECE_592_DA\Homework1\Historical_Load_Data.csv' dsd truncover ;
  informat date mmddyy.;
  format date yymmdd10. ;
  input id date col1-col24 ;
run;

It was trivial for me to look at your file and decide that a length of $20 would probably work for the first column and that you had 24 columns of numbers.  But since this is your data you probably knew that already without having to look at the file.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 752 views
  • 0 likes
  • 3 in conversation