BookmarkSubscribeRSS Feed
Jarakata
Calcite | Level 5

Hello all,

I would like to ask how to import data to .csv when the variable names are present in first three rows

(i.e. the line 1, line 2 and line 3 contains variable names, data starts from line 4 and continuous up to line 6585)

How I can use proc import for such data to have variable names correctly imported?

Thank you very much

5 REPLIES 5
PGStats
Opal | Level 21

Give us a (small) example of the input CSV data.

PG
GraphGuy
Meteorite | Level 14

Is it possible that the tool you're using to view the csv data file is just 'wrapping' the long lines onto what looks like 3 lines?

jimbarbour
Meteorite | Level 14

Well, to get the data, the basic answer is to have FIRSTOBS=4 coded on your Infile statement.  Proc Import generates a Data step.  The Infile in the Data step needs to have FIRSTOBS=4 coded so that the first three records are not treated as data.

 

Here's a simple Proc Import example:

PROC IMPORT
	DATAFILE	=	'I:\commercial\development\CSI_Suspects\pgm\Research\Test_Only.csv'
	OUT			=	WORK.My_Test_Data
	DBMS		=	CSV
	REPLACE
	;
RUN;

If you look at the log, you should see the following generated Data step:

/**********************************************************************
*   PRODUCT:   SAS
*   VERSION:   9.4
*   CREATOR:   External File Interface
*   DATE:      09SEP20
*   DESC:      Generated SAS Datastep Code
*   TEMPLATE SOURCE:  (None Specified.)
***********************************************************************/
data WORK.MY_TEST_DATA    ;
	infile 'I:\commercial\development\CSI_Suspects\pgm\Research\Test_Only.csv' 
		delimiter = ',' 
		lrecl=32767
		firstobs=2 
		MISSOVER 
		DSD
		;
	informat CPT_Code $5. ;
	informat OPSI_Description $11. ;
	informat Short_Desc $35. ;
	informat Long_Desc $48. ;
	informat Sub_Class $5. ;
	informat dw_updt_dt $10. ;
	informat Sheet $9. ;

	format CPT_Code $5. ;
	format OPSI_Description $11. ;
	format Short_Desc $35. ;
	format Long_Desc $48. ;
	format Sub_Class $5. ;
	format dw_updt_dt $10. ;
	format Sheet $9. ;

	input
		CPT_Code  $
		OPSI_Description  $
		Short_Desc  $
		Long_Desc  $
		Sub_Class  $
		dw_updt_dt  $
		Sheet  $
		;
RUN;

I've cleaned the code up slightly from how it would look in the log so I can read it, but that's basically what will be generated.  Notice that the FIRSTOBS parameter is set to 2.

 

What I would do is copy the generated code from the log, paste the code into a new editor window, change FIRSTOBS to 4, save the code as a .sas file, and then run the code. 

 

Now, you have the correct data.  However, you want the variable names to be gathered from the first three rows.  That's a bit trickier.  Let me think about that.

 

Jim

Kurt_Bremser
Super User
  • open the file with a text editor
  • no, that does not include Excel, but things like the Windows Editor or Notepad++
  • mark the lines including the headers and some data lines
  • copy (ctrl-C on Windows, command-V on a Mac)
  • click the </> button here on the communities
  • paste (ctrl-V on Windows, command-V on a Mac)

We will then be able to see exactly what is in your file, and make educated suggestions

jimbarbour
Meteorite | Level 14

@Jarakata,

 

There may be a more elegant way to do this, but here's a working solution:

1.  First, code and run the Proc Import as I suggested in my first reply.

2.  Then, copy the generated data step into an editor window.  You'll want to create two copies of the Data step.

3.  With the first copy of the Data step, you'll need to pull out the informats and the Input section and place them inside a Macro definition as shown below.  Notice that each variable ends with an "&i" in the below example.  The Macro will execute a %DO loop three times so as to read the first three lines of your data to get the variable names.  I deleted the format statements since they are not necessary here.  I also increased any small definitions in the informats to $32 so as not to truncate any titles.  The Macro ends with a series of Call Symputx commands to create macro variables that are a concatenation of the first three lines for each column. 

 

There are some requirements here:

a)   There can be no embedded blanks in the variable names

b)  There can only be numbers, underscores, and letters in the variable names and the first character cannot be a number (I'm using v7 names).

c)  The variable names can only have a maximum of 32 characters when all three lines are assembled.  If you have more than 32 characters, you'll probably have to truncate to 32 as I did with the second variable in the below example (see the SYMPUTX command).

%MACRO	Generate_Input_Statements;
	%LOCAL	i;
	%DO	i	=	1	%TO	3;
		informat CPT_Code&i $32. ;
		informat OPSI_Description&i $32. ;
		informat Short_Desc&i $35. ;
		informat Long_Desc&i $48. ;
		informat Sub_Class&i $32. ;
		informat dw_updt_dt&i $32. ;
		informat Sheet&i $32. ;

		input
			CPT_Code&i  		$
			OPSI_Description&i	$
			Short_Desc&i  		$
			Long_Desc&i  		$
			Sub_Class&i  		$
			dw_updt_dt&i  		$
			Sheet&i  			$
			;
	%END;

	CALL	SYMPUTX('CPT_Code', CATS(CPT_Code1, '_', CPT_Code2, '_', CPT_Code3), 'G');
	CALL	SYMPUTX('OPSI_Description', SUBSTR(CATS(OPSI_Description1, '_', OPSI_Description2, '_', OPSI_Description3), 1, 32), 'G');
	CALL	SYMPUTX('Short_Desc', CATS(Short_Desc1, '_', Short_Desc2, '_', Short_Desc3), 'G');
	CALL	SYMPUTX('Long_Desc', CATS(Long_Desc1, '_', Long_Desc2, '_', Long_Desc3), 'G');
	CALL	SYMPUTX('Sub_Class', CATS(Sub_Class1, '_', Sub_Class2, '_', Sub_Class3), 'G');
	CALL	SYMPUTX('dw_updt_dt', CATS(dw_updt_dt1, '_', dw_updt_dt2, '_', dw_updt_dt3), 'G');
	CALL	SYMPUTX('Sheet', CATS(Sheet1, '_', Sheet2, '_', Sheet3), 'G');
%MEND	Generate_Input_Statements;

4.  Them call the macro with a little Data step like so:

data _NULL_;
	infile 'I:\commercial\development\CSI_Suspects\pgm\Research\Test_Only.csv' 
		delimiter = ',' 
		lrecl=32767
		firstobs=1
		MISSOVER 
		DSD
		;

	%Generate_Input_Statements;

	STOP;
RUN;

Notice that firstobs must be 1 in order to capture all three of the first three lines.  I'm also specifying Data _NULL_ since I'm just creating macro variables to hold the variable names from the first three rows.

 

5.  Then use the second copy I asked you to get from the SAS log (see step 2, above) and modify all the variable names to be macro variables by putting an ampersand in front of each of the variable names.  The macro variable names will be translated by SAS into the variable names created by the first data step.

/**********************************************************************
*   PRODUCT:   SAS
*   VERSION:   9.4
*   CREATOR:   External File Interface
*   DATE:      09SEP20
*   DESC:      Generated SAS Datastep Code
*   TEMPLATE SOURCE:  (None Specified.)
***********************************************************************/
data WORK.MY_TEST_DATA    ;
	infile 'I:\commercial\development\CSI_Suspects\pgm\Research\Test_Only.csv' 
		delimiter = ',' 
		lrecl=32767
		firstobs=2 
		MISSOVER 
		DSD
		;
	informat &CPT_Code $5. ;
	informat &OPSI_Description $11. ;
	informat &Short_Desc $35. ;
	informat &Long_Desc $48. ;
	informat &Sub_Class $5. ;
	informat &dw_updt_dt $10. ;
	informat &Sheet $9. ;

	format &CPT_Code $5. ;
	format &OPSI_Description $11. ;
	format &Short_Desc $35. ;
	format &Long_Desc $48. ;
	format &Sub_Class $5. ;
	format &dw_updt_dt $10. ;
	format &Sheet $9. ;

	input
		&CPT_Code 			$
		&OPSI_Description  	$
		&Short_Desc  		$
		&Long_Desc  		$
		&Sub_Class  		$
		&dw_updt_dt  		$
		&Sheet  			$
		;
RUN;

Now, when we run, the variable names in the output SAS data set are a concatenation of the first three lines for each column.  In the example below, I used some nonsensical, repetitious variable names, but, even though they look funny in this example, these are the correct data names based on the first three columns.

 Get_Var_Names_First_Three_Rows_2020-09-09_15-57-22.jpg

 

Jim

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 723 views
  • 0 likes
  • 5 in conversation