DATA Step, Macro, Functions and more

malfunctionin

Reply
Occasional Contributor
Posts: 15

malfunctionin

Hi,

hi,

when I'm importing data from csv file, then some of the entries are not being imported properly. for example, the original value of a variable was 941, but in imported file in SAS, that 941 has become only 9. can you tell me what type of problem is this and how can I fix it? I have upgraded still have the same problem. I have used follwoing command:

 

PROC IMPORT OUT= seleg

     DATAFILE="G:\seleg.csv"

     DBMS=csv REPLACe;

     GETNAMES=YES;

     DATAROW=2;

     Run

 

the file I have imported is a very big file with a large no. of variables. many variables towards the column no. 1676 had similar values with reduced legnth of the column than the original. Would you pls. help me solve this problem? 

Super User
Super User
Posts: 6,502

Re: malfunctionin

[ Edited ]

PROC IMPORT has to guess how the variables are defined in a CSV file since there is no metadata other than the column headings.

Usually it just looks at the first few rows and tries to figure out the variable type and length.  If the later data is of a different type or longer length then it can get truncated.  Also if the there are a lot of variables or really long values then it might truncate the input lines which also might explain your problems.

It is trivial to write your own data step code to read a CSV file.  You just need to define you variable names and types. One way to start is to run PROC IMPORT in interactive SAS and recall the code it generates and fix it.  Or you could copy the names from the first line of the file and use them to define the variables. Or you could just make your own variable names so you can take a look at the data. With 1672 variable they probably do not have very meaningful names anyway.

 

For example if your variables are all numbers then your data step can be as simple as this:

 

data want;
  infile 'myfile.csv' dsd firstobs=2 truncover lrecl=1000000 ;
  input var1-var1672 ;
run;

 

Occasional Contributor
Posts: 15

Re: malfunctionin

thanks a lot, but that is changing the variable names, but for my analysis, is it possible to keep the original variable name? would you pls tell me how to do that keeping the original variable name unchanged?

Super User
Super User
Posts: 6,502

Re: malfunctionin

If you know the names the variables then use them in writing the data step to read them. If you have structured metadata then you can use SAS to generate the SAS code to the read the data.  You can try changing the guessingrows option on PROC IMPORT, but if the problem is that your input lines are longer then 32K characters then PROC IMPORT will probably not work for you.

 

As I said I normally just read the first row with the names and paste them into the program editor and then convert them to a LENGTH statement to set the type and length of the variables then I just use INPUT first -- last as the input statement.

 

For example if the first line of your CSV file looks like:

 

id,age,weight,height

Then you can copy that into the program editor and change it to something like:

length id $20 age weight height 8;

Then your data step can be 

data want;
  infile 'myfile.csv' dsd truncover firstobs=2 lrecl=100000 ;
  length id $20 age weight height 8 ;
  input id -- height;
run;

if you have 1600 variables and they are mixed randomly between character and numeric then this will involve a lot of decision making and typing, but you will total control over the resulting data set that is created.  If they are all numeric or all character (and you can just set them all to the same length) then the typing is minimal since all you need to do is convert the commas to spaces and the LENGHT key word to the beginning and the appropriate length to the end.

Super User
Posts: 10,516

Re: malfunctionin


mahfuz_pops wrote:

thanks a lot, but that is changing the variable names, but for my analysis, is it possible to keep the original variable name? would you pls tell me how to do that keeping the original variable name unchanged?


If by "original variable name" you mean the column headings in your CSV the answer is "maybe". If your column headings have spaces in them; characters other than letters, numbers or the _ character; start with a number; or are longer than 32 characters, then no. But you can associate a label with the variable that is longer and not quite a picky as to content to contain that information.

 

When you run Proc Import for a CSV file SAS actually creates data step code to read the file. If you are using Base SAS then the Log will have the code. You can copy that to the editor and modify it (save it!). Likely you will find an issue with an informat statement, such that your variable with 941 values being read as 9 was read with a Charcter informat of $1 or some such. Possibly because the first rows for that variable were blank. You may adjust the informat statements to match your expectations, change variable names and add label statements.

 

If you have a very large list of variables you can use Excel or similar to create the label syntax.

1) copy the SAS Input statement from the generated data step and paste into the first column of a spreadsheet, remove any $ that maybe there.

2) Open the CSV, or copy the variable row only into a separate CSV file and open in Excel.

3) Copy the cells with the variables

4) Paste transpose into the column next to column with the SAS input variables.

5) Create a formula in a separate column to place and = and quotes around the second column (Use double quotes in case any of your variable have a ' in them). The last column should look like MyVariable = "This is the label for my variable"

 

6) Add a Label statement to the generated data step:

Label

 <paste the column from the spread sheet just created>

;

 

Super User
Posts: 9,682

Re: malfunctionin

Add guessingrow= option to let sas guess this variable length correctly .

 

 

PROC IMPORT OUT= seleg
     DATAFILE="G:\seleg.csv"
     DBMS=csv REPLACe;
     GETNAMES=YES;
     guessingrows=32767;
     Run; 
Ask a Question
Discussion stats
  • 5 replies
  • 220 views
  • 0 likes
  • 4 in conversation