
Length of variable differs across records

N/A
Posts: 0


Hi,

I am hoping someone will be able to give me a solution to the following problem:

I have a dataset that I have to read into Stata. Each record has 47 variables, most of them character, delimited by tabs. The problem is that I have no way to determine the length of each variable: a character variable x may have length 100 (with embedded spaces) in the first record/observation, 250 in another record, and only 57 in yet another. So, if I assign this variable a length of 250, SAS will keep reading data into the same variable when a record holds a value that is only 50 characters long, for example.

The total number of records is more than 350,000, so it's impossible for me to read in the data observation by observation.

Is there any command/option along with the input statement that helps me deal with this problem?

Thanks a lot!
Maggi
Respected Advisor
Posts: 4,173

Re: Length of variable differs across records

Posted in reply to deleted_user
Hi Maggi

I'm confused now. Do you intend to read your raw data with Stata or with SAS?

With SAS:
SAS offers several input styles for reading raw data (they are explained very well in the SAS documentation).

You need "list input". The SAS code might then look something like this:

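DATA WORK.testdata;
    /* Tab-delimited file. TRUNCOVER keeps INPUT from moving to the next
       record when a line is short; DSD treats consecutive tabs as a
       missing value. */
    INFILE 'C:\temp\testdata.txt'
        DLM='09'x
        TRUNCOVER
        DSD;
    /* The colon modifier reads up to the next delimiter, not a fixed width. */
    INPUT
        var1 : $CHAR3.
        var2 : $CHAR17.
        var3 : $CHAR5.;
RUN;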

'09'x is the ASCII hex representation of a tab. If your system does not use ASCII, another hex value is needed (on EBCDIC systems such as z/OS the tab is '05'x).


If you're using SAS Enterprise Guide (EG), you could also use the import wizard (File / Import Data).
On screen 2, choose "Delimited fields" and "Tab".
On screen 3, modify the variable attributes as needed for your data.

And last but not least:
Strings in your raw data can have different lengths, but a SAS variable has one fixed length. The length of your SAS variable must be at least the maximum length of the strings you want to read (and store in that variable).
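
If the maximum lengths are not known in advance, one way to find them is a first pass over the raw file that only measures each field. The following is a minimal sketch, not tested against the actual data: the path is the placeholder used above, the 47 comes from the original question, and the names maxlens, value, field and maxlen are made up for illustration. It assumes no record is longer than 32,767 bytes.

```sas
/* Pass 1: find the longest value in each of the 47 tab-delimited fields. */
data work.maxlens (keep=field maxlen);
    infile 'C:\temp\testdata.txt' lrecl=32767 truncover end=eof;
    length value $ 32767;
    array mx[47] _temporary_ (47*0);   /* running maximum per field */

    input;                             /* load the whole record into _INFILE_ */
    do i = 1 to 47;
        /* The 'M' modifier keeps empty fields, so positions stay aligned. */
        value = scan(_infile_, i, '09'x, 'M');
        mx[i] = max(mx[i], lengthn(value));
    end;

    /* After the last record, write one row per field. */
    if eof then do field = 1 to 47;
        maxlen = mx[field];
        output;
    end;
run;

proc print data=work.maxlens noobs;
run;
```

The reported values can then be used as the informat widths in the INPUT statement. Because trailing blanks are not counted, they describe the stored text rather than the raw field width.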

HTH
Patrick

Super User
Posts: 10,023

Re: Length of variable differs across records

Just as Patrick said. There is also another option, EXPANDTABS, which converts tabs to blanks (note that list input will then also split values on any embedded blanks).
Using Patrick's code:

[pre]
DATA WORK.testdata;
    INFILE 'C:\temp\testdata.txt' EXPANDTABS TRUNCOVER;
    INPUT
        var1 : $CHAR3.
        var2 : $CHAR17.
        var3 : $CHAR5.;
RUN;

[/pre]


Ksharp

N/A
Posts: 0

Re: Length of variable differs across records

Hi Patrick and Ksharp,

Sorry, I meant SAS.
The problem is that I do not know the lengths of the variables, or their minimums or maximums. A variable may be 50 characters in one observation, 150 in another, and 390 in another! So there is no sure way of assigning lengths to the variables when reading them in.

I tried your methods but, unfortunately, they are not working the way I need them to!
Any suggestions??
Thanks!
Maggi
Super User
Posts: 10,023

Re: Length of variable differs across records

Posted in reply to deleted_user
As you said. The function LENGTHC() would be useful: it returns the storage length of a variable.
[pre]
data _null_;
    set sashelp.class;
    len_sex  = lengthc(sex);
    len_name = lengthc(name);
    put 'NOTE:' len_sex= /
        'NOTE:' len_name=;
run;
[/pre]
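
LENGTHC() reports the declared storage length, which is the same for every observation. To see how long the stored values actually are, LENGTH() (which ignores trailing blanks) can be aggregated over the table. A small sketch on the same sample table; the column aliases are made up for illustration:

```sas
/* Longest actual value per character column, ignoring trailing blanks. */
proc sql;
    select max(length(name)) as max_len_name,
           max(length(sex))  as max_len_sex
    from sashelp.class;
quit;
```

Run against a table that was read with generous lengths, this shows how far each variable could be shortened.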


Ksharp
Valued Guide
Posts: 2,177

Re: Length of variable differs across records

Posted in reply to deleted_user
Maggi

At the stage when you don't know what lengths you might find, and you must allow for both very wide and very narrow values, the system and data set option COMPRESS=YES is just what you need!
It allows you to set the length of your character string variables to $32767.
The trailing blanks will then be compressed to almost nothing.
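
A minimal sketch of that approach, reusing the placeholder path and the three sample variable names from earlier in the thread (the real file has 47 fields). LRECL=32767 is an assumption about the longest record, not something known about the data:

```sas
/* Read every field at the maximum character length and let COMPRESS=YES
   squeeze out the trailing blanks when the table is stored. */
data work.testdata (compress=yes);
    infile 'C:\temp\testdata.txt' dlm='09'x dsd truncover lrecl=32767;
    length var1-var3 $ 32767;   /* declared before INPUT, so list input reads them as character */
    input var1-var3;
run;
```

Once the data is in SAS, the real maximum lengths can be measured and the variables shortened in a later step.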

good
peterC
Super User
Posts: 10,023

Re: Length of variable differs across records

Posted in reply to deleted_user
Hi. I misunderstood what you said.
Actually, I think Patrick's method can work, but it is hard to tell without seeing your original data.
Maybe you can also use a LENGTH statement to solve your problem; it has much the same effect as the ' : ' modifier:

[pre]
data temp;
    length comment $ 20000;
    input id comment $;
    datalines;
1 tuieorejejepudf
2 diuet
3 troerpere
4 skjfasiwoewojkfoewroegerelgkerg
;
run;

proc print noobs;
run;
[/pre]


Ksharp