deleted_user
Not applicable
Hi,

I am hoping someone will be able to give me a solution to the following problem:

I have a dataset that I have to read into Stata. Each record has 47 variables, most of them character, delimited by tabs. The problem is that I have no way to determine the length of each variable: a character variable x may have length 100 (with embedded spaces) in the first record/observation, 250 in another record, and only 57 in yet another. So, if I assign this variable a length of 250, SAS will keep reading past the end of the value into the same variable for a record where the value is only 50 characters long, for example.

The total number of records is more than 350,000 so that it's impossible for me to read in the data observation by observation.

Is there any command/option along with the input statement that helps me deal with this problem?

Thanks a lot!
Maggi
Patrick
Opal | Level 21
Hi Maggi

I'm now confused. Do you intend to read your raw data with STATA or with SAS?

With SAS:
There are different styles of how you can read raw data (very well explained in the SAS doc).

You need "list input". The SAS code might then look something like this:

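DATA WORK.testdata;
  /* Tab-delimited file: DSD treats two consecutive tabs as an empty field,
     TRUNCOVER keeps INPUT from moving to the next line on short records */
  INFILE 'C:\temp\testdata.txt'
    DLM='09'x
    TRUNCOVER
    DSD;
  /* The colon modifier reads up to the next delimiter,
     storing at most the width given in the informat */
  INPUT
    var1 : $CHAR3.
    var2 : $CHAR17.
    var3 : $CHAR5.;
RUN;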

'09'x is the ASCII hex representation for a tab. If your system uses EBCDIC rather than ASCII (for example z/OS), a different hex value is needed: the tab is '05'x there.


If you're using SAS EG then you could also use the SAS import wizard (file/import data).
In screen2 choose "Delimited fields" and "Tab"
In screen3 possibly modify the variable attributes to what's needed for your data.

And last but not least:
Strings in your raw data might have different lengths; SAS variables don't. The length of your SAS variable must be at least the maximum length of the strings you want to read (and store in that SAS variable).
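When the maximum is not known, one option is to declare every field with a generous length up front and let list input stop at each tab. The following is only a sketch under assumptions: all 47 fields are read as character, 500 is assumed to be a safe upper bound (it is not known from your data), var1-var47 are placeholder names, and the path is the example path used above. LRECL=32767 is there because records longer than the default record length can otherwise be truncated.

```sas
data work.testdata;
  /* Assumed upper bound of 500 characters per field; adjust to your data */
  length var1-var47 $ 500;
  infile 'C:\temp\testdata.txt'
    dlm='09'x      /* tab is the only delimiter, so embedded spaces are kept */
    dsd            /* two consecutive tabs mean an empty field */
    truncover      /* do not jump to the next line on short records */
    lrecl=32767;   /* allow long records */
  /* The variables are already character from LENGTH, so no informat is needed */
  input var1-var47;
run;
```

Values shorter than the declared length are simply padded with blanks; a value longer than the declared length would be truncated, so the bound should err on the large side.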

HTH
Patrick

Ksharp
Super User
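Just as Patrick said. But there is another option, 'expandtabs', which converts tabs to blanks. Note that values with embedded spaces are then split at those blanks, so it only suits data without embedded spaces.
Using Patrick's code: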

[pre]
DATA WORK.testdata;
  INFILE 'C:\temp\testdata.txt' expandtabs Truncover;
  INPUT
    var1 : $CHAR3.
    var2 : $CHAR17.
    var3 : $CHAR5.;
RUN;
[/pre]


Ksharp

deleted_user
Not applicable
Hi Patrick and Ksharp,

Sorry, I meant SAS.
The problem is that I do not know the length of the variables, their mins or maxs. The variables may be 50 characters in one observation, 150 in another and 390 in another! So there is no sure way of assigning lengths to variables while inputting them.

I tried your methods but they are, unfortunately, not working as I need them to work!
Any suggestions??
Thanks!
Maggi
Ksharp
Super User
Just as you said. The function lengthc() would be useful; it returns the storage length of a variable.
[pre]
data _null_;
  set sashelp.class;
  len_sex=lengthc(sex);
  len_name=lengthc(name);
  put 'NOTE:' len_sex= /
      'NOTE:' len_name=;
run;
[/pre]
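Note that lengthc() reports the declared storage length, which is the same for every observation. To find the longest value actually stored, length() (which ignores trailing blanks) can be tracked across observations instead. A small sketch on the same sashelp.class data; max_name is just an illustrative variable name:

```sas
data _null_;
  set sashelp.class end=last;
  retain max_name 0;
  /* length() ignores trailing blanks, unlike lengthc() */
  max_name = max(max_name, length(name));
  /* Write the largest length seen once the last observation is reached */
  if last then put 'NOTE: ' max_name=;
run;
```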


Ksharp
Peter_C
Rhodochrosite | Level 12
maggi

At the stage when you don't know what lengths you might find, and you must allow for both very wide and very narrow values, the system and data set option COMPRESS=YES is just what you need!
It lets you set the length of your character string variables to $32767, the maximum.
The trailing blanks will then be compressed to almost nothing.
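A rough sketch of this idea, assuming the example file path used earlier in the thread and placeholder names var1-var47 for the 47 fields. Once the data are in, the longest actual value can be measured and the variables shortened accordingly:

```sas
data work.wide (compress=yes);
  /* Maximum character length; compression removes the trailing blanks on disk */
  length var1-var47 $ 32767;
  infile 'C:\temp\testdata.txt' dlm='09'x dsd truncover lrecl=32767;
  input var1-var47;
run;

/* Longest value actually found in the first variable */
proc sql;
  select max(length(var1)) as max_var1
  from work.wide;
quit;
```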

good
peterC
Ksharp
Super User
Hi. I misunderstood what you said.
Actually, I think Patrick's method can work, but it is hard to tell without your original data.
Maybe you can also use a LENGTH statement to solve your problem; it has the same effect as the ' : ' modifier:

[pre]
data temp;
  length comment $ 20000;
  input id comment $;
  datalines;
1 tuieorejejepudf
2 diuet
3 troerpere
4 skjfasiwoewojkfoewroegerelgkerg
;
run;
proc print noobs;
run;
[/pre]


Ksharp