hello,
i am having an issue creating a data-set imported from an external file with no delimiters (not even space). I need to create it within sas with added spaces between the variables. Here is what i have so far { data _null_;
set xxxx.class;
file 'C:\Users\xxxxx\Desktop\test9.txt';
put (name sex age height weight) (+(-1));
run; }
i need to get from that to a data-set with delimiters (specifically space between variables). I appreciate any help.
{this raw data is completely made up.]
i finally came across the answer.
data a;
infile 'C:\Users\___\Desktop\test1.txt' truncover;
input info $30.;
name = put(reverse(substr(reverse(info),findc(reverse(info),'F''M')+1)),$8.);
sex = put(substr(info, length(name) +1, 1),$1.);
age = input(substr(info, anydigit(info),2),2.);
ht=substr(info, sum(length(name),4), 4);
if ht>=100 then ht=substr(ht,1,2);
height=input(ht,4.);
weight= input(substr(info, sum(length(name),4,length(ht))),5.);
drop info ht;
run;
Can you define ANY logic to be used which defines how to split up the string from source and map SAS variables against?
data <dataset>;
set <source data>;
file 'name and location of the external file';
put <variables list>;
run;
data <dataset>;
infile 'name and location of the external file';
input <variables list>;
run;
Using your made up example, you can figure out the boundary between sex and age because age starts with the first number in the record. If you assume sex is one character long, you can then work out where the name boundary is, which is the position of the first number minus 2.
However there is no way of working out the boundaries between the numeric fields age height and weight unless you make assumptions about how long each of these is. If age height and weight are always the same length then you can find the boundaries.
If they are of varying length it is impossible to do this. So if your real data consists of a series of numbers of varying length (between records) with no delimiters then there is no way you can find the boundaries unless as Patrick says there are "other" logic or rules you can apply.
If your text file does not have delimiters you will not be able to read it into multiple variables unless you have some rules for how to split the lines into variables. SAS can very easily read a file without delimiters if the variables appear in fixed columns.
For example:
data _null_;
set sashelp.class ;
put (name sex age height weight)( $7. $1. 2. 2. 5.1);
run;
Alfred M1469112.5
Alice F1357 84.0
BarbaraF1365 98.0
Carol F1463102.5
Henry M1464102.5
James M1257 83.0
Jane F1260 84.5
Janet F1563112.5
JeffreyM1363 84.0
John M1259 99.5
Joyce F1151 50.5
Judy F1464 90.0
Louise F1256 77.0
Mary F1567112.0
Philip M1672150.0
Robert M1265128.0
Ronald M1567133.0
Thomas M1158 85.0
WilliamM1567112.0
As Tom pointed out, this problem doesn't have a general solution unless you use fixed columns. If you have at least some minimal formatting rules you might be able to parse variable length data lines with regular expressions, using a separate capture buffer for each field :
data _null_;
set sashelp.class;
file "&sasforum.\datasets\sasFellow331.txt";
format age z2. height 4.1 weight 5.1; /* Use fixed point formats */
put (name sex age height weight) (+(-1));
run;
data want;
if not prxID then prxID + prxparse("/(\w+)([MF])(\d\d)(\d+\.\d)(\d+\.\d)/");
infile "&sasforum.\datasets\sasFellow331.txt" truncover;
input txt $100.;
length name $8 sex $1;
if prxmatch(prxID, txt) then do;
call prxposn(prxID, 1, position, length);
name = substr(txt, position, length);
call prxposn(prxID, 2, position, length);
sex = substr(txt, position, length);
call prxposn(prxID, 3, position, length);
age = input(substr(txt, position, length),best.);
call prxposn(prxID, 4, position, length);
height = input(substr(txt, position, length),best.);
call prxposn(prxID, 5, position, length);
weight = input(substr(txt, position, length),best.);
output;
end;
drop prxID txt position length;
run;
proc print data=want noobs; run;
PG
i finally came across the answer.
data a;
infile 'C:\Users\___\Desktop\test1.txt' truncover;
input info $30.;
name = put(reverse(substr(reverse(info),findc(reverse(info),'F''M')+1)),$8.);
sex = put(substr(info, length(name) +1, 1),$1.);
age = input(substr(info, anydigit(info),2),2.);
ht=substr(info, sum(length(name),4), 4);
if ht>=100 then ht=substr(ht,1,2);
height=input(ht,4.);
weight= input(substr(info, sum(length(name),4,length(ht))),5.);
drop info ht;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.