BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SasFellow331
Calcite | Level 5

hello,

i am having an issue creating a data-set imported from an external file with no delimiters (not even space). I need to create it within sas with added spaces between the variables. Here is what i have so far { data _null_;                                                                                                                           

set xxxx.class;                                                                                                                       

file 'C:\Users\xxxxx\Desktop\test9.txt';                                                                                              

put (name sex age height weight) (+(-1));                                                                                              

run;  }

i need to get from that to a data-set with delimiters (specifically space between variables). I appreciate any help.

{this raw data is completely made up.]

1 ACCEPTED SOLUTION

Accepted Solutions
SasFellow331
Calcite | Level 5

i finally came across the answer.

data a;

infile 'C:\Users\___\Desktop\test1.txt'  truncover;

input info $30.;

name = put(reverse(substr(reverse(info),findc(reverse(info),'F''M')+1)),$8.);

sex = put(substr(info, length(name) +1, 1),$1.);

age = input(substr(info, anydigit(info),2),2.);

ht=substr(info, sum(length(name),4), 4);

if ht>=100 then ht=substr(ht,1,2);

height=input(ht,4.);

weight= input(substr(info, sum(length(name),4,length(ht))),5.);

drop info ht;

run;

View solution in original post

6 REPLIES 6
Patrick
Opal | Level 21

Can you define ANY logic to be used which defines how to split up the string from source and map SAS variables against?

SasFellow331
Calcite | Level 5
  • The raw data I had provided was my starting off point, and i need to recreate it through both data export and import statements. The the export logic is as follows (which i had previously stated)

          data <dataset>;

          set <source data>;

          file 'name and location of the external file';

          put <variables list>;

          run;

  • as for the import statement, it should follow this format;

          data <dataset>;

          infile 'name and location of the external file';

          input <variables list>;

          run;

SASKiwi
PROC Star

Using your made up example, you can figure out the boundary between sex and age because age starts with the first number in the record. If you assume sex is one character long, you can then work out where the name boundary is, which is the position of the first number minus 2.

However there is no way of working out the boundaries between the numeric fields age height and weight unless you make assumptions about how long each of these is. If age height and weight are always the same length then you can find the boundaries.

If they are of varying length it is impossible to do this. So if your real data consists of a series of numbers of varying length (between records) with no delimiters then there is no way you can find the boundaries unless as Patrick says there are "other" logic or rules you can apply.  

Tom
Super User Tom
Super User

If your text file does not have delimiters you will not be able to read it into multiple variables unless you have some rules for how to split the lines into variables.  SAS can very easily read a file without delimiters if the variables appear in fixed columns.

For example:

data _null_;

  set sashelp.class ;

  put (name sex age height weight)( $7. $1. 2. 2. 5.1);

run;

Alfred M1469112.5

Alice  F1357 84.0

BarbaraF1365 98.0

Carol  F1463102.5

Henry  M1464102.5

James  M1257 83.0

Jane   F1260 84.5

Janet  F1563112.5

JeffreyM1363 84.0

John   M1259 99.5

Joyce  F1151 50.5

Judy   F1464 90.0

Louise F1256 77.0

Mary   F1567112.0

Philip M1672150.0

Robert M1265128.0

Ronald M1567133.0

Thomas M1158 85.0

WilliamM1567112.0

PGStats
Opal | Level 21

As Tom pointed out, this problem doesn't have a general solution unless you use fixed columns. If you have at least some minimal formatting rules you might be able to parse variable length data lines with regular expressions, using a separate capture buffer for each field :

data _null_;                                                                                                                          

set sashelp.class;                                                                                                                      

file "&sasforum.\datasets\sasFellow331.txt";  

format age z2. height 4.1 weight 5.1; /* Use fixed point formats */

put (name sex age height weight) (+(-1));                                                                                             

run;

data want;

if not prxID then prxID + prxparse("/(\w+)([MF])(\d\d)(\d+\.\d)(\d+\.\d)/");

infile "&sasforum.\datasets\sasFellow331.txt" truncover;

input txt $100.;

length name $8 sex $1;

if prxmatch(prxID, txt) then do;

    call prxposn(prxID, 1, position, length);

    name = substr(txt, position, length);

    call prxposn(prxID, 2, position, length);

    sex = substr(txt, position, length);

    call prxposn(prxID, 3, position, length);

    age = input(substr(txt, position, length),best.);

    call prxposn(prxID, 4, position, length);

    height = input(substr(txt, position, length),best.);

    call prxposn(prxID, 5, position, length);

    weight = input(substr(txt, position, length),best.);

    output;

    end;

drop prxID txt position length;

run;

proc print data=want noobs; run;

PG

PG
SasFellow331
Calcite | Level 5

i finally came across the answer.

data a;

infile 'C:\Users\___\Desktop\test1.txt'  truncover;

input info $30.;

name = put(reverse(substr(reverse(info),findc(reverse(info),'F''M')+1)),$8.);

sex = put(substr(info, length(name) +1, 1),$1.);

age = input(substr(info, anydigit(info),2),2.);

ht=substr(info, sum(length(name),4), 4);

if ht>=100 then ht=substr(ht,1,2);

height=input(ht,4.);

weight= input(substr(info, sum(length(name),4,length(ht))),5.);

drop info ht;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1054 views
  • 3 likes
  • 5 in conversation