DATA Step, Macro, Functions and more

Importing raw files with inconsistent records and assigning column names to only a few

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Importing raw files with inconsistent records and assigning column names to only a few

I am currently trying to import a txt file into sas most of the line have a column name followed by the respective values,

EXCEPT for this... the column name is on one line followed by another 4 values which are a subset of the first line as shown below for Arm_Yield, what I need to do is to assign respective subcolumn names to each record.

 

How do I get SAS to assign a value based on the specific row, given I have to import multiples of these txt files and the values are constantly changing. Good thing is they are always on the same line for each file.

 

The picture on the left is what I am currently at, On the right is what I want to have.

 

Currently i am just using a scan function to split the lines into column and input value.

 

Thanks

 SAS 9.4

Capture.PNG


Accepted Solutions
Solution
‎08-25-2017 05:45 AM
Trusted Advisor
Posts: 1,826

Re: Importing raw files with inconsistent records and assigning column names to only a few

[ Edited ]
Posted in reply to Sharvthegreat

I see, "else" names are a combination of last col_name as prefix and the suffix.

you can retain the last col_name as prefix. 

 

Here is the code:

data want;
 set have;
       retain i prefix; 
       array suffix $ ch1-ch4 {'A' 'B' 'C' 'D'}; /* line syntax was fixed */
drop i prefix ch1-ch4;
if substr(left(raw_line),1,1) = '<' then do; col_name = scan(raw_line,1,'<>'); prefix = col_name; col_value = scan(raw_line,2,'<>'); i=0; end; else do; i+1; col_name = compress(prefix || suffix(i)); col_value = raw_line; end; run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,826

Re: Importing raw files with inconsistent records and assigning column names to only a few

[ Edited ]
Posted in reply to Sharvthegreat

Some of your raw lines start with '<' then you can assign:

     

if substr(left(raw_line),1,1) = '<' then do;
   col_name = scan(raw_line,1,'<>');
   col_value = scan(raw_line,2,'<>');
end;

Sas will not accept colomn name like '36.84%' as it is nonvalid sas name.

In such case you may name the variables as var1 var2 ... and assign the '36.84%' as a label to it.

I see there are various values: 36.84% 60.48% 58.58% 67.47% 43.37% ...etc. - which seems to be unpredictable amount of variables

and cann't tell if any of them apears once or more times.

 

I undestand that the right table is the answer/solution to this situation, then, proposed code is

data want;
 set have;
       retain i; drop i;
       array suffix $ _temporary_ {'A' 'B' 'C' 'D'}; /* I'm not sure about exact order/syntax */

if substr(left(raw_line),1,1) = '<' then do;
   col_name = scan(raw_line,1,'<>');
   col_value = scan(raw_line,2,'<>');
   i=0;
end;
else do;
   i+1;
   col_name = 'Arm1_Yield_' || suffix(i);
   col_value = raw_line;
end;
run;

 

New Contributor
Posts: 2

Re: Importing raw files with inconsistent records and assigning column names to only a few

[ Edited ]

Hi Shmuel,
Thanks for the reply,
However, apologies on my part, i forgot to mention that I have raw lines which dont have < such as:

Temperature : 56

I am currently using many IF statements for my scan function. The only thing unique about these rows that do not have column names are that they contain a number with a %.

for the code here

else do;
i+1;
col_name = 'Arm1_Yield_' || suffix(i);
col_value = raw_line;
end;

I have also columns for Arm2_Yield , All_Yield etc as seen in the picture above,
at this rate I would have duplicate column names in my final data set. I need SAS to recognize these names and then assign the approriate suffix respectively.

Solution
‎08-25-2017 05:45 AM
Trusted Advisor
Posts: 1,826

Re: Importing raw files with inconsistent records and assigning column names to only a few

[ Edited ]
Posted in reply to Sharvthegreat

I see, "else" names are a combination of last col_name as prefix and the suffix.

you can retain the last col_name as prefix. 

 

Here is the code:

data want;
 set have;
       retain i prefix; 
       array suffix $ ch1-ch4 {'A' 'B' 'C' 'D'}; /* line syntax was fixed */
drop i prefix ch1-ch4;
if substr(left(raw_line),1,1) = '<' then do; col_name = scan(raw_line,1,'<>'); prefix = col_name; col_value = scan(raw_line,2,'<>'); i=0; end; else do; i+1; col_name = compress(prefix || suffix(i)); col_value = raw_line; end; run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 98 views
  • 0 likes
  • 2 in conversation