I am currently trying to import a txt file into sas most of the line have a column name followed by the respective values,
EXCEPT for this... the column name is on one line followed by another 4 values which are a subset of the first line as shown below for Arm_Yield, what I need to do is to assign respective subcolumn names to each record.
How do I get SAS to assign a value based on the specific row, given I have to import multiples of these txt files and the values are constantly changing. Good thing is they are always on the same line for each file.
The picture on the left is what I am currently at, On the right is what I want to have.
Currently i am just using a scan function to split the lines into column and input value.
Thanks
SAS 9.4
I see, "else" names are a combination of last col_name as prefix and the suffix.
you can retain the last col_name as prefix.
Here is the code:
data want;
set have;
retain i prefix;
array suffix $ ch1-ch4 {'A' 'B' 'C' 'D'}; /* line syntax was fixed */
drop i prefix ch1-ch4;
if substr(left(raw_line),1,1) = '<' then do;
col_name = scan(raw_line,1,'<>');
prefix = col_name;
col_value = scan(raw_line,2,'<>');
i=0;
end;
else do;
i+1;
col_name = compress(prefix || suffix(i));
col_value = raw_line;
end;
run;
Some of your raw lines start with '<' then you can assign:
if substr(left(raw_line),1,1) = '<' then do;
col_name = scan(raw_line,1,'<>');
col_value = scan(raw_line,2,'<>');
end;
Sas will not accept colomn name like '36.84%' as it is nonvalid sas name.
In such case you may name the variables as var1 var2 ... and assign the '36.84%' as a label to it.
I see there are various values: 36.84% 60.48% 58.58% 67.47% 43.37% ...etc. - which seems to be unpredictable amount of variables
and cann't tell if any of them apears once or more times.
I undestand that the right table is the answer/solution to this situation, then, proposed code is
data want;
set have;
retain i; drop i;
array suffix $ _temporary_ {'A' 'B' 'C' 'D'}; /* I'm not sure about exact order/syntax */
if substr(left(raw_line),1,1) = '<' then do;
col_name = scan(raw_line,1,'<>');
col_value = scan(raw_line,2,'<>');
i=0;
end;
else do;
i+1;
col_name = 'Arm1_Yield_' || suffix(i);
col_value = raw_line;
end;
run;
Hi Shmuel,
Thanks for the reply,
However, apologies on my part, i forgot to mention that I have raw lines which dont have < such as:
Temperature : 56
I am currently using many IF statements for my scan function. The only thing unique about these rows that do not have column names are that they contain a number with a %.
for the code here
else do;
i+1;
col_name = 'Arm1_Yield_' || suffix(i);
col_value = raw_line;
end;
I have also columns for Arm2_Yield , All_Yield etc as seen in the picture above,
at this rate I would have duplicate column names in my final data set. I need SAS to recognize these names and then assign the approriate suffix respectively.
I see, "else" names are a combination of last col_name as prefix and the suffix.
you can retain the last col_name as prefix.
Here is the code:
data want;
set have;
retain i prefix;
array suffix $ ch1-ch4 {'A' 'B' 'C' 'D'}; /* line syntax was fixed */
drop i prefix ch1-ch4;
if substr(left(raw_line),1,1) = '<' then do;
col_name = scan(raw_line,1,'<>');
prefix = col_name;
col_value = scan(raw_line,2,'<>');
i=0;
end;
else do;
i+1;
col_name = compress(prefix || suffix(i));
col_value = raw_line;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.