Solved: Re: Need help importing a text file that has many variables and data i...

yo1 · Posted 01-09-2018 05:09 PM

Hello,

I am having trouble on finding out how to import a text file that has just one long column that contains the variable names and

data. This is what the text file looks like and it is also attached to this post.

Text file

i2n110C2es00.A / i2n110C1es00.A * 100

1901      2.0
1902      2.0
1903      2.0
1904      2.0
1905      2.0
1906      2.0
1907      2.0
1908      2.0
1909      2.0
1910      2.0
1911      2.0
1912      2.0
1913      2.0
1914      2.0
1915      2.0
1916      2.0
1917      2.0
1918      2.0
1919      2.0
1920      2.0
1921      2.0
1922      2.0
1923      2.0
1924      2.0
1925      2.0
1926      2.0
1927      2.0
1928      2.0
1929      2.0
1930      2.0
1931      2.0
1932      2.0
1933      2.1
1934      2.1
1935      2.0
1936      2.1
1937      2.0
1938      2.3
1939      2.1
1940      2.3
1941      2.3
1942      2.0
1943      2.1
1944      2.0
1945      2.2
1946      2.0
1947      2.1
1948      2.1
1949      2.0
1950      2.1
1951      2.0
1952      3.0
1953      3.1
1954      4.0
1955      4.2
1956      5.2
1957      5.0
1958      5.2
1959      5.8
1960      6.5
1961      6.3
1962      6.2
1963      6.3
1964      6.4
1965      6.2
1966      6.5
1967      6.5
1968      7.5
1969      7.3
1970      7.3
1971      7.2
1972      7.5
1973      7.3
1974      8.3
1975      9.3
1976      9.2
1977      9.2
1978      9.2
1979      9.3
1980      9.6
1981      9.8
1982     10.0
1983      9.7
1984      8.8
1985     11.7
1986     11.8
1987     13.0
1988     10.3
1989     10.6
1990     12.3
1991     12.7
1992     12.6
1993     14.8
1994     17.3
1995     19.8
1996     20.4
1997     19.0
1998     22.9
1999     26.5
2000     30.1
2001     33.7
2002     37.4
2003     37.3
2004     37.3
2005     37.3
2006     37.3
2007     37.3
2008     37.3
2009     37.3
2010     37.1
2011     37.1
2012     37.0
2013     37.0
2014     37.2

i2n113F2es00.A / i2n113F1es00.A * 100

1901     32.1
1902     32.1
1903     32.1
1904     32.1
1905     32.0
1906     32.1
1907     32.1
1908     32.1
1909     32.1
1910     32.1
1911     32.1
1912     31.6
1913     31.7
1914     32.1
1915     32.1
1916     31.7
1917     31.6
1918     32.0
1919     31.4
1920     32.4
1921     21.2
1922     22.6
1923     21.4
1924     24.6
1925     16.2
1926     16.3
1927     15.9
1928     16.2
1929     20.4
1930     20.7
1931     16.3
1932     12.9
1933     25.5
1934     27.0
1935     26.8
1936     27.5
1937     25.9
1938     35.4
1939     29.2
1940     35.4
1941     34.7
1942     46.5
1943     22.1
1944     40.0
1945     29.3
1946     22.4
1947      9.1
1948      6.8
1949      5.6
1950      5.2
1951      6.3
1952      6.3
1953      6.5
1954      6.6
1955      7.2
1956     10.4
1957      5.5
1958     13.5
1959     16.2
1960     16.2
1961     14.0
1962     14.1
1963     19.6
1964     23.0
1965     20.2
1966     27.8
1967     27.1
1968     22.2
1969     21.2
1970     21.9
1971     22.5
1972     33.6
1973     29.3
1974     31.1
1975     30.9
1976     29.6
1977     30.4
1978     29.5
1979     29.4
1980     29.2
1981     29.3
1982     26.1
1983     24.7
1984     27.8
1985     26.6
1986     26.0
1987     26.6
1988     25.0
1989     26.2
1990     24.2
1991     22.2
1992     21.4
1993     24.8
1994     28.2
1995     31.6
1996     33.1
1997     38.2
1998     41.4
1999     44.7
2000     47.9
2001     51.1
2002     54.3
2003     54.0
2004     53.8
2005     53.8
2006     54.5
2007     54.7
2008     72.5
2009     72.5
2010     73.1
2011     73.1
2012     73.2
2013     73.2
2014     73.2

i2n21102es00.A / i2n21101es00.A * 100

1901     92.2
1902     92.2
1903     92.2
1904     92.1
1905     92.1
1906     92.1
1907     92.1
1908     92.1
1909     92.1
1910     92.2
1911     92.2
1912     92.1
1913     92.1
1914     92.1
1915     92.1
1916     92.0
1917     92.0
1918     92.0
1919     92.0
1920     92.0
1921     92.0
1922     92.0
1923     92.0
1924     92.0
1925     90.8
1926     91.4
1927     91.4
1928     91.3
1929     91.4
1930     91.2
1931     90.8
1932     91.1
1933     90.8
1934     91.1
1935     91.2
1936     91.2
1937     91.4
1938     91.4
1939     91.3
1940     91.3
1941     91.4
1942     91.6
1943     91.4
1944     91.7
1945     91.7
1946     91.6
1947     91.4
1948     91.5
1949     91.4
1950     91.4
1951     91.5
1952     91.6
1953     91.6
1954     91.7
1955     91.7
1956     91.7
1957     91.7
1958     91.7
1959     91.7
1960     90.6
1961     90.7
1962     90.6
1963     90.5
1964     90.6
1965     90.6
1966     90.7
1967     90.7
1968     90.6
1969     90.6
1970     90.6
1971     90.4
1972     90.4
1973     90.4
1974     90.5
1975     83.3
1976     85.4
1977     78.4
1978     72.3
1979     84.5
1980     78.2
1981     71.2
1982     73.4
1983     82.4
1984     89.9
1985     92.4
1986     90.2
1987     86.9
1988     88.5
1989     90.1
1990     95.7
1991     95.7
1992     95.6
1993     95.2
1994     93.0
1995     92.1
1996     93.5
1997     94.8
1998     91.9
1999     90.8
2000     89.8
2001     88.7
2002     87.6
2003     87.6
2004     87.6
2005     87.6
2006     88.9
2007     89.2
2008     89.4
2009     89.4
2010     89.4
2011     89.4
2012     89.4
2013     89.4
2014     89.4

Below is the SAS code progress I made

proc import datafile="X:\CAP INC\example_data.txt"

out=mydata

replace;

delimiter='09'x ;

getnames=yes;

run;

data ab;

set mydata;

if var1='' then delete;

if substr(var1,1,1)='i' then group=1;

else group=2;

run;

It seems that grouping this file would work best but I haven't been able to figure it out.

The desired output is attached.

Any help or direction would be greatly appreciated.

Many thanks and Happy New Year.

art297 · Posted 01-09-2018 05:39 PM

My approach would be similar to @Reezas, but I'd do it all while initially inputting the data:

data have;
  infile '/folders/myfolders/example_data.txt';
  length group $7;
  retain group;
  input @;
  if _infile_ ne '' then do;
    if first(_infile_) eq 'i' then group=catt('KCFi',substr(_infile_,4,3));
    else do;
      input year var1;
      output;
    end;
  end;
run;

proc sort data=have out=need;
  by year group;
run;

proc transpose data=need out=want (drop=_:);
  by year;
  var var1;
  id group;
run;

Art, CEO, AnalystFinder.com

View solution in original post

yo1 · Posted 01-09-2018 05:10 PM

Attached here is an example of the text file I am trying to import and parse.

ballardw · Posted 01-09-2018 05:25 PM

If you mean that

i2n110C2es00.A and "i2n110C1es00.A * 100" are supposed to be variable names those are not normally valid SAS variable names. Characters such as period, space or asterisk are not typically valid.

Most of your first column values really look like a Year. Is that the actual case? If so then it may be possible to create a data set that contains variable such as Year, Value and a TEXT variable containing the text such as "i2n110C2es00.A / i2n110C1es00.A * 100" to identify usage of the numeric value in the second column. Would that be acceptable?

Or show an example of what the final data set should actually look like for the first 10 rows of data with the variable names, values and data types for the variables using a data step. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

yo1 · Posted 01-09-2018 05:35 PM

Hi. Yes its a time series and the values in that column are years. The variable name is long but it would be renamed later on. I know it has funny characters but I plan on removing the whole thing.

yo1 · Posted 01-09-2018 05:42 PM

This is what the data set should look like at the end.

data example;

input year KCFI525 KCFI550 KCFI610 KCFI622;
datalines;
1986 12.8 31.9 55.8 12.5
1987 12.8 31.9 55.8 12.5
1988 12.8 31.9 55.8 12.5
1989 12.8 31.9 55.8 12.5
1990 12.8 31.9 55.8 12.5
1991 12.8 31.9 55.8 12.5
1992 12.8 31.9 55.8 12.5
199312.8 31.9 55.8 12.5
;

Kurt_Bremser · Posted 01-10-2018 03:48 AM

@yo1 wrote:
This is what the data set should look like at the end.

data example;

input year KCFI525 KCFI550 KCFI610 KCFI622;
datalines;
1986 12.8 31.9 55.8 12.5
1987 12.8 31.9 55.8 12.5
1988 12.8 31.9 55.8 12.5
1989 12.8 31.9 55.8 12.5
1990 12.8 31.9 55.8 12.5
1991 12.8 31.9 55.8 12.5
1992 12.8 31.9 55.8 12.5
199312.8 31.9 55.8 12.5
;

And where do you get "525", "550", "610" and "622" from? I don't see those numbers/character sequences anywhere in your input file.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

yo1 · Posted 01-10-2018 09:08 AM

The identifiers of "525", "550", "610" and "622" are just a few of the variables contained in this file. The attached text file is just a small sample of a larger text file that I need to import. There are over 60 variables in this scenario and I didn't want to put them all here for brevity. Sorry about the confusion.

Kurt_Bremser · Posted 01-10-2018 09:13 AM

If you post example data and expected results that correlate to each other, it's much easier to come up with working code. Don't force us to make guesses.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

yo1 · Posted 01-10-2018 09:34 AM

Noted. Thanks for the feedback.

Reeza · Posted 01-09-2018 05:28 PM

Use a data step instead.

Check the length of the record. If it's too long then it has the variable name, read it in and retain it.

Then read in the values.

Then you can transpose it later on.

data have;
input @;
retain variable_name;

x=_infile_;
if length(x) = <> then variable_name = scan(x, 2, '/');
else input var1 var2;

run;

Reeza · Posted 01-09-2018 05:30 PM

And since it looks like you have blank lines that are not End of File you may need to add logic to handle that as well.

art297 · Posted 01-09-2018 05:39 PM

My approach would be similar to @Reezas, but I'd do it all while initially inputting the data:

data have;
  infile '/folders/myfolders/example_data.txt';
  length group $7;
  retain group;
  input @;
  if _infile_ ne '' then do;
    if first(_infile_) eq 'i' then group=catt('KCFi',substr(_infile_,4,3));
    else do;
      input year var1;
      output;
    end;
  end;
run;

proc sort data=have out=need;
  by year group;
run;

proc transpose data=need out=want (drop=_:);
  by year;
  var var1;
  id group;
run;

Art, CEO, AnalystFinder.com

mkeintz · Posted 01-10-2018 01:27 AM

Given the order of your data, this can be done in two DATA steps. The first data step reuses an ordered hash, to store and output each group in a separated data set, named need1, need2, etc. The second data step just merges all the NEED's.

data _null_;
  infile 'c:\temp\t.txt' end=eod;
  if _n_=1 then do;
    declare hash h (ordered:'A');
      h.definekey('year');
      h.definedata('year','val');
      h.definedone();
  end;

  input @;
  if not(indexc(_infile_,'/') or _infile_=' ') then input year val;
  if year^=. then rc=h.add();
 
  length group $7;
  retain group;
  if indexc(_infile_,'/') or eod then do;
    if h.num_items>0 then h.output(dataset:cats('need',group,'(rename=(val=',group,'))'));
    h.clear();
    group=catt('KCFi',substr(_infile_,4,3));
  end;
run;

data want;
  merge need: ;
  by year;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 01-10-2018 08:41 AM

data temp;
infile 'c:\temp\test.txt' truncover expandtabs;
input @;
length name name1 name2 $ 200;
retain name name1 name2;
if anyalpha(_infile_) then do;
 name=_infile_;
 name1=scan(scan(_infile_,1,'/'),1,'.');
 name2=scan(scan(_infile_,-1,'/'),1,'.');
 delete;
end;
else if not missing(_infile_) then input a b;
 else delete;
run;
proc sql;
create table unique as
 select distinct name,name1,name2 from temp;
quit;
data _null_;
 set unique end=last;
 if _n_=1 then call execute('data want;merge ');
 call execute(catt('temp(where=(name="',name,'") rename=(b=',name2,'))'));
 if last then call execute(';by a;drop name name1 name2;run;');
run;

yo1 · Posted 01-10-2018 10:35 AM

Hello everyone,

Massive thanks for your responses. They were both a learning lesson and constructive feedback. : )

The solution to this problem can be seen from PROC Star, mkeintz, and Ksharp code replies. Reeza you gave me a partial solution but it did show me a different approach to this problem, so you have my thanks!

I selected PROC star as the solution as it was the simplest to include into the main program I am working with.

Mkeintz SAS code using hash is awesome. However, my experience with hash is not at a level that I feel comfortable with (I have seen this method of coding in SAS at many SUG conferences). Thank you mkeintz.

Ksharp code works well but it appeared that I didn’t need the PROC SQL or the DATA NULL steps. Big thanks Ksharp!

On a side note, I had to modify the code for my purposes because the file was much, much longer than the example provided to this blog chain. Using PROC star’s code on the original data resulted in an ERROR message after the PROC step that stated “ERROR: The ID value "KCFi551" occurs twice in the same BY group.” ERROR: Too many bad BY groups.“

I had to insert the LET option for PROC TRANSPOSE, which allows duplicate values of an ID variable.

(http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1r2tjnp8ewe3sn1acnpn...) (http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0mwa670j36v2an1ojkae...).

After doing that the program produced the desired output needed using the original text file.

Massive thanks everyone and Happy New Year!!!

Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Re: Need help importing a text file that has many variables and data in one long column.

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away