Hello,
I am having trouble on finding out how to import a text file that has just one long column that contains the variable names and
data. This is what the text file looks like and it is also attached to this post.
Text file
i2n110C2es00.A / i2n110C1es00.A * 100
1901 2.0
1902 2.0
1903 2.0
1904 2.0
1905 2.0
1906 2.0
1907 2.0
1908 2.0
1909 2.0
1910 2.0
1911 2.0
1912 2.0
1913 2.0
1914 2.0
1915 2.0
1916 2.0
1917 2.0
1918 2.0
1919 2.0
1920 2.0
1921 2.0
1922 2.0
1923 2.0
1924 2.0
1925 2.0
1926 2.0
1927 2.0
1928 2.0
1929 2.0
1930 2.0
1931 2.0
1932 2.0
1933 2.1
1934 2.1
1935 2.0
1936 2.1
1937 2.0
1938 2.3
1939 2.1
1940 2.3
1941 2.3
1942 2.0
1943 2.1
1944 2.0
1945 2.2
1946 2.0
1947 2.1
1948 2.1
1949 2.0
1950 2.1
1951 2.0
1952 3.0
1953 3.1
1954 4.0
1955 4.2
1956 5.2
1957 5.0
1958 5.2
1959 5.8
1960 6.5
1961 6.3
1962 6.2
1963 6.3
1964 6.4
1965 6.2
1966 6.5
1967 6.5
1968 7.5
1969 7.3
1970 7.3
1971 7.2
1972 7.5
1973 7.3
1974 8.3
1975 9.3
1976 9.2
1977 9.2
1978 9.2
1979 9.3
1980 9.6
1981 9.8
1982 10.0
1983 9.7
1984 8.8
1985 11.7
1986 11.8
1987 13.0
1988 10.3
1989 10.6
1990 12.3
1991 12.7
1992 12.6
1993 14.8
1994 17.3
1995 19.8
1996 20.4
1997 19.0
1998 22.9
1999 26.5
2000 30.1
2001 33.7
2002 37.4
2003 37.3
2004 37.3
2005 37.3
2006 37.3
2007 37.3
2008 37.3
2009 37.3
2010 37.1
2011 37.1
2012 37.0
2013 37.0
2014 37.2
i2n113F2es00.A / i2n113F1es00.A * 100
1901 32.1
1902 32.1
1903 32.1
1904 32.1
1905 32.0
1906 32.1
1907 32.1
1908 32.1
1909 32.1
1910 32.1
1911 32.1
1912 31.6
1913 31.7
1914 32.1
1915 32.1
1916 31.7
1917 31.6
1918 32.0
1919 31.4
1920 32.4
1921 21.2
1922 22.6
1923 21.4
1924 24.6
1925 16.2
1926 16.3
1927 15.9
1928 16.2
1929 20.4
1930 20.7
1931 16.3
1932 12.9
1933 25.5
1934 27.0
1935 26.8
1936 27.5
1937 25.9
1938 35.4
1939 29.2
1940 35.4
1941 34.7
1942 46.5
1943 22.1
1944 40.0
1945 29.3
1946 22.4
1947 9.1
1948 6.8
1949 5.6
1950 5.2
1951 6.3
1952 6.3
1953 6.5
1954 6.6
1955 7.2
1956 10.4
1957 5.5
1958 13.5
1959 16.2
1960 16.2
1961 14.0
1962 14.1
1963 19.6
1964 23.0
1965 20.2
1966 27.8
1967 27.1
1968 22.2
1969 21.2
1970 21.9
1971 22.5
1972 33.6
1973 29.3
1974 31.1
1975 30.9
1976 29.6
1977 30.4
1978 29.5
1979 29.4
1980 29.2
1981 29.3
1982 26.1
1983 24.7
1984 27.8
1985 26.6
1986 26.0
1987 26.6
1988 25.0
1989 26.2
1990 24.2
1991 22.2
1992 21.4
1993 24.8
1994 28.2
1995 31.6
1996 33.1
1997 38.2
1998 41.4
1999 44.7
2000 47.9
2001 51.1
2002 54.3
2003 54.0
2004 53.8
2005 53.8
2006 54.5
2007 54.7
2008 72.5
2009 72.5
2010 73.1
2011 73.1
2012 73.2
2013 73.2
2014 73.2
i2n21102es00.A / i2n21101es00.A * 100
1901 92.2
1902 92.2
1903 92.2
1904 92.1
1905 92.1
1906 92.1
1907 92.1
1908 92.1
1909 92.1
1910 92.2
1911 92.2
1912 92.1
1913 92.1
1914 92.1
1915 92.1
1916 92.0
1917 92.0
1918 92.0
1919 92.0
1920 92.0
1921 92.0
1922 92.0
1923 92.0
1924 92.0
1925 90.8
1926 91.4
1927 91.4
1928 91.3
1929 91.4
1930 91.2
1931 90.8
1932 91.1
1933 90.8
1934 91.1
1935 91.2
1936 91.2
1937 91.4
1938 91.4
1939 91.3
1940 91.3
1941 91.4
1942 91.6
1943 91.4
1944 91.7
1945 91.7
1946 91.6
1947 91.4
1948 91.5
1949 91.4
1950 91.4
1951 91.5
1952 91.6
1953 91.6
1954 91.7
1955 91.7
1956 91.7
1957 91.7
1958 91.7
1959 91.7
1960 90.6
1961 90.7
1962 90.6
1963 90.5
1964 90.6
1965 90.6
1966 90.7
1967 90.7
1968 90.6
1969 90.6
1970 90.6
1971 90.4
1972 90.4
1973 90.4
1974 90.5
1975 83.3
1976 85.4
1977 78.4
1978 72.3
1979 84.5
1980 78.2
1981 71.2
1982 73.4
1983 82.4
1984 89.9
1985 92.4
1986 90.2
1987 86.9
1988 88.5
1989 90.1
1990 95.7
1991 95.7
1992 95.6
1993 95.2
1994 93.0
1995 92.1
1996 93.5
1997 94.8
1998 91.9
1999 90.8
2000 89.8
2001 88.7
2002 87.6
2003 87.6
2004 87.6
2005 87.6
2006 88.9
2007 89.2
2008 89.4
2009 89.4
2010 89.4
2011 89.4
2012 89.4
2013 89.4
2014 89.4
Below is the SAS code progress I made
proc import datafile="X:\CAP INC\example_data.txt"
out=mydata
replace;
delimiter='09'x ;
getnames=yes;
run;
data ab;
set mydata;
if var1='' then delete;
if substr(var1,1,1)='i' then group=1;
else group=2;
run;
It seems that grouping this file would work best but I haven't been able to figure it out.
The desired output is attached.
Any help or direction would be greatly appreciated.
Many thanks and Happy New Year.
My approach would be similar to @Reezas, but I'd do it all while initially inputting the data:
data have; infile '/folders/myfolders/example_data.txt'; length group $7; retain group; input @; if _infile_ ne '' then do; if first(_infile_) eq 'i' then group=catt('KCFi',substr(_infile_,4,3)); else do; input year var1; output; end; end; run; proc sort data=have out=need; by year group; run; proc transpose data=need out=want (drop=_:); by year; var var1; id group; run;
Art, CEO, AnalystFinder.com
Attached here is an example of the text file I am trying to import and parse.
If you mean that
i2n110C2es00.A and "i2n110C1es00.A * 100" are supposed to be variable names those are not normally valid SAS variable names. Characters such as period, space or asterisk are not typically valid.
Most of your first column values really look like a Year. Is that the actual case? If so then it may be possible to create a data set that contains variable such as Year, Value and a TEXT variable containing the text such as "i2n110C2es00.A / i2n110C1es00.A * 100" to identify usage of the numeric value in the second column. Would that be acceptable?
Or show an example of what the final data set should actually look like for the first 10 rows of data with the variable names, values and data types for the variables using a data step. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
@yo1 wrote:
This is what the data set should look like at the end.
data example;
input year KCFI525 KCFI550 KCFI610 KCFI622;
datalines;
1986 12.8 31.9 55.8 12.5
1987 12.8 31.9 55.8 12.5
1988 12.8 31.9 55.8 12.5
1989 12.8 31.9 55.8 12.5
1990 12.8 31.9 55.8 12.5
1991 12.8 31.9 55.8 12.5
1992 12.8 31.9 55.8 12.5
199312.8 31.9 55.8 12.5
;
And where do you get "525", "550", "610" and "622" from? I don't see those numbers/character sequences anywhere in your input file.
If you post example data and expected results that correlate to each other, it's much easier to come up with working code. Don't force us to make guesses.
Use a data step instead.
Check the length of the record. If it's too long then it has the variable name, read it in and retain it.
Then read in the values.
Then you can transpose it later on.
data have;
input @;
retain variable_name;
x=_infile_;
if length(x) = <> then variable_name = scan(x, 2, '/');
else input var1 var2;
run;
My approach would be similar to @Reezas, but I'd do it all while initially inputting the data:
data have; infile '/folders/myfolders/example_data.txt'; length group $7; retain group; input @; if _infile_ ne '' then do; if first(_infile_) eq 'i' then group=catt('KCFi',substr(_infile_,4,3)); else do; input year var1; output; end; end; run; proc sort data=have out=need; by year group; run; proc transpose data=need out=want (drop=_:); by year; var var1; id group; run;
Art, CEO, AnalystFinder.com
Given the order of your data, this can be done in two DATA steps. The first data step reuses an ordered hash, to store and output each group in a separated data set, named need1, need2, etc. The second data step just merges all the NEED's.
data _null_;
infile 'c:\temp\t.txt' end=eod;
if _n_=1 then do;
declare hash h (ordered:'A');
h.definekey('year');
h.definedata('year','val');
h.definedone();
end;
input @;
if not(indexc(_infile_,'/') or _infile_=' ') then input year val;
if year^=. then rc=h.add();
length group $7;
retain group;
if indexc(_infile_,'/') or eod then do;
if h.num_items>0 then h.output(dataset:cats('need',group,'(rename=(val=',group,'))'));
h.clear();
group=catt('KCFi',substr(_infile_,4,3));
end;
run;
data want;
merge need: ;
by year;
run;
data temp;
infile 'c:\temp\test.txt' truncover expandtabs;
input @;
length name name1 name2 $ 200;
retain name name1 name2;
if anyalpha(_infile_) then do;
name=_infile_;
name1=scan(scan(_infile_,1,'/'),1,'.');
name2=scan(scan(_infile_,-1,'/'),1,'.');
delete;
end;
else if not missing(_infile_) then input a b;
else delete;
run;
proc sql;
create table unique as
select distinct name,name1,name2 from temp;
quit;
data _null_;
set unique end=last;
if _n_=1 then call execute('data want;merge ');
call execute(catt('temp(where=(name="',name,'") rename=(b=',name2,'))'));
if last then call execute(';by a;drop name name1 name2;run;');
run;
Hello everyone,
Massive thanks for your responses. They were both a learning lesson and constructive feedback. : )
The solution to this problem can be seen from PROC Star, mkeintz, and Ksharp code replies. Reeza you gave me a partial solution but it did show me a different approach to this problem, so you have my thanks!
I selected PROC star as the solution as it was the simplest to include into the main program I am working with.
Mkeintz SAS code using hash is awesome. However, my experience with hash is not at a level that I feel comfortable with (I have seen this method of coding in SAS at many SUG conferences). Thank you mkeintz.
Ksharp code works well but it appeared that I didn’t need the PROC SQL or the DATA NULL steps. Big thanks Ksharp!
On a side note, I had to modify the code for my purposes because the file was much, much longer than the example provided to this blog chain. Using PROC star’s code on the original data resulted in an ERROR message after the PROC step that stated “ERROR: The ID value "KCFi551" occurs twice in the same BY group.” ERROR: Too many bad BY groups.“
I had to insert the LET option for PROC TRANSPOSE, which allows duplicate values of an ID variable.
(http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1r2tjnp8ewe3sn1acnpn...) (http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n0mwa670j36v2an1ojkae...).
After doing that the program produced the desired output needed using the original text file.
Massive thanks everyone and Happy New Year!!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.