Hi,
I am a new SAS user. I need help importing 200 records in a .txt file into a SAS data set. The records are laid in this format in the .txt file:
1
ABC
158,957%
$195.6m
CP & S
2
XYZ
57,348%
$82.6m
F & B
3
LMN
55,460%
$85.1m
BP & S
1
ABC
158,957%
$195.6m
El Segundo, CA
2
XYZ
57,348%
$82.6m
El Paso, TX
3
LMN
55,460%
$85.1m
Arlington, VA
1
ABC
158,957%
$195.6m
227
2
XYZ
57,348%
$82.6m
191
3
LMN
55,460%
$85.1m
145
I would like the SAS dataset to look like this.
Rank | Company | 3 YR Growth | Revenue | Industry | Location | Employees |
---|---|---|---|---|---|---|
1 | ABC | 158,957% | $195.6m | CP & S | El Segundo, CA | 227 |
2 | XYZ | 57,348% | $82.6m | F & B | El Paso, TX | 191 |
3 | LMN | 55,460% | $85.1m | BP & S | Arlington, VA | 145 |
Thanks,
Tom
Hi Xia,
This is the right code :-D. Thanks for your help anyways.
WORK.TRIAL ;
'/sasdata/target_data/cont_improvement/datamart/Inc/1_200.txt' truncover;
a $80.;
'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt';
a ^= ' ' then put a;
;
one(keep=a b c d e)
keep=a b f g h)
keep=a b i j k);
'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt' truncover;
l = 1 to 200;
input a / b $20. / c $10. / d $10. / e $30. ;
output one;
;
l = 1 to 200;
input a / b $20. / f $10. / g $10. / h $30. ;
output two;
;
l = 1 to 200;
input a / b $20. / i $10. / j $10. / k $30. ;
output thr;
;
l ;
;
fou;
one two thr;
;
For the values
158,957% and $195.6m
do you want these to be numeric or character? 158957% = 1589.57 and 195.6m as 195600000?
Hi ballardw,
I would like those to be numeric please.
Thanks,
Thomas
I think you need to verify that the data as posted actually represents your .txt file as otherwise a generic solution is going to be VERY difficult without knowing how many rankings are involved (and if there are ties may be next to insoluble).
Note that this forum software will lose and add stuff occasionally depending on the source. Repeating some values and then adding having the fifth one change would be a very bizarre behavior.
Would it help if i uploaded a .txt file?
Thanks,
Tom.
Yes
Hi,
I have uploaded a .txt file. I hope this helps.
Thanks,
Thomas
As long as each value is on its own line that should be easy.
Note that the column headers in your desired output are not valid variable names. But you could use them as labels.
data want ;
infile 'myfile.txt' truncover ;
input rank
/ company $50.
/ growth comma32.
/ revenue_txt $32.
/ industry $50.
/ location $200.
/ employees
;
revenue = input(compress(revenue_txt,'m'),comma32.);
run;
Tom, if you take a closer look at the data example you'll see that rank, company, growth and revenue are repeated and then the last line of a block represents one of industry, location and employees.
This looks very much like output that was originally something like:
rank company growth Revenue industry
location
employees
The last 3 values stacked in a box. And then serialized to create one block of data for each of the stacked elements. The fun part is that those blocks are ordered by the rank. So all of the industry last are together, then all of the location, then the employees.
To get this on one record we have to read 5 lines together and then go skipping through the data much further down, and that distance would be determined by how many real "ranks" there were, at intervals of about #ranks*5+ some fudge depending on looking for location or employees.
At this point I would ask "How was that text file generated?"
I've seen similar come from copy/paste from the middle of an HTML table. And I haven't time to even approach a generic version assuming there are more than 3 ranks.
It is hard to look at this data, but it looks to me like in the example file there are 12 lines for each record except the last which only has 11. Every other line seems to be empty, it looks double spaced.
NOTE: 179 records were read from the infile XX('Test Data SAS comm.txt').
The minimum record length was 0.
The maximum record length was 14.
So the same company seems to appear three times.
So read it in. Sort it by RANK. Then you can transpose it.
data raw;
infile xx('Test Data SAS comm.txt') truncover end=eof;
row + 1;
input rank // company $50. // growth_txt $32. // @;
growth = input(growth_txt,comma32.)/100;
input revenue_txt $32. // @;
revenue = input(compress(revenue_txt,'m'),comma32.);
input str $200. //;
if not eof then input ;
format growth percent10. revenue comma10.1;
run;
proc sort;
by rank row;
run;
data want ;
do i=1 to 3 ;
set raw ;
if i=1 then Industry=str ;
if i=2 then Location=str ;
if i=3 then Employees=input(str,comma32.);
end;
drop row i str ;
run;
Hi Tom,
your code did not give the right output.
Looks good to me.
growth_ revenue_
Obs rank company txt growth txt revenue Industry Location Employees
1 1 ABC 158,957% 158957% $195.6m 195.6 CP & S El Segundo, CA 227
2 2 XYZ 57,348% 57348% $82.6m 82.6 F & B El Segundo, CA 191
3 3 LMN 55,460% 55460% $85.1m 85.1 BP & S Arlington, VA 145
4 4 RST 26,043% 26043% $35.3m 35.3 Software Palo Alto, CA 62
5 5 UVW 20,690% 20690% $77.7m 77.7 Tele Maynard, MA 92
Hi Tom,
How would you automate this step?
select * from inc5000.comp1_200
union
select * from inc5000.comp201_400
union
select * from inc5000.comp401_600
union
select * from inc5000.comp601_800
union
select * from inc5000.comp801_1000
union
select * from inc5000.comp1001_1200
union
select * from inc5000.comp1201_1400
union
select * from inc5000.comp1401_1600
union
select * from inc5000.comp1601_1800
union
select * from inc5000.comp1801_2000
union
select * from inc5000.comp2001_2200
union
select * from inc5000.comp2201_2400
union
select * from inc5000.comp2401_2600
union
select * from inc5000.comp2601_2800
union
select * from inc5000.comp2801_3000
union
select * from inc5000.comp3001_3200
union
select * from inc5000.comp3201_3400
union
select * from inc5000.comp3401_3600
union
select * from inc5000.comp3601_3800
union
select * from inc5000.comp3801_4000
union
select * from inc5000.comp4001_4200
union
select * from inc5000.comp4201_4400
union
select * from inc5000.comp4401_4600
union
select * from inc5000.comp4601_4800
union
select * from inc5000.comp4801_5000
;
quit;
Thanks,
Thomas
data want ;
set inc5000.comp: ;
run;
data have; infile cards truncover; length id $ 20; retain id; input value $80.; if not notdigit(strip(value)) then do;id=value;delete;end; cards; 1 ABC 158,957% $195.6m CP & S 2 XYZ 57,348% $82.6m F & B 3 LMN 55,460% $85.1m BP & S 1 ABC 158,957% $195.6m El Segundo, CA 2 XYZ 57,348% $82.6m El Paso, TX 3 LMN 55,460% $85.1m Arlington, VA 1 ABC 158,957% $195.6m 227 2 XYZ 57,348% $82.6m 191 3 LMN 55,460% $85.1m 145 ; run; proc sort data=have out=temp nodupkey; by id value;run; data temp; set temp; length vname $ 40; if findc(value,'$') then vname='revenue'; else if findc(value,'&') then vname='industry'; else if findc(value,'%') then vname='_3yr_growth'; else if findc(value,',') then vname='location'; else vname='company'; run; proc transpose data=temp out=want(drop=_:); by id; id vname; var value; run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.