Solved: Re: Importing data

gobejo · Posted 02-11-2015 12:51 PM

Hi,

I am a new SAS user. I need help importing 200 records in a .txt file into a SAS data set. The records are laid in this format in the .txt file:

1

ABC

158,957%

$195.6m

CP & S

2

XYZ

57,348%

$82.6m

F & B

3

LMN

55,460%

$85.1m

BP & S

1

ABC

158,957%

$195.6m

El Segundo, CA

2

XYZ

57,348%

$82.6m

El Paso, TX

3

LMN

55,460%

$85.1m

Arlington, VA

1

ABC

158,957%

$195.6m

227

2

XYZ

57,348%

$82.6m

191

3

LMN

55,460%

$85.1m

145

I would like the SAS dataset to look like this.

Rank	Company	3 YR Growth	Revenue	Industry	Location	Employees
1	ABC	158,957%	$195.6m	CP & S	El Segundo, CA	227
2	XYZ	57,348%	$82.6m	F & B	El Paso, TX	191
3	LMN	55,460%	$85.1m	BP & S	Arlington, VA	145

Thanks,

Tom

gobejo · Posted 02-17-2015 03:34 PM

Hi Xia,

This is the right code :-D. Thanks for your help anyways.

WORK.TRIAL ;

'/sasdata/target_data/cont_improvement/datamart/Inc/1_200.txt' truncover;

a $80.;

'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt';

a ^= ' ' then put a;

;

one(keep=a b c d e)

keep=a b f g h)

keep=a b i j k);

'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt' truncover;

l = 1 to 200;

input a / b $20. / c $10. / d $10. / e $30. ;

output one;

;

l = 1 to 200;

input a / b $20. / f $10. / g $10. / h $30. ;

output two;

;

l = 1 to 200;

input a / b $20. / i $10. / j $10. / k $30. ;

output thr;

;

l ;

;

fou;

one two thr;

;

View solution in original post

ballardw · Posted 02-11-2015 01:34 PM

For the values

158,957% and $195.6m

do you want these to be numeric or character? 158957% = 1589.57 and 195.6m as 195600000?

gobejo · Posted 02-11-2015 01:38 PM

Hi ballardw,

I would like those to be numeric please.

Thanks,

Thomas

ballardw · Posted 02-11-2015 01:46 PM

I think you need to verify that the data as posted actually represents your .txt file as otherwise a generic solution is going to be VERY difficult without knowing how many rankings are involved (and if there are ties may be next to insoluble).

Note that this forum software will lose and add stuff occasionally depending on the source. Repeating some values and then adding having the fifth one change would be a very bizarre behavior.

gobejo · Posted 02-11-2015 01:50 PM

Would it help if i uploaded a .txt file?

Thanks,

Tom.

ballardw · Posted 02-11-2015 01:59 PM

Yes

gobejo · Posted 02-11-2015 02:00 PM

Hi,

I have uploaded a .txt file. I hope this helps.

Thanks,

Thomas

Tom · Posted 02-11-2015 04:01 PM

As long as each value is on its own line that should be easy.

Note that the column headers in your desired output are not valid variable names. But you could use them as labels.

data want ;

infile 'myfile.txt' truncover ;

input rank

/ company $50.

/ growth comma32.

/ revenue_txt $32.

/ industry $50.

/ location $200.

/ employees

;

revenue = input(compress(revenue_txt,'m'),comma32.);

run;

ballardw · Posted 02-11-2015 04:17 PM

Tom, if you take a closer look at the data example you'll see that rank, company, growth and revenue are repeated and then the last line of a block represents one of industry, location and employees.

This looks very much like output that was originally something like:

rank company growth Revenue industry

location

employees

The last 3 values stacked in a box. And then serialized to create one block of data for each of the stacked elements. The fun part is that those blocks are ordered by the rank. So all of the industry last are together, then all of the location, then the employees.

To get this on one record we have to read 5 lines together and then go skipping through the data much further down, and that distance would be determined by how many real "ranks" there were, at intervals of about #ranks*5+ some fudge depending on looking for location or employees.

At this point I would ask "How was that text file generated?"

I've seen similar come from copy/paste from the middle of an HTML table. And I haven't time to even approach a generic version assuming there are more than 3 ranks.

Tom · Posted 02-11-2015 07:10 PM

It is hard to look at this data, but it looks to me like in the example file there are 12 lines for each record except the last which only has 11. Every other line seems to be empty, it looks double spaced.

NOTE: 179 records were read from the infile XX('Test Data SAS comm.txt').

The minimum record length was 0.

The maximum record length was 14.

So the same company seems to appear three times.

So read it in. Sort it by RANK. Then you can transpose it.

data raw;

infile xx('Test Data SAS comm.txt') truncover end=eof;

row + 1;

input rank // company $50. // growth_txt $32. // @;

growth = input(growth_txt,comma32.)/100;

input revenue_txt $32. // @;

revenue = input(compress(revenue_txt,'m'),comma32.);

input str $200. //;

if not eof then input ;

format growth percent10. revenue comma10.1;

run;

proc sort;

by rank row;

run;

data want ;

do i=1 to 3 ;

set raw ;

if i=1 then Industry=str ;

if i=2 then Location=str ;

if i=3 then Employees=input(str,comma32.);

end;

drop row i str ;

run;

gobejo · Posted 02-12-2015 01:40 PM

Hi Tom,

your code did not give the right output.

Tom · Posted 02-17-2015 11:50 PM

Looks good to me.

growth_ revenue_

Obs rank company txt growth txt revenue Industry Location Employees

1 1 ABC 158,957% 158957% $195.6m 195.6 CP & S El Segundo, CA 227

2 2 XYZ 57,348% 57348% $82.6m 82.6 F & B El Segundo, CA 191

3 3 LMN 55,460% 55460% $85.1m 85.1 BP & S Arlington, VA 145

4 4 RST 26,043% 26043% $35.3m 35.3 Software Palo Alto, CA 62

5 5 UVW 20,690% 20690% $77.7m 77.7 Tele Maynard, MA 92

gobejo · Posted 02-18-2015 03:38 PM

Hi Tom,

How would you automate this step?

select * from inc5000.comp1_200

union

select * from inc5000.comp201_400

union

select * from inc5000.comp401_600

union

select * from inc5000.comp601_800

union

select * from inc5000.comp801_1000

union

select * from inc5000.comp1001_1200

union

select * from inc5000.comp1201_1400

union

select * from inc5000.comp1401_1600

union

select * from inc5000.comp1601_1800

union

select * from inc5000.comp1801_2000

union

select * from inc5000.comp2001_2200

union

select * from inc5000.comp2201_2400

union

select * from inc5000.comp2401_2600

union

select * from inc5000.comp2601_2800

union

select * from inc5000.comp2801_3000

union

select * from inc5000.comp3001_3200

union

select * from inc5000.comp3201_3400

union

select * from inc5000.comp3401_3600

union

select * from inc5000.comp3601_3800

union

select * from inc5000.comp3801_4000

union

select * from inc5000.comp4001_4200

union

select * from inc5000.comp4201_4400

union

select * from inc5000.comp4401_4600

union

select * from inc5000.comp4601_4800

union

select * from inc5000.comp4801_5000

;

quit;

Thanks,

Thomas

Tom · Posted 02-18-2015 03:46 PM

data want ;

set inc5000.comp: ;

run;

Ksharp · Posted 02-12-2015 03:56 AM

data have;
infile cards truncover;
length id $ 20;
retain id;
input value $80.;
if not notdigit(strip(value)) then do;id=value;delete;end;
cards;
1
ABC
158,957%
$195.6m
CP & S
2
XYZ
57,348%
$82.6m
F & B
3
LMN
55,460%
$85.1m
BP & S
1
ABC
158,957%
$195.6m
El Segundo, CA
2
XYZ
57,348%
$82.6m
El Paso, TX
3
LMN
55,460%
$85.1m
Arlington, VA
1
ABC
158,957%
$195.6m
227
2
XYZ
57,348%
$82.6m
191
3
LMN
55,460%
$85.1m
145
;
run;
proc sort data=have out=temp nodupkey; by id value;run;
data temp;
 set temp;
 length vname $ 40;
 if findc(value,'$') then vname='revenue';
  else if  findc(value,'&') then vname='industry';
   else if findc(value,'%') then vname='_3yr_growth';
    else if  findc(value,',') then vname='location';
      else vname='company';
run;
proc transpose data=temp out=want(drop=_:);
by id;
id vname;
var value;
run;

Xia Keshan

Registration is open

SAS Training: Just a Click Away