Help using Base SAS procedures

Importing data

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

Importing data

Hi,

I am a new SAS user. I need help importing 200 records in a .txt file into a SAS data set. The records are laid in this format in the .txt file:

1

ABC

158,957%

$195.6m

CP & S

2

XYZ

57,348%

$82.6m

F & B

3

LMN

55,460%

$85.1m

BP & S

1

ABC

158,957%

$195.6m

El Segundo, CA

2

XYZ

57,348%

$82.6m

El Paso, TX

3

LMN

55,460%

$85.1m

Arlington, VA

1

ABC

158,957%

$195.6m

227

2

XYZ

57,348%

$82.6m

191

3

LMN

55,460%

$85.1m

145

I would like the SAS dataset to look like this.

RankCompany3 YR GrowthRevenueIndustryLocationEmployees
1ABC158,957%$195.6mCP & SEl Segundo, CA227
2XYZ57,348%$82.6mF & BEl Paso, TX191
3LMN55,460%$85.1mBP & SArlington, VA145

Thanks,

Tom


Accepted Solutions
Solution
‎02-17-2015 03:34 PM
Contributor
Posts: 28

Re: Importing data

Hi Xia,

This is the right code :-D. Thanks for your help anyways.

WORK.TRIAL ;


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200.txt' truncover;


a $80.;


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt';


a ^= ' ' then put a;


;



one(keep=a b c d e)


keep=a b f g h)


keep=a b i j k);


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt' truncover;


l = 1 to 200;


input a / b $20. / c $10. / d $10. / e $30. ;


output one;


;


l = 1 to 200;


input a / b $20. / f $10. / g $10. / h $30. ;


output two;


;


l = 1 to 200;


input a / b $20. / i $10. / j $10. / k $30. ;


output thr;


;


l ;


;



fou;


one two thr;


;

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Importing data

For the values

158,957% and $195.6m

do you want these to be numeric or character? 158957% = 1589.57 and 195.6m as 195600000?

Contributor
Posts: 28

Re: Importing data

Hi ballardw,

I would like those to be numeric please.

Thanks,

Thomas

Super User
Posts: 11,343

Re: Importing data

I think you need to verify that the data as posted actually represents your .txt file as otherwise a generic solution is going to be VERY difficult without knowing how many rankings are involved (and if there are ties may be next to insoluble).

Note that this forum software will lose and add stuff occasionally depending on the source. Repeating some values and then adding having the fifth one change would be a very bizarre behavior.

Contributor
Posts: 28

Re: Importing data

Would it help if i uploaded a .txt file?

Thanks,

Tom.

Super User
Posts: 11,343

Re: Importing data

Yes

Contributor
Posts: 28

Re: Importing data

Hi,

I have uploaded a .txt file. I hope this helps.

Thanks,

Thomas

Attachment
Super User
Super User
Posts: 7,074

Re: Importing data

As long as each value is on its own line that should be easy.

Note that the column headers in your desired output are not valid variable names.  But you could use them as labels.

data want ;

  infile 'myfile.txt' truncover ;

  input rank

      / company $50.

      / growth comma32.

      /  revenue_txt $32.

      / industry $50.

      /  location $200.

      / employees

  ;

  revenue = input(compress(revenue_txt,'m'),comma32.);

run;

Super User
Posts: 11,343

Re: Importing data

Tom, if you take a closer look at the data example you'll see that rank, company, growth and revenue are repeated and then the last line of a block represents one of industry, location and employees.

This looks very much like output that was originally something like:

rank     company      growth     Revenue     industry

                                                                    location

                                                                      employees

The last 3 values stacked in a box. And then serialized to create one block of data for each of the stacked elements. The fun part is that those blocks are ordered by the rank. So all of the industry last are together, then all of the location, then the employees.

To get this on one record we have to read 5 lines together and then go skipping through the data much further down, and that distance would be determined by how many real "ranks" there were, at intervals of about #ranks*5+ some fudge depending on looking for location or employees.

At this point I would ask "How was that text file generated?"

I've seen similar come from copy/paste from the middle of an HTML table. And I haven't time to even approach a generic version assuming there are more than 3 ranks.

Super User
Super User
Posts: 7,074

Re: Importing data

It is hard to look at this data, but it looks to me like in the example file there are 12 lines for each record except the last which only has 11.  Every other line seems to be empty, it looks double spaced.

NOTE: 179 records were read from the infile XX('Test Data SAS comm.txt').

      The minimum record length was 0.

      The maximum record length was 14.

So the same company seems to appear three times.

So read it in. Sort it by RANK.  Then you can transpose it.

data raw;

  infile xx('Test Data SAS comm.txt') truncover end=eof;

  row + 1;

  input rank // company $50. // growth_txt $32. // @;

  growth = input(growth_txt,comma32.)/100;

  input revenue_txt $32. // @;

  revenue = input(compress(revenue_txt,'m'),comma32.);

  input str $200. //;

  if not eof then input ;

  format growth percent10. revenue comma10.1;

run;

proc sort;

  by rank row;

run;

data want ;

  do i=1 to 3 ;

    set raw ;

    if i=1 then Industry=str ;

    if i=2 then Location=str ;

    if i=3 then Employees=input(str,comma32.);

  end;

  drop row i str ;

run;

Contributor
Posts: 28

Re: Importing data

Hi Tom,

your code did not give the right output.

Super User
Super User
Posts: 7,074

Re: Importing data

Looks good to me.

                          growth_                   revenue_

Obs    rank    company      txt           growth      txt          revenue    Industry       Location       Employees

1       1       ABC      158,957%      158957%     $195.6m          195.6    CP & S      El Segundo, CA       227

2       2       XYZ      57,348%        57348%     $82.6m            82.6    F & B       El Segundo, CA       191

3       3       LMN      55,460%        55460%     $85.1m            85.1    BP & S      Arlington, VA        145

4       4       RST      26,043%        26043%     $35.3m            35.3    Software    Palo Alto, CA         62

5       5       UVW      20,690%        20690%     $77.7m            77.7    Tele        Maynard, MA           92

Contributor
Posts: 28

Re: Importing data

Hi Tom,

How would you automate this step?

select * from inc5000.comp1_200

union

select * from inc5000.comp201_400

union

select * from inc5000.comp401_600

union

select * from inc5000.comp601_800

union

select * from inc5000.comp801_1000

union

select * from inc5000.comp1001_1200

union

select * from inc5000.comp1201_1400

union

select * from inc5000.comp1401_1600

union

select * from inc5000.comp1601_1800

union

select * from inc5000.comp1801_2000

union

select * from inc5000.comp2001_2200

union

select * from inc5000.comp2201_2400

union

select * from inc5000.comp2401_2600

union

select * from inc5000.comp2601_2800

union

select * from inc5000.comp2801_3000

union

select * from inc5000.comp3001_3200

union

select * from inc5000.comp3201_3400

union

select * from inc5000.comp3401_3600

union

select * from inc5000.comp3601_3800

union

select * from inc5000.comp3801_4000

union

select * from inc5000.comp4001_4200

union

select * from inc5000.comp4201_4400

union

select * from inc5000.comp4401_4600

union

select * from inc5000.comp4601_4800

union

select * from inc5000.comp4801_5000

;

quit;

Thanks,

Thomas

Super User
Super User
Posts: 7,074

Re: Importing data

data want ;

set inc5000.comp: ;

run;

Super User
Posts: 10,041

Re: Importing data

data have;
infile cards truncover;
length id $ 20;
retain id;
input value $80.;
if not notdigit(strip(value)) then do;id=value;delete;end;
cards;
1
ABC
158,957%
$195.6m
CP & S
2
XYZ
57,348%
$82.6m
F & B
3
LMN
55,460%
$85.1m
BP & S
1
ABC
158,957%
$195.6m
El Segundo, CA
2
XYZ
57,348%
$82.6m
El Paso, TX
3
LMN
55,460%
$85.1m
Arlington, VA
1
ABC
158,957%
$195.6m
227
2
XYZ
57,348%
$82.6m
191
3
LMN
55,460%
$85.1m
145
;
run;
proc sort data=have out=temp nodupkey; by id value;run;
data temp;
 set temp;
 length vname $ 40;
 if findc(value,'$') then vname='revenue';
  else if  findc(value,'&') then vname='industry';
   else if findc(value,'%') then vname='_3yr_growth';
    else if  findc(value,',') then vname='location';
      else vname='company';
run;
proc transpose data=temp out=want(drop=_:);
by id;
id vname;
var value;
run;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 581 views
  • 2 likes
  • 4 in conversation