BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
gobejo
Calcite | Level 5

Hi,

I am a new SAS user. I need help importing 200 records in a .txt file into a SAS data set. The records are laid in this format in the .txt file:

1

ABC

158,957%

$195.6m

CP & S

2

XYZ

57,348%

$82.6m

F & B

3

LMN

55,460%

$85.1m

BP & S

1

ABC

158,957%

$195.6m

El Segundo, CA

2

XYZ

57,348%

$82.6m

El Paso, TX

3

LMN

55,460%

$85.1m

Arlington, VA

1

ABC

158,957%

$195.6m

227

2

XYZ

57,348%

$82.6m

191

3

LMN

55,460%

$85.1m

145

I would like the SAS dataset to look like this.

RankCompany3 YR GrowthRevenueIndustryLocationEmployees
1ABC158,957%$195.6mCP & SEl Segundo, CA227
2XYZ57,348%$82.6mF & BEl Paso, TX191
3LMN55,460%$85.1mBP & SArlington, VA145

Thanks,

Tom

1 ACCEPTED SOLUTION

Accepted Solutions
gobejo
Calcite | Level 5

Hi Xia,

This is the right code :-D. Thanks for your help anyways.

WORK.TRIAL ;


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200.txt' truncover;


a $80.;


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt';


a ^= ' ' then put a;


;



one(keep=a b c d e)


keep=a b f g h)


keep=a b i j k);


'/sasdata/target_data/cont_improvement/datamart/Inc/1_200_.txt' truncover;


l = 1 to 200;


input a / b $20. / c $10. / d $10. / e $30. ;


output one;


;


l = 1 to 200;


input a / b $20. / f $10. / g $10. / h $30. ;


output two;


;


l = 1 to 200;


input a / b $20. / i $10. / j $10. / k $30. ;


output thr;


;


l ;


;



fou;


one two thr;


;

View solution in original post

16 REPLIES 16
ballardw
Super User

For the values

158,957% and $195.6m

do you want these to be numeric or character? 158957% = 1589.57 and 195.6m as 195600000?

gobejo
Calcite | Level 5

Hi ballardw,

I would like those to be numeric please.

Thanks,

Thomas

ballardw
Super User

I think you need to verify that the data as posted actually represents your .txt file as otherwise a generic solution is going to be VERY difficult without knowing how many rankings are involved (and if there are ties may be next to insoluble).

Note that this forum software will lose and add stuff occasionally depending on the source. Repeating some values and then adding having the fifth one change would be a very bizarre behavior.

gobejo
Calcite | Level 5

Would it help if i uploaded a .txt file?

Thanks,

Tom.

ballardw
Super User

Yes

gobejo
Calcite | Level 5

Hi,

I have uploaded a .txt file. I hope this helps.

Thanks,

Thomas

Tom
Super User Tom
Super User

As long as each value is on its own line that should be easy.

Note that the column headers in your desired output are not valid variable names.  But you could use them as labels.

data want ;

  infile 'myfile.txt' truncover ;

  input rank

      / company $50.

      / growth comma32.

      /  revenue_txt $32.

      / industry $50.

      /  location $200.

      / employees

  ;

  revenue = input(compress(revenue_txt,'m'),comma32.);

run;

ballardw
Super User

Tom, if you take a closer look at the data example you'll see that rank, company, growth and revenue are repeated and then the last line of a block represents one of industry, location and employees.

This looks very much like output that was originally something like:

rank     company      growth     Revenue     industry

                                                                    location

                                                                      employees

The last 3 values stacked in a box. And then serialized to create one block of data for each of the stacked elements. The fun part is that those blocks are ordered by the rank. So all of the industry last are together, then all of the location, then the employees.

To get this on one record we have to read 5 lines together and then go skipping through the data much further down, and that distance would be determined by how many real "ranks" there were, at intervals of about #ranks*5+ some fudge depending on looking for location or employees.

At this point I would ask "How was that text file generated?"

I've seen similar come from copy/paste from the middle of an HTML table. And I haven't time to even approach a generic version assuming there are more than 3 ranks.

Tom
Super User Tom
Super User

It is hard to look at this data, but it looks to me like in the example file there are 12 lines for each record except the last which only has 11.  Every other line seems to be empty, it looks double spaced.

NOTE: 179 records were read from the infile XX('Test Data SAS comm.txt').

      The minimum record length was 0.

      The maximum record length was 14.

So the same company seems to appear three times.

So read it in. Sort it by RANK.  Then you can transpose it.

data raw;

  infile xx('Test Data SAS comm.txt') truncover end=eof;

  row + 1;

  input rank // company $50. // growth_txt $32. // @;

  growth = input(growth_txt,comma32.)/100;

  input revenue_txt $32. // @;

  revenue = input(compress(revenue_txt,'m'),comma32.);

  input str $200. //;

  if not eof then input ;

  format growth percent10. revenue comma10.1;

run;

proc sort;

  by rank row;

run;

data want ;

  do i=1 to 3 ;

    set raw ;

    if i=1 then Industry=str ;

    if i=2 then Location=str ;

    if i=3 then Employees=input(str,comma32.);

  end;

  drop row i str ;

run;

gobejo
Calcite | Level 5

Hi Tom,

your code did not give the right output.

Tom
Super User Tom
Super User

Looks good to me.

                          growth_                   revenue_

Obs    rank    company      txt           growth      txt          revenue    Industry       Location       Employees

1       1       ABC      158,957%      158957%     $195.6m          195.6    CP & S      El Segundo, CA       227

2       2       XYZ      57,348%        57348%     $82.6m            82.6    F & B       El Segundo, CA       191

3       3       LMN      55,460%        55460%     $85.1m            85.1    BP & S      Arlington, VA        145

4       4       RST      26,043%        26043%     $35.3m            35.3    Software    Palo Alto, CA         62

5       5       UVW      20,690%        20690%     $77.7m            77.7    Tele        Maynard, MA           92

gobejo
Calcite | Level 5

Hi Tom,

How would you automate this step?

select * from inc5000.comp1_200

union

select * from inc5000.comp201_400

union

select * from inc5000.comp401_600

union

select * from inc5000.comp601_800

union

select * from inc5000.comp801_1000

union

select * from inc5000.comp1001_1200

union

select * from inc5000.comp1201_1400

union

select * from inc5000.comp1401_1600

union

select * from inc5000.comp1601_1800

union

select * from inc5000.comp1801_2000

union

select * from inc5000.comp2001_2200

union

select * from inc5000.comp2201_2400

union

select * from inc5000.comp2401_2600

union

select * from inc5000.comp2601_2800

union

select * from inc5000.comp2801_3000

union

select * from inc5000.comp3001_3200

union

select * from inc5000.comp3201_3400

union

select * from inc5000.comp3401_3600

union

select * from inc5000.comp3601_3800

union

select * from inc5000.comp3801_4000

union

select * from inc5000.comp4001_4200

union

select * from inc5000.comp4201_4400

union

select * from inc5000.comp4401_4600

union

select * from inc5000.comp4601_4800

union

select * from inc5000.comp4801_5000

;

quit;

Thanks,

Thomas

Tom
Super User Tom
Super User

data want ;

set inc5000.comp: ;

run;

Ksharp
Super User
data have;
infile cards truncover;
length id $ 20;
retain id;
input value $80.;
if not notdigit(strip(value)) then do;id=value;delete;end;
cards;
1
ABC
158,957%
$195.6m
CP & S
2
XYZ
57,348%
$82.6m
F & B
3
LMN
55,460%
$85.1m
BP & S
1
ABC
158,957%
$195.6m
El Segundo, CA
2
XYZ
57,348%
$82.6m
El Paso, TX
3
LMN
55,460%
$85.1m
Arlington, VA
1
ABC
158,957%
$195.6m
227
2
XYZ
57,348%
$82.6m
191
3
LMN
55,460%
$85.1m
145
;
run;
proc sort data=have out=temp nodupkey; by id value;run;
data temp;
 set temp;
 length vname $ 40;
 if findc(value,'$') then vname='revenue';
  else if  findc(value,'&') then vname='industry';
   else if findc(value,'%') then vname='_3yr_growth';
    else if  findc(value,',') then vname='location';
      else vname='company';
run;
proc transpose data=temp out=want(drop=_:);
by id;
id vname;
var value;
run;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1715 views
  • 2 likes
  • 4 in conversation