BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Damon1
Obsidian | Level 7

Hi Everyone, 

 

Been learning for about a week. So, i have a data set that follows the following pattern:

rank (numeric), company ($), country ($), sales (num), profits (num), assets (num), market value (num)

 

the data under the last 4 numeric variables are presented in billions of dollars, as "$1.5B" or "$200.1B", and vary in length 

 

Im using the following code:

DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26. +1 country $18. sales_in_billions :dollar10. profits_in_billions :dollar10. assets_in_billions :dollar10. +1 profits_in_billions :dollar10.;
title companies data;
RUN;

——————————-

PROC PRINT data=bigcos (obs=15);
Run;

 

the first three variables go in fine, but it wont read anything after that with the Bs in the way (i assume that's the issue). whenever i add a line that says "compress(var, 'B')", after the input lines it gives me 0 observations—maybe I'm entering the syntax wrong or putting it in the wrong spot. 

 

I've also included the data set in case that is useful.

 

Any help would be greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Read the so called numeric data as char type first:

DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26. 
      +1 country $18. sales $ profits $ assets $ ;
	
      drop sales profits assets ; /* the char type numeric variables */	
	  
      sales_in_billions  = input(compress(sales,'B'),dollar10.);
      profits_in_billions = inpt(compress(profits,'B'),dollar10.); 
      assets_in_billions  = input(compress(assets,'B'),dollar10.);
	  
      /* +1 profits_in_billions :dollar10.; <<< you can't have two variables with same name ! */

RUN;

View solution in original post

6 REPLIES 6
Shmuel
Garnet | Level 18

Read the so called numeric data as char type first:

DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26. 
      +1 country $18. sales $ profits $ assets $ ;
	
      drop sales profits assets ; /* the char type numeric variables */	
	  
      sales_in_billions  = input(compress(sales,'B'),dollar10.);
      profits_in_billions = inpt(compress(profits,'B'),dollar10.); 
      assets_in_billions  = input(compress(assets,'B'),dollar10.);
	  
      /* +1 profits_in_billions :dollar10.; <<< you can't have two variables with same name ! */

RUN;
Damon1
Obsidian | Level 7

yeah this is what i did. i then used compress to take out the $,B and converted it to numeric in a new data set using the best. format 

Tom
Super User Tom
Super User

@Damon1 wrote:

yeah this is what i did. i then used compress to take out the $,B and converted it to numeric in a new data set using the best. format 


You convert from text to values using an INFORMAT, not a FORMAT.  BEST is a format but if you try to use it as in informat SAS will just go ahead and use the normal numeric informat instead.  If you use the COMMA informat there is no need to remove the dollar signs or commas as that informat will strip those for you.

Damon1
Obsidian | Level 7

thanks!

Tom
Super User Tom
Super User

Is it always a B?  If so then just ignore it.

First thing to do is figure out where the data is on the line. The LIST statement is good for that.

data _null_;
  infile "&path\bigcompanies.txt" obs=5;
  input;
  list;
run;
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+--
1         1    Exxon Mobil                  United States     $433.5B   $41.1B   $331.1B   $407.4B 88
2         2    JPMorgan Chase               United States     $110.8B     $19B $2,265.8B   $170.1B 88
3         3    General Electric             United States     $147.3B   $14.2B   $717.2B   $213.7B 88
4         4    Royal Dutch Shell            Netherlands       $470.2B   $30.9B   $340.5B   $227.6B 88
5         5    ICBC                         China              $82.6B   $25.1B $2,039.1B   $237.4B 88
NOTE: 5 records were read from the infile "xxx\bigcompanies.txt".

Then you can write a program to read the lines. Make sure to give each variable a UNIQUE name.

data bigcos (label='companies data');
  infile "&path\bigcompanies.txt" truncover;
  input ranking 1-5
        company_name $ 6-34
        country $ 35-49 
    @51 sales_in_billions dollar8. 
    @60 profits_in_billions dollar8. 
    @69 assets_in_billions dollar8. 
    @79 profits_in_billions2 dollar9.
  ;
  format sales_in_billions -- profits_in_billions2 dollar10.1;
run;
TomKari
Onyx | Level 15

I believe that your problem is that several of the company names include accents, which are probably taking up more than one byte if you're not reading them correctly. This is pushing your columns out of line.

 

This second comment is a little advanced, but file it away for later. When I'm processing formatted data like this, I like to test the variables to ensure that they're well formed. The PRX routines are an easy way to do this (not completely tested).

 

Tom

 

data TestData;
	length SalesC $15;
	input;
	SalesC = _infile_;
	cards;
$433.5B
$110.8B
147.3B
 $470.2B
 $82.6B
  $102B
$310.1B
$310.B
 143.7B
 $87.6B
 $145.9
$375.5B
$.3B
 $68.7B
run;

data Problems(drop=_:);
	retain _RX1;

	if _n_ = 1 then
		_RX1 = prxparse("~^[[:space:]]*\$[[:digit:]]+(\.[[:digit:]])?B[[:space:]]*$~");
	set TestData;

	if ^prxmatch(_RX1, SalesC) then
		output;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1922 views
  • 3 likes
  • 4 in conversation