Hi Everyone,
Been learning for about a week. So, i have a data set that follows the following pattern:
rank (numeric), company ($), country ($), sales (num), profits (num), assets (num), market value (num)
the data under the last 4 numeric variables are presented in billions of dollars, as "$1.5B" or "$200.1B", and vary in length
Im using the following code:
DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26. +1 country $18. sales_in_billions :dollar10. profits_in_billions :dollar10. assets_in_billions :dollar10. +1 profits_in_billions :dollar10.;
title companies data;
RUN;
——————————-
PROC PRINT data=bigcos (obs=15);
Run;
the first three variables go in fine, but it wont read anything after that with the Bs in the way (i assume that's the issue). whenever i add a line that says "compress(var, 'B')", after the input lines it gives me 0 observations—maybe I'm entering the syntax wrong or putting it in the wrong spot.
I've also included the data set in case that is useful.
Any help would be greatly appreciated.
Read the so called numeric data as char type first:
DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26.
+1 country $18. sales $ profits $ assets $ ;
drop sales profits assets ; /* the char type numeric variables */
sales_in_billions = input(compress(sales,'B'),dollar10.);
profits_in_billions = inpt(compress(profits,'B'),dollar10.);
assets_in_billions = input(compress(assets,'B'),dollar10.);
/* +1 profits_in_billions :dollar10.; <<< you can't have two variables with same name ! */
RUN;
Read the so called numeric data as char type first:
DATA bigcos;
INFILE 'C:\Users\dlobsien\Documents\bigcompanies.txt';
INPUT ranking company_name $26.
+1 country $18. sales $ profits $ assets $ ;
drop sales profits assets ; /* the char type numeric variables */
sales_in_billions = input(compress(sales,'B'),dollar10.);
profits_in_billions = inpt(compress(profits,'B'),dollar10.);
assets_in_billions = input(compress(assets,'B'),dollar10.);
/* +1 profits_in_billions :dollar10.; <<< you can't have two variables with same name ! */
RUN;
yeah this is what i did. i then used compress to take out the $,B and converted it to numeric in a new data set using the best. format
@Damon1 wrote:
yeah this is what i did. i then used compress to take out the $,B and converted it to numeric in a new data set using the best. format
You convert from text to values using an INFORMAT, not a FORMAT. BEST is a format but if you try to use it as in informat SAS will just go ahead and use the normal numeric informat instead. If you use the COMMA informat there is no need to remove the dollar signs or commas as that informat will strip those for you.
thanks!
Is it always a B? If so then just ignore it.
First thing to do is figure out where the data is on the line. The LIST statement is good for that.
data _null_;
infile "&path\bigcompanies.txt" obs=5;
input;
list;
run;
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-- 1 1 Exxon Mobil United States $433.5B $41.1B $331.1B $407.4B 88 2 2 JPMorgan Chase United States $110.8B $19B $2,265.8B $170.1B 88 3 3 General Electric United States $147.3B $14.2B $717.2B $213.7B 88 4 4 Royal Dutch Shell Netherlands $470.2B $30.9B $340.5B $227.6B 88 5 5 ICBC China $82.6B $25.1B $2,039.1B $237.4B 88 NOTE: 5 records were read from the infile "xxx\bigcompanies.txt".
Then you can write a program to read the lines. Make sure to give each variable a UNIQUE name.
data bigcos (label='companies data');
infile "&path\bigcompanies.txt" truncover;
input ranking 1-5
company_name $ 6-34
country $ 35-49
@51 sales_in_billions dollar8.
@60 profits_in_billions dollar8.
@69 assets_in_billions dollar8.
@79 profits_in_billions2 dollar9.
;
format sales_in_billions -- profits_in_billions2 dollar10.1;
run;
I believe that your problem is that several of the company names include accents, which are probably taking up more than one byte if you're not reading them correctly. This is pushing your columns out of line.
This second comment is a little advanced, but file it away for later. When I'm processing formatted data like this, I like to test the variables to ensure that they're well formed. The PRX routines are an easy way to do this (not completely tested).
Tom
data TestData;
length SalesC $15;
input;
SalesC = _infile_;
cards;
$433.5B
$110.8B
147.3B
$470.2B
$82.6B
$102B
$310.1B
$310.B
143.7B
$87.6B
$145.9
$375.5B
$.3B
$68.7B
run;
data Problems(drop=_:);
retain _RX1;
if _n_ = 1 then
_RX1 = prxparse("~^[[:space:]]*\$[[:digit:]]+(\.[[:digit:]])?B[[:space:]]*$~");
set TestData;
if ^prxmatch(_RX1, SalesC) then
output;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.