- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Background info on the assignment:
STAT 325 HOMEWORK 6
sql
Data description: Kroger warehouse, which has two hubs, supplies 5 supermarkets in local San Diego.
You may want to analyze the sale and logistic record of 2014 by using “proc sql” in SAS.
market.csv:
Column 1: Item ID;
Column 2: Item Name;
Column 3: Unit Price;
Column 4: Sale Amount;
Column 5: Store Branch;
warehouse.csv:
Column 1: Item ID;
Column 2: Item Name;
Column 3: Warehouse Hub Name;
Column 4: Store Branch;
Code:
/*Question 1*/
data market;
length item $20. branch $20. id $6. price $2. amount $15.;
infile '/home/u62108616/sasuser.v94/S325/market.csv' dsd truncover;
format price DOLLAR1.2 amount COMMA15.0;
run;
proc sql;
select*
from market;
quit;
Error Messages/Warnings:
NOTE 484-185: Format $DOLLAR was not found or could not be loaded.
ERROR 230-185: Invalid character format/informat decimal specified.
73 ! format price DOLLAR1.2 amount COMMA15.0;
_________
484
NOTE 484-185: Format $COMMA was not found or could not be loaded.
74 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.MARKET may be incomplete. When this step was stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.MARKET was not replaced because this step was stopped.
Question 1: Read in the market.csv data file, apply the dollar w.d format and comma separated format to the variables
of price and amount
What am I doing wrong here and how should I fix it?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The LENGTH statement has NOTHING to do with FORMATS (of INFORMATS). It is for telling SAS what type of variable you want to create and how much space it takes to store it in the dataset.
Do not add periods to the lengths. Do not tell SAS to store only the first 3 bytes of the 8 bytes needed to represent a floating point number. Remember that SAS only has two types of variables: fixed length character strings and 64bit floating point numbers.
length item $20 branch $20 id $6 price amount 8;
It also looks like your input statement does not match the data in the text file.
It could be the CSV is corrupt or mall formed. Why would the ID values have periods in them?
As to why the AMOUNT variable is missing we probably need to see the LOG (please post the text of the log and not another photograph) and perhaps also the text of some of the lines from the CSV file. If you cannot figure out how to view the text of the CSV file then you can add the LIST statement to your SAS data step and SAS will display the lines read from the CSV file into the SAS log.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You have created variable AMOUNT as a character variable. It must be numeric in order to use the COMMA format.
By the way, there's really no reason for AMOUNT to be character, it really is numeric.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your LENGTH statement made AMOUNT a character variable. Change its length to 8 (no dollar sign). Same applies to other variables such as PRICE.
Also note, you didn't input any variables from the data. You should probably add an INPUT statement to the first DATA step.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Current Output vs Intended Output:
Updated Code:
/*Question 1*/
data market;
length item $20. branch $20. id $6. price 3. amount 8.;
infile '/home/u62108616/sasuser.v94/S325/market.csv' dsd truncover;
input item $ branch $ id $ price amount;
format price DOLLAR10.2 amount COMMA15.0;
run;
proc sql;
select*
from market;
quit;
I know question 1 doesnt correlate to any of the tables produced in the intended output, however, it seems like my variables are still a bit messed up
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The LENGTH statement sets the storage length of variables. For numeric variables, leave it at 8, or you will lose precision unnecessarily.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Also, your INPUT statement does not reflect the order of variables in the assignment. id should be last.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The LENGTH statement has NOTHING to do with FORMATS (of INFORMATS). It is for telling SAS what type of variable you want to create and how much space it takes to store it in the dataset.
Do not add periods to the lengths. Do not tell SAS to store only the first 3 bytes of the 8 bytes needed to represent a floating point number. Remember that SAS only has two types of variables: fixed length character strings and 64bit floating point numbers.
length item $20 branch $20 id $6 price amount 8;
It also looks like your input statement does not match the data in the text file.
It could be the CSV is corrupt or mall formed. Why would the ID values have periods in them?
As to why the AMOUNT variable is missing we probably need to see the LOG (please post the text of the log and not another photograph) and perhaps also the text of some of the lines from the CSV file. If you cannot figure out how to view the text of the CSV file then you can add the LIST statement to your SAS data step and SAS will display the lines read from the CSV file into the SAS log.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is good to use the LENGTH statement to explicitly define the variables. But you should not define the numeric variables as character. Also you do not need to include a decimal point when defining the length. The length can only be an integer value so no decimal point is needed.
length item $20 branch $20 id $6 price amount 8;
You still need to tell SAS to actually read the variables from the lines of the CSV file by adding an INPUT statement.
You cannot have a display width that has more decimal places than the total width. Fix the width used on the DOLLAR format specification. Make sure to take into account room for the decimal point (if requested), the $ character and any thousands block separators (commas).
data market;
length item $20 branch $20 id $6 price amount 8;
infile '/home/u62108616/sasuser.v94/S325/market.csv' dsd truncover;
input item branch id price amount;
format price DOLLAR12.2 amount COMMA15.0;
run;
No need to use SQL to print data. Just use PROC PRINT.
proc print data=market;
run;
Also does the CSV files have a header line or not? If so you will want to add the FIRSTOBS=2 option to the INFILE statement to tell it to skip the header line.