Hello
I want to create a report from data set SASHELP.CARS.
Task1-I want to add a new column to the raw data (sashelp.cars) with random date between 01.01.2021 and 05.01.2021
Task2- I want to create a summary report with following statistics calculations:
SUM of MSRP
Number of rows(actually it is number of transactions)
Number of distinct Cylinders
These calculations will be done for each category of following variables:
Type
Origin
EngineSize
The calculation will be done for each date .
So in the report will have 17 columns:
var name
category name
SUM of MSRP (for each date)
Number of rows (for each date)
Number of distinct Cylinders (for each date)
What is the way to create the desired task please?
proc format;
value $ typeF
'Sedan','Hybrid','SUV'='Type_a'
'Sports'='Type_b'
'Truck','Wagon'='Type_c'
;
run;
proc format;
value $ OriginF
'Asia','Europe'='Asia,Europe'
'USA'='USA'
;
run;
proc format ;
value EngineSizeF
0-1.3='(a) till 1.3'
1.3-1.6='(b) 1.3-1.6'
1.6-2.0'=(c) 1.6-2.0'
2.0-high='(d) 2.0+';
run;
I know that can be done with proc report but my question was to see a code that do it and also to see a code that add dates column to raw data with random allocation of dates between 01/01/2021 and 05/01/2021
Well, here is some code for getting the random dates. Do you know how to use proc report or proc tabulate?
@Ronein wrote:
Thank you!
A few questions please:
1- I see that rand() is not working and then I changed it
2-I want to check the dates in order to verify that I get dates between 01/01/2021 and 05.01.2021
I expect to see 5 dates only!!
Why do I get much more???
%let start=01/01/2021;
%let end=05/01/2021;
Data _null_;
start_SAS_date=input("&start",ddmmyy10.);
end_SAS_date=input("&end",ddmmyy10.);
dif=end_SAS_date-start_SAS_date;
call symput('start',put(start_SAS_date,best.));
call symput('dif',put(dif,best.));
run;
%put &start;
%put &dif;
Data RawDataTbl;
set SASHELP.cars;
format date_new ddmmyy10.;
date_new=&start+rand('uniform')*&dif;
Run;
proc sort data=RawDataTbl(keep=date_new)nodup;
by date_new;
run;
PROC SQL;
create table check_dates as
select distinct date_new
from RawDataTbl
;
QUIT;
As I said, you just need to generate random months. The year you want is always 2021 and the day of the month is always 1, these are not random.
So, if you have generated random month numbers 1 through 5, then you want
date_new = mdy(random_month,1,2021);
Add dates at random between Jan 2021 and May 2021 (from your example, the dates are always the first of the month, and the year is always 2021, so the only random thing would be the month)
Assign a new variable containing random integers 1 through 5 to each observation indicating the random month, then turn the random integer into a day/month/year.
@Ronein wrote:
I know that can be done with proc report but my question was to see a code that do it and also to see a code that add dates column to raw data with random allocation of dates between 01/01/2021 and 05/01/2021
Since you have 3 separate variables in the same column I don't think that Proc Report will be your friend.
Tabulate would have the date as the first Column dimension as a class variable with the other 3 nested inside:
table type origin enginesize,
datevar *(msrp*sum n <maybe you don't define what Nr_rows means>
<you also do not define what nr_distinct_cylinders is supposed to be>)
However If you intending what I think you are intending for the distinct cylinders you may be out of luck as that sort of summary likely needs to be done prior to the Tabulate or Report procedure as there is no "distinct values" statistic in Proc Tabulate or Report .
Are you expecting Proc Report or Tabulated to do a random column selection???
Adding values like dates belongs before any report procedure.
Hi: When you get the dates into the data, you still have a basic assumption problem.
You show 3 different variables and their values in the first 2 columns on the report. This is the type of thing that TABULATE will do very nicely. But PROC REPORT would NOT do this unless you restructure the data. So what you want to do, aside from the random dates, seems more of a TABULATE report. And I'm not sure how you would calculate number of distinct cylinders. It seems to me that number of distinct cylinders for TYPE will be different than number of distinct cylinders for ENGINESIZE and both of those will be different than number of distinct cylinders for ORIGIN. So that column doesn't seem to be something that you would be able to get in just one variable.
Cynthia
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.