I am a beginner. Could you explain in plain English what this code does?
data monthly;
set test.monthly_exch_mv(where=(_type_=3));
by yyyymm exchange;
format numberofFirms comma8.0;
drop _type_;
run;
Within a DATA step, _TYPE_ is nothing special. It's just the name of a variable that exists within the incoming data set TEST.MONTHLY_EXCH_MV.
The WHERE clause subsets which observations should be read in from that incoming data set.
The harder part is what you don't see here. How did _TYPE_ get created within the data, what values does it take on, and why read in those observations where _TYPE_ is 3? For that, you will need to do a little bit of studying (perhaps more than a little), but I can point you in the right direction. Almost certainly, there is an earlier PROC MEANS or PROC SUMMARY that creates _TYPE_. Look at the documentation for either of those procedures (they perform the same calculations, so it doesn't matter which one you choose). In particular, look at the effects of adding a CLASS statement.
Experiment with a few PROC SUMMARY examples, to get a feel for the values of _TYPE_. It may not be the easiest thing in the world, but it is worthwhile to spend the time.
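As a starting point for that kind of experiment, here is a minimal sketch. The data set FIRMS, the variables mv and total_mv, and all of the values are made up for illustration; I am only guessing that your real data was summarized in a similar way.

```sas
/* Hypothetical firm-level data: one row per firm per month (values invented) */
data firms;
   input yyyymm exchange $ mv;
   datalines;
202301 NYSE 100
202301 NYSE 150
202301 NASDAQ 80
202302 NYSE 120
202302 NASDAQ 90
202302 NASDAQ 60
;
run;

/* Two CLASS variables and no NWAY option, so every level of summary is output */
proc summary data=firms;
   class yyyymm exchange;
   var mv;
   output out=monthly_exch_mv n=numberofFirms sum=total_mv;
run;

/* Inspect the automatic _TYPE_ and _FREQ_ variables in the output */
proc print data=monthly_exch_mv;
run;
```

With two CLASS variables, _TYPE_=0 marks the overall total, _TYPE_=1 the totals by exchange alone, _TYPE_=2 the totals by yyyymm alone, and _TYPE_=3 the rows for each combination of yyyymm and exchange. If your data set was built this way, WHERE=(_TYPE_=3) keeps only the month-by-exchange rows. Adding the NWAY option to the PROC SUMMARY statement would output only those rows in the first place.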
Good luck.
Yes, a special variable named _type_ is created by several SAS procedures. A variable named _type_ can also be created in a DATA step.
For example:
data test;
input sales_id $
sales_jn
sales_fe
sales_mr;
datalines;
W6790 50 400 350
W7693 25 100 125
W1387 99 300 250
;
run;
data tot;
set test;
_type_ = sales_jn + sales_fe + sales_mr;
run;
data sel;
set tot(where=(_type_=250));
run;
OUTPUT: test
sales_id | sales_jn | sales_fe | sales_mr |
W6790 | 50 | 400 | 350 |
W7693 | 25 | 100 | 125 |
W1387 | 99 | 300 | 250 |
OUTPUT: tot
sales_id | sales_jn | sales_fe | sales_mr | _type_ |
W6790 | 50 | 400 | 350 | 800 |
W7693 | 25 | 100 | 125 | 250 |
W1387 | 99 | 300 | 250 | 649 |
OUTPUT: sel
sales_id | sales_jn | sales_fe | sales_mr | _type_ |
W7693 | 25 | 100 | 125 | 250 |
The code filters the data based on the _TYPE_ variable, but since we don't know the source data we can't say what that filter means. There are several procs that add a _TYPE_ variable. Do you know which one was used to create the input data set, test.monthly_exch_mv?
data monthly; /* Creates an output data set named MONTHLY. This is a temporary data set stored in the WORK library. It is deleted when you end your SAS session. */
set test.monthly_exch_mv(where=(_type_=3)); /*
test.monthly_exch_mv is your source data set. The WHERE= option is a filter: only records with _type_=3 are read.
TEST is a permanent library assigned in a LIBNAME statement.
A LIBNAME statement defines an alias for a path or folder location.
After you end your SAS session, the data set monthly_exch_mv will still exist physically in the location defined in your LIBNAME TEST statement.
*/
by yyyymm exchange; /* I presume the data set is already sorted by yyyymm and exchange, which the BY statement expects. They act as grouping or classification variables, but the BY statement does not seem to affect any of the other statements in this step. */
format numberofFirms comma8.0; /* The FORMAT statement displays numberofFirms with commas, in a maximum width of 8 and with no decimal places. */
drop _type_; /* You do not want to see this variable in your output data set. */
run; /* Ends the DATA step and executes it. */