Help understanding a data step

Frequent Learner
Posts: 1


I am a beginner. Could you explain in plain English what this code does?

 

data monthly;
set test.monthly_exch_mv(where=(_type_=3));
by yyyymm exchange;
format numberofFirms comma8.0;
drop _type_;
run;

Regular Contributor
Posts: 202

Re: understanding _type_ variable

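data monthly: creates a temporary data set named MONTHLY in the WORK library.
set test.monthly_exch_mv(...): reads only the observations where _type_ = 3.
by: has no visible effect on the output here. It requires the input to be sorted by yyyymm and exchange and creates FIRST. and LAST. variables, but nothing in this step uses them.
format: applies the format comma8.0 to numberofFirms.
drop: the variable _type_ is not stored in the result data set.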
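
Although the BY statement does not change the output of the step above, it does create temporary FIRST. and LAST. variables for each BY variable. The sketch below makes them visible by copying them into ordinary variables. The data set sample, its values, and the names first_in_month and last_in_month are made up for illustration.

```sas
/* Hypothetical data, already sorted by yyyymm and exchange */
data sample;
    input yyyymm exchange $ numberofFirms;
    datalines;
201501 NASDAQ 1
201501 NYSE 2
201502 NASDAQ 2
201502 NYSE 1
;
run;

data flagged;
    set sample;
    by yyyymm exchange;
    /* FIRST. and LAST. variables are temporary, so copy them to keep them */
    first_in_month = first.yyyymm;  /* 1 on the first row of each month */
    last_in_month  = last.yyyymm;   /* 1 on the last row of each month  */
run;

proc print data=flagged;
run;
```

If the input were not sorted by yyyymm and exchange, the DATA step would stop with an error, which is the other thing the BY statement does.
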
Super User
Posts: 6,632

Re: understanding _type_ variable

Within a DATA step, _TYPE_ is nothing special.  It's just the name of a variable that exists within the incoming data set TEST.MONTHLY_EXCH_MV.

 

The WHERE clause subsets which observations should be read in from that incoming data set.

 

The harder part is what you don't see here.  How did _TYPE_ get created within the data, what values does it take on, and why read in those observations where _TYPE_ is 3?  For that, you will need to do a little bit of studying (perhaps more than a little), but I can point you in the right direction.  Almost certainly, there is an earlier PROC MEANS or PROC SUMMARY that creates _TYPE_.  Look at the documentation for either of those procedures (they perform the same calculations, so it doesn't matter which one you choose).  In particular, look at the effects of adding a CLASS statement.

 

Experiment with a few PROC SUMMARY examples, to get a feel for the values of _TYPE_.  It may not be the easiest thing in the world, but it is worthwhile to spend the time.
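
As a starting point for that experiment, here is a minimal sketch. The firm-level data set firms, its values, and the names mv and total_mv are assumptions made up for illustration; only yyyymm, exchange, and numberofFirms come from the original step, and we do not know that the real data were built this way.

```sas
/* Hypothetical firm-level data: one row per firm per month */
data firms;
    input yyyymm exchange $ mv;
    datalines;
201501 NYSE 100
201501 NYSE 250
201501 NASDAQ 80
201502 NYSE 120
201502 NASDAQ 90
201502 NASDAQ 60
;
run;

/* Two CLASS variables and no NWAY option: every _TYPE_ level is written */
proc summary data=firms;
    class yyyymm exchange;
    var mv;
    output out=monthly_exch_mv n=numberofFirms sum=total_mv;
run;

proc print data=monthly_exch_mv;
run;
```

With two CLASS variables, _TYPE_=0 is the overall total, _TYPE_=1 summarizes by exchange only, _TYPE_=2 by yyyymm only, and _TYPE_=3 holds one row for each combination of yyyymm and exchange. If the original data set was built this way, where=(_type_=3) keeps only the month-by-exchange rows.
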

 

Good luck.

Senior User
Posts: 1

Re: understanding _type_ variable

Posted in reply to Astounding

Yes, a special variable named _type_ is created by several SAS procedures. However, an ordinary variable named _type_ can also be created in a DATA step:

 

For example:

 

data test;
     input sales_id $
           sales_jn
           sales_fe
           sales_mr;
datalines;
W6790 50 400 350
W7693 25 100 125
W1387 99 300 250
;
run;

data tot;
     set test;
     _type_ = sales_jn + sales_fe + sales_mr;
run;

data sel;
     set tot(where=(_type_=250));
run;

 

OUTPUT: test

sales_id  sales_jn  sales_fe  sales_mr
W6790           50       400       350
W7693           25       100       125
W1387           99       300       250

OUTPUT: tot

sales_id  sales_jn  sales_fe  sales_mr  _type_
W6790           50       400       350     800
W7693           25       100       125     250
W1387           99       300       250     649

OUTPUT: sel

sales_id  sales_jn  sales_fe  sales_mr  _type_
W7693           25       100       125     250
Super User
Posts: 23,296

Re: understanding _type_ variable

The step filters the data on the _TYPE_ variable, but without knowing the source data we can't say what _TYPE_=3 means. Several procedures add a _TYPE_ variable. Do you know which one was used to create the input data set, test.monthly_exch_mv?

 

 

 

 

Super User
Posts: 23,296

Re: Help understanding a data step

FYI - I updated the title to help clarify your question
Frequent Contributor
Posts: 113

Re: Help understanding a data step

data monthly; /* Creates an output data set named MONTHLY. Because the name has only one level, it is a temporary data set stored in the WORK library, and it is deleted when you end your SAS session. */

set test.monthly_exch_mv(where=(_type_=3)); /*
test.monthly_exch_mv is your source data set. The WHERE= option filters it so that only records with _type_=3 are read.
TEST is a permanent library assigned in a LIBNAME statement. The library reference is an alias for a path or folder location.
After you end your SAS session, the data set monthly_exch_mv still exists physically in the location defined by the LIBNAME statement for TEST.
*/

by yyyymm exchange; /* I presume the data are already sorted by yyyymm and exchange. These serve as grouping or classification variables, but the BY statement does not seem to affect any of the statements that follow. */

format numberofFirms comma8.0; /* Displays numberofFirms with commas, in a maximum width of 8 characters and with no decimal places. */

drop _type_; /* Excludes _type_ from the output data set. */
run; /* Ends the DATA step and executes it. */
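
To make the LIBNAME explanation concrete, here is a minimal sketch of assigning the TEST library and inspecting the source data set before running the step. The folder path is a hypothetical placeholder, not the asker's actual location.

```sas
/* Hypothetical path: replace with the folder that actually holds monthly_exch_mv */
libname test "/path/to/your/data";

/* List the variables in the permanent data set, including _type_ if present */
proc contents data=test.monthly_exch_mv;
run;

/* Count how many observations carry each _type_ value */
proc freq data=test.monthly_exch_mv;
    tables _type_;
run;
```

The PROC FREQ output shows which _type_ values exist, which helps when deciding what the where=(_type_=3) filter actually keeps.
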
