jjb123
Obsidian | Level 7

I have firm-year level data.  I want to create person-specific indicators (not firm-specific).  However, there are multiple groups (5) that a person can be in.  In fact, most people will be in different groups for different years (although each person is only in one firm-year-group at a time).  Thus something like group fixed effects would not be adequate.  

Therefore, I think I need to create an indicator variable for each person, something like Person1, Person2, etc. as a dummy variable.  However, I have thousands of people in my data.  What is the most efficient way to code this?  I've experimented with a few loops to no avail.  

 

Thanks!

ChrisNZ
Tourmaline | Level 20

Did you see this when you posted your question?

 

Capture.PNG

 

The first 2 points are arguable. The last one you definitely ignored.

 

Show us input data (as code that can be run as is) and desired output.

 

jjb123
Obsidian | Level 7

My apologies.  I've attached screenshots of the beginning and desired datasets.  Here is a usable beginning dataset too:

data have;
input unique firm year group1 group2 group3 group4 group5;
datalines;
1 1 2000 123 456 102 103 104
2 1 2001 345 123 103 104 105
3 1 2002 345 102 103 104 105
4 2 2000 136 137 138 345 135
5 2 2001 102 456 138 137 135
6 2 2002 867 539 986 753 135
;

 

PGStats
Opal | Level 21

The vast majority of SAS statistical analysis procedures support a CLASS statement that will create the required indicator (or dummy) parameters for you based on the values of the class variables. So, all you need to identify a person is a unique name or id.

PG
jjb123
Obsidian | Level 7

Thanks for the reply.

 

It seems from this link: http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univaria... that the Class statement is only implementable for one or two variables, if I am not mistaken.


Please see the above reply with usable data to see if you think there is a way to implement Class to achieve my objective.

 

Many thanks.

PaigeMiller
Diamond | Level 26

@jjb123 wrote:

Thanks for the reply.

 

It seems from this link: http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univaria... that the Class statement is only implementable for one or two variables, if I am not mistaken.


Please see the above reply with usable data to see if you think there is a way to implement Class to achieve my objective.

 

Many thanks.


Many procedures, including most modelling procedures, create dummy variables for you by using the CLASS statement. Nothing in your original problem statement indicates that UNIVARIATE is the procedure you need to use, and similar procedures to UNIVARIATE, such as PROC SUMMARY, have no limits on the number of class variables. (It's also unclear from your original problem statement why you think UNIVARIATE is the right thing to use here.)

 

So, as far as I can see, you do not need to create dummy variables yourself, unless you are in a very unusual situation which has requirements that you have not yet articulated. And even then, SAS has built in methods that create dummy variables so you don't have to.


After viewing your .pdf files, my question becomes ... what do you want to do with these person-specific indicators? Are you going to use them in some sort of model, or some sort of statistical analysis? Can you be specific about where and how these person-specific indicators will be used after you create them?

--
Paige Miller
jjb123
Obsidian | Level 7

Paige,

 

The link I posted is about the CLASS statement for PROC UNIVARIATE.  I have also reviewed some documentation about PROC SUMMARY's CLASS statement, and it seems this is mostly used to group variables in summary commands (i.e., mean by group).  However, I have not been able to figure out how to create the dummy variables, which is my objective.  If you could articulate some of the SAS commands that create dummy variables, that would be greatly appreciated.  


JJB

PaigeMiller
Diamond | Level 26

Sorry, but you don't just create dummy variables so that now you have dummy variables. The method of creating them depends on the analysis that you are planning. What analysis? Will they be used in a model? Will they be used in some other analysis? Please be specific.

--
Paige Miller
jjb123
Obsidian | Level 7

Apologies, but the usage should not matter for the creation of the dummy variables.  I am not asking for code to use the variables in any type of analysis.  I simply want N number of variables (N being the number of unique people) with 1s and 0s indicating whether that person exists in that firm-year observation in any group. I will be using them for various regression analyses.

PaigeMiller
Diamond | Level 26

@jjb123 wrote:

... the usage should not matter for the creation of the dummy variables. 


I disagree. However, your data will require some pre-processing before it can be used in a regression. I can give you some idea of how to do that later today.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Here's one way to use built-in SAS procedures that compute dummy variables for regression. Maybe there are better/faster ways.

 

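data have;
    input unique firm year group1 group2 group3 group4 group5;
    array g group1-group5;
    seq=_n_;    /* row number of the original firm-year record */
    /* write one output row per person in this firm-year (long format) */
    do i=1 to dim(g);
        value=g(i);
        output;
    end;
    drop i group1-group5;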
datalines;
1 1 2000 123 456 102 103 104
2 1 2001 345 123 103 104 105
3 1 2002 345 102 103 104 105
4 2 2000 136 137 138 345 135
5 2 2001 102 456 138 137 135
6 2 2002 867 539 986 753 135
;

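/* build the design matrix: COL1 is the intercept, COL2 onward are the dummies for each VALUE */
proc glmmod data=have outdesign=have1 outparm=names;
    class value;
    model seq=value;
run;
quit;

/* collapse back to one row per original record by summing the dummies within SEQ */
proc summary data=have1 nway;
    class seq;
    var col:;
    output out=have2 sum=;
run;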

The data set HAVE2 will have your dummy variables named COL2-COL16 (COL1 is the intercept column). You can match up the dummy variables with the original variables UNIQUE, FIRM and YEAR using the variable SEQ.

 

Also, the labels of COL2-COL16 indicate which value it is a dummy variable for, and any analysis you do will show the labels. I suppose you could do a big rename so COL2-COL16 get renamed to CLASS102, CLASS103, etc. if you want, using the NAMES data set that gets created.
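
The rename suggested above could be sketched roughly as follows. It runs after the code above and assumes the NAMES data set written by OUTPARM= contains _COLNUM_ and EFFNAME along with the CLASS variable VALUE. It also assumes the whole rename list fits in a single macro variable (roughly 64K characters), which may not hold with many thousands of people. The macro variable name RENAMES is made up for this example.

```sas
/* build a list like col2=CLASS102 col3=CLASS103 ... from the OUTPARM= data set */
proc sql noprint;
    select cats('col', _colnum_, '=CLASS', value)
        into :renames separated by ' '
        from names
        where upcase(effname) = 'VALUE';   /* skip the intercept row */
quit;

/* apply the renames in place, without rewriting the data */
proc datasets library=work nolist;
    modify have2;
    rename &renames;
quit;
```

If the list is too long for one macro variable, the same idea could be applied in batches, or the rename statements could be written to a temporary file and %INCLUDEd.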

--
Paige Miller
jjb123
Obsidian | Level 7

PM,

 

I really appreciate the code, and it works great.  However, the techniques seem to be very, very computationally demanding (as you alluded to when you said "Maybe there are better/faster ways").  Do you have any recommendations to speed up the process? Or should I post another question?

 

Thanks
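
One alternative that may be lighter than building the full GLMMOD design matrix is to reshape to long format and let PROC TRANSPOSE spread the person IDs into columns. This is only a sketch and has not been benchmarked on large data. The names HAVE_WIDE, LONG, DUMMIES, PID, FLAG and the prefix Person are made up for illustration. It relies on each person appearing at most once per firm-year, as described in the original question.

```sas
/* sample data in its original wide layout (one row per firm-year) */
data have_wide;
    input unique firm year group1 group2 group3 group4 group5;
datalines;
1 1 2000 123 456 102 103 104
2 1 2001 345 123 103 104 105
3 1 2002 345 102 103 104 105
4 2 2000 136 137 138 345 135
5 2 2001 102 456 138 137 135
6 2 2002 867 539 986 753 135
;

/* one row per firm-year-person, with a constant flag to transpose */
data long;
    set have_wide;
    array g group1-group5;
    do i = 1 to dim(g);
        pid = g(i);
        flag = 1;
        if not missing(pid) then output;
    end;
    keep unique firm year pid flag;
run;

/* spread person IDs into columns named Person<id>;
   the input must already be sorted by the BY variables */
proc transpose data=long out=dummies(drop=_name_) prefix=Person;
    by unique firm year;
    id pid;
    var flag;
run;

/* persons absent from a firm-year come out missing; recode to 0 */
data dummies;
    set dummies;
    array p {*} Person:;
    do i = 1 to dim(p);
        if missing(p(i)) then p(i) = 0;
    end;
    drop i;
run;
```

With thousands of people the result is still a very wide data set, so whether this is actually faster will depend on the data.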

jjb123
Obsidian | Level 7

Hello again PG,

 

Could you provide some sample code using a CLASS statement with the example data I've given above?  Thanks.

