BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mgilbert
Obsidian | Level 7

Working on a data set that has a mix of character and numeric variables. The character variables have levels (e.g. 'Gender' might take on F or M). What I'm trying to do is write a macro that will automatically create flag variables in the parent data set based on the character variable levels. These flag variables would follow a standard naming convention (e.g. 'Gender_F', 'Gender_M') and populate under the conventional 1 = T, 0 = F.

 

Where I'm getting stuck is getting SAS to extract and store the value of each level to later be brought in for the variable suffix. This thread helped me with understanding the count of levels, just not what the levels actually are. Is there an obvious option in PROC FREQ I'm missing?

 

Using cars as an example:

 

*	Sample Code;
*	Last updated: 2015-10-28 by MJG;

*	See the levels each character variable takes on;
*	Type has six levels;
proc freq data = sashelp.cars;
	tables _character_;
run; quit;

*	Count of levels;
*	Type has six levels;
ods trace on;
proc freq data = sashelp.cars
	(keep = _character_) nlevels;
	ods	exclude onewayfreqs;
	ods	output
		nlevels = char_levels
		(keep = tablevar nlevels
		rename = (tablevar = name nlevels = n_levels));
run; quit;
ods trace off;

*	Create new variables based on levels;
data cars;
	set sashelp.cars;
	if (Type = "Hybrid") then Type_Hybrid = 1; else Type_Hybrid = 0;
	if (Type = "SUV") then Type_SUV = 1; else Type_SUV = 0;
	if (Type = "Sedan") then Type_Sedan = 1; else Type_Sedan = 0;
	if (Type = "Sports") then Type_Sports = 1; else Type_Sports = 0;
	if (Type = "Truck") then Type_Truck = 1; else Type_Truck = 0;
	if (Type = "Wagon") then Type_Wagon = 1; else Type_Wagon = 0;
run; quit;

Does such an option exist in PROC FREQ? The printed output from the basic (first) PROC FREQ contains the values of the levels I want to extract, while the second one captures the count of levels. Using the OUT= in option does give me a data set with the levels, but it's only of the last table in PROC FREQ...

 

The end goal is to run a macro that would automatically create the flag variables above in the parent data set. Any thoughts are much appreciated!

 

Michael

 

1 ACCEPTED SOLUTION

Accepted Solutions
dcruik
Lapis Lazuli | Level 10

I think I understand what you're trying to accomplish from your descriptions, but I'm not sure if this is something like you are looking for.  In the cars example you provided, the following code will give you binary flag variables for every level of the variable Type.  Hope this helps, otherwise I would suggest providing some example data of what you have and what the output you desire should look like.

 

data cars;
input ID Type$;
datalines;
1 Hybrid
2 Hybrid
3 SUV
4 Sedan
5 Sedan
6 Sedan
7 Sports
8 Sports
9 Truck
10 Truck
11 Wagon
12 Wagon
13 Wagon
14 Wagon
;
run;

proc sql;
create table car_levels as
select distinct Type
from cars;
quit;

%macro levels;
data _NULL_;
set car_levels end=lastobs;
call symputx(cats("Type",_N_),Type);
If lastobs then call symputx("n",_N_);
run;

data cars_final;
set cars;
%do i=1 %to &n;
	If Type="&&Type&i" then Type_&&Type&i=1;
		Else Type_&&Type&i=0;
%end;
run;
%mend;

%levels;

View solution in original post

8 REPLIES 8
mgilbert
Obsidian | Level 7

I had not, and did not end up going that route, but thank you for sharing that procedure. Learning a lot, and these boards are awesome!

 

Michael

dcruik
Lapis Lazuli | Level 10

I think I understand what you're trying to accomplish from your descriptions, but I'm not sure if this is something like you are looking for.  In the cars example you provided, the following code will give you binary flag variables for every level of the variable Type.  Hope this helps, otherwise I would suggest providing some example data of what you have and what the output you desire should look like.

 

data cars;
input ID Type$;
datalines;
1 Hybrid
2 Hybrid
3 SUV
4 Sedan
5 Sedan
6 Sedan
7 Sports
8 Sports
9 Truck
10 Truck
11 Wagon
12 Wagon
13 Wagon
14 Wagon
;
run;

proc sql;
create table car_levels as
select distinct Type
from cars;
quit;

%macro levels;
data _NULL_;
set car_levels end=lastobs;
call symputx(cats("Type",_N_),Type);
If lastobs then call symputx("n",_N_);
run;

data cars_final;
set cars;
%do i=1 %to &n;
	If Type="&&Type&i" then Type_&&Type&i=1;
		Else Type_&&Type&i=0;
%end;
run;
%mend;

%levels;
mgilbert
Obsidian | Level 7

Edit: Updated the code, have been able to solve the &n issue, but now getting a different error.

 

Thanks very much for the help. The code I am using is below, and having one last issue I can't seem to resolve. Tried putting quotes around the &&&varname._n (from here), but that didn't seem to help.

 

Say my data set has 5 categorical variables, each with varying levels. Car_Use has 2 levels, Education has 5 levels, and Job_Type has 8 levels. When I run the code below, it initially gives me the error below, though running a second time (back-to-back) the error does not occur.

 

%let data_og = ABC;
%let contents = &data_og._contents;
%let response = TARGET_WINS;
%let key = INDEX;

*	Create contents data set;
proc contents data = &data_og. out = &contents._c;
run; quit;

*	Drop unnecessary variables gained from PROC CONTENTS;
*	Drop response, key, and character variables;
data &contents._c;
	set &contents._c(keep = name type length varnum format formatl
		informat informl just npos nobs);
		if name = "&response." then delete;
			else if name = "&key." then delete;
		if type = "1" then delete;
run; quit;

*	Macro for PROC FREQ;
%macro freq(varname);
	proc freq data = &data_og. noprint;
		tables &varname. / out = &varname. (drop = COUNT PERCENT);
	run; quit;
%mend;

*	Macro to store vars globally & merge into data set;
%macro catmerge(varname);
	data _null_;
		set &varname. end = lastobs;
			call symputx(cats("&varname.", _N_), &varname.);
			if lastobs then call symputx("&varname._n", _N_);
	run;
	
	data &data_og.;
		set &data_og.;
			%do i = 1 %to &&&varname._n;
				&varname._&&&varname.&i = (&varname. = "&&&varname.&i");
			%end;
	run; quit;
%mend;

*	Create the variables;
data _null_;
	do i = 1 to num;
		set &contents._c nobs = num;
			call execute('%freq('||name||')');
			call execute('%catmerge('||name||')');
	end;
run; quit;

So I'm using a PROC CONTENTS to extract the categorical variable names, then using those names to loop over the two macros. Here's the error I get if I run the code above:

 

WARNING: Apparent symbolic reference CAR_TYPE_N not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: 
       "&&&varname._n" 
ERROR: The %TO value of the %DO I loop is invalid.
ERROR: The macro CATMERGE will stop executing.

Just strange to me that if I run it a first time, an error halts the code, but a second time it's fine. Tried splitting the second macro into two (for a total of 3), but that also resulted in the same error. Apologies for the lengthy post!

 

Michael

 

 
mgilbert
Obsidian | Level 7

Just wanted to give an update, I got everything to run. The trick was to split the macro, and then a second separate data step (the first has two macros, the second has one). Thank you to all who gave suggestions here, lot of good info!

 

Michael

harunbi
Calcite | Level 5

Do you have the code that you said worked for you?

Please let me know.

Thanks

mgilbert
Obsidian | Level 7

Yes, it's part of some larger SAS EDA code I wrote here. The macros are at the top of the code, and the categorical variables are handled at line 666.

 

Michael

ballardw
Super User

Instead of

if (Type = "Hybrid") then Type_Hybrid = 1; else Type_Hybrid = 0;

 

Type_Hybrid = (Type="Hybrid");

 

SAS by resolves logic tests as 1 for true and 0 for false. So the () around a comparison will return the logic value.

NOTE: If you do not want this type of assignment when the compared value is missing then you need to ensure the value isn't missing as missing will always be false in this case.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 4093 views
  • 3 likes
  • 5 in conversation