Hello
I have a data set that contain for each variable the distinct categories.
For example:
Variable X has 3 possible categories with values : 1,2,3
Variable W has 2 possible categories with values : 1,2
Variable Z has 4 possible categories with values : 1,2,3,4
Variable T has 2 possible categories with values : a,b
I want to add a new column that will have values of categories description.
I have proc formats where I can find for each variable+category what is the category description.
Please find sas code that perform the task correctly.
My question- Is there a better ,shorter and more useful code to perform this task?
Please note that in real word there are more variables and using the code I used can be not comfortable.
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3
Z 4
T a
T b
;
Run;
proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;
Data part1;
set have(Where=(Var_name='X'));
category_description=put(category,$X_fmt.);
Run;
Data part2;
set have(Where=(Var_name='W'));
category_description=put(category,$W_fmt.);
Run;
/**For vairbale Z there is no format**/
Data part3;
set have(Where=(Var_name='Z'));
category_description=category;
Run;
Data part4;
set have(Where=(Var_name='T'));
category_description=put(category,$T_fmt.);
Run;
Data want;
retain Var_name category category_description;
length category_description $10.;
SET part1 part2 part3 part4;
Run;
Or try this shorter one .
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3
Z 4
T a
T b
;
Run;
proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;
options error=0;
Data want;
SET have;
category_description=coalescec(putc(category,cats('$',Var_name,'_fmt.')),category);
Run;
Okay,
I found the better solution
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3
Z 4
T a
T b
;
Run;
proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;
data want;
set have;
IF Var_name='X' then category_description=put(category,$X_fmt.);
Else IF Var_name='W' then category_description=put(category,$W_fmt.);
Else IF Var_name='T' then category_description=put(category,$T_fmt.);
Else IF Var_name='Z' then category_description=category;
Run;
If you are going to create a report via PROC REPORT, you can have different rows shown with different formats. Otherwise, I think your solution is the only way. https://communities.sas.com/t5/ODS-and-Base-Reporting/PROC-REPORT-different-formats-for-different-ro...
But it seems strange to me that you have a column where sometimes the variable has a value 'North', sometimes it has a value 'Yes' and sometimes it has a value 'High'. Seems like these shouldn't be in one column, as the meaning changes from row to row.
@PaigeMiller wrote:
If you are going to create a report via PROC REPORT, you can have different rows shown with different formats. Otherwise, I think your solution is the only way. https://communities.sas.com/t5/ODS-and-Base-Reporting/PROC-REPORT-different-formats-for-different-ro...
But it seems strange to me that you have a column where sometimes the variable has a value 'North', sometimes it has a value 'Yes' and sometimes it has a value 'High'. Seems like these shouldn't be in one column, as the meaning changes from row to row.
Probably because OP is using the silly requested data structure from this post: https://communities.sas.com/t5/SAS-Programming/Freq-table-for-all-variables-that-have-less-then-100-...
which specifically placed all the categories from multiple variables into the same variable.
Now, WHERE are you going with this stuff @Ronein ? Describe what the END should look like. Very likely you are creating a lot of garbage because you "picked" a solution path without having any idea of the details.
Like possibly applying formats before the steps in the "100" categories post? That approach I suggested uses the FORMATTED value of variables. So if the format is applied, either in Proc Freq or before I think you should get what this step is kludging. But then you didn't bother to provide any example data or actual desired result in the other thread.
I feel there is something missing in your description of the problem. Just based on what you share it looks to me like your data have is some sort of a control table that defines "formats" for variables.
If I'm right then please share the full story for us to provide solution approaches.
Hello,
This question is continue of my previous question.
I have a very big data set and I need to list the all possible categories for each variable in the data set.
It would be better to show also for each category value what is the value description.
Wanted data set will look like that:
@Ronein wrote:
Hello,
This question is continue of my previous question.
I have a very big data set and I need to list the all possible categories for each variable in the data set.
Time to provide an example data set from before reducingit. It does not have to have a many rows as my demonstration with SASHELP.CLASS shows or variables but it has to have some bearing to the actual problem. Such as value ranges.
The steps I use where they are using F_ named variables are the formatted values. That is only way to have a chance of mixing numeric and character variables, create character values. If the format is applied in Proc Freq (or before) the resulting F_ variables should be as needed.
@Ronein wrote:
Hello,
This question is continue of my previous question.
I have a very big data set and I need to list the all possible categories for each variable in the data set.
It would be better to show also for each category value what is the value description.
Wanted data set will look like that:
Convert your original datasets into a "TALL" dataset and summarize that. The TALL dataset could be a VIEW so you don't have store it on disk.
Example that prints the top 3 values for each variable.
data tall / view=tall;
set have ;
array _numbers _numeric_;
array _chars _character_;
length varname $32 category $32 category_description $50 ;
do over _numbers;
varname=vname(_numbers);
category=cats(_numbers);
category_description = vvalue(_numbers);
output;
end;
do over _chars ;
varname=vname(_chars );
category=cats(_chars );
category_description = vvalue(_chars );
output;
end;
keep varname category category_description ;
run;
proc freq data=tall order=freq;
tables varname*category*category_description / noprint out=counts;
run;
data want;
set counts;
by varname ;
if first.varname then order=0;
order+1;
if order <= 3;
run;
If I run it on SASHELP.CLASS I get this output.
There is a macro available that does essentially what you are asking for.
https://github.com/sasutils/macros/blob/master/dbcon.sas
Note it uses these other macros that are available from that same site: parmv.sas qlist.sas contentv.sas nobs.sas
Example:
* Load macros from GITHUB ;
data _null_;
length macro $32 ;
do macro='dbcon','parmv','nobs','qlist','contentv' ;
call execute(catx(' '
,'filename url url'
,quote(cats('https://raw.githubusercontent.com/sasutils/macros/master/',macro,'.sas'))
,'; %include url;'
));
end;
run;
* Run DBCON on SASHELP.CARS direct print out to PRINT ;
%dbcon(sashelp.cars,fname=print,printn=yes);
Results
The SAS System 13:54 Saturday, May 6, 2023 1 ~===================================================================== CARS NOBS=428 [Display limited to 10 values] ====================================================================== N CYLINDERS LEN=8 nval=8 ---------------------------------------------------------------------- 2 . 1 3 136 4 7 5 190 6 87 8 2 10 3 12 ______________________________________________________________________ N DRIVETRAIN LEN=$5 nval=3 maxlen=5 ---------------------------------------------------------------------- 92 All 226 Front 110 Rear ______________________________________________________________________ N ENGINESIZE LEN=8 nval=43 label=Engine Size (L) ---------------------------------------------------------------------- 2 1.3 1 1.4 6 1.5 10 1.6 4 1.7 ................ 2 5.6 3 5.7 6 6 1 6.8 1 8.3 ______________________________________________________________________ N HORSEPOWER LEN=8 nval=110 ---------------------------------------------------------------------- 1 73 1 93 1 100 5 103 3 104 ................ 1 420 1 450 1 477 3 493 1 500 ______________________________________________________________________ N INVOICE LEN=8 nval=425 format=DOLLAR8. ---------------------------------------------------------------------- 1 9875 $9,875 1 10107 $10,107 1 10144 $10,144 1 10319 $10,319 1 10642 $10,642 ................ 1 88324 $88,324 1 113388 $113,388 1 117854 $117,854 The SAS System 13:54 Saturday, May 6, 2023 2 1 119600 $119,600 1 173560 $173,560 ______________________________________________________________________ N LENGTH LEN=8 nval=67 label=Length (IN) ---------------------------------------------------------------------- 1 143 1 144 1 150 2 153 1 154 ................ 2 222 1 224 1 227 1 230 1 238 ______________________________________________________________________ N MAKE LEN=$13 nval=38 maxlen=13 ---------------------------------------------------------------------- 7 Acura 19 Audi 20 BMW 9 Buick 8 Cadillac ................ 11 Subaru 8 Suzuki 28 Toyota 15 Volkswagen 12 Volvo ______________________________________________________________________ N MODEL LEN=$40 nval=425 maxlen=40 ---------------------------------------------------------------------- 1 3.5 RL 4dr 1 3.5 RL w/Navigation 4dr 1 300M 4dr 1 300M Special Edition 4dr 1 325Ci 2dr ................ 1 lon2 quad coupe 2dr 1 lon3 4dr 1 lon3 quad coupe 2dr 1 xA 4dr hatch 1 xB ______________________________________________________________________ N MPG_CITY LEN=8 nval=28 label=MPG (City) ---------------------------------------------------------------------- 2 10 4 12 12 13 13 14 17 15 ................ 1 36 1 38 1 46 1 59 1 60 ______________________________________________________________________ The SAS System 13:54 Saturday, May 6, 2023 3 N MPG_HIGHWAY LEN=8 nval=33 label=MPG (Highway) ---------------------------------------------------------------------- 1 12 1 13 1 14 2 16 9 17 ................ 2 43 1 44 1 46 2 51 1 66 ______________________________________________________________________ N MSRP LEN=8 nval=410 format=DOLLAR8. ---------------------------------------------------------------------- 1 10280 $10,280 1 10539 $10,539 1 10760 $10,760 1 10995 $10,995 1 11155 $11,155 ................ 1 94820 $94,820 1 121770 $121,770 1 126670 $126,670 1 128420 $128,420 1 192465 $192,465 ______________________________________________________________________ N ORIGIN LEN=$6 nval=3 maxlen=6 ---------------------------------------------------------------------- 158 Asia 123 Europe 147 USA ______________________________________________________________________ N TYPE LEN=$8 nval=6 maxlen=6 ---------------------------------------------------------------------- 3 Hybrid 60 SUV 262 Sedan 49 Sports 24 Truck 30 Wagon ______________________________________________________________________ N WEIGHT LEN=8 nval=348 label=Weight (LBS) ---------------------------------------------------------------------- 1 1850 1 2035 1 2055 1 2085 1 2195 ................ 1 5879 1 5969 1 6133 1 6400 1 7190 ______________________________________________________________________ N WHEELBASE LEN=8 nval=40 label=Wheelbase (IN) ---------------------------------------------------------------------- The SAS System 13:54 Saturday, May 6, 2023 4 2 89 9 93 11 95 5 96 3 97 ................ 1 131 2 133 1 137 1 140 2 144 ______________________________________________________________________
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3
Z 4
T a
T b
;
Run;
proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;
Data want;
SET have;
if Var_name='Z' then category_description=category ;
else category_description=putc(category,cats('$',Var_name,'_fmt.'));
Run;
Or try this shorter one .
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3
Z 4
T a
T b
;
Run;
proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;
options error=0;
Data want;
SET have;
category_description=coalescec(putc(category,cats('$',Var_name,'_fmt.')),category);
Run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.