BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

I have a data set that contain for  each variable the distinct categories.

For example:

Variable X has 3 possible categories with values : 1,2,3

Variable W has 2 possible categories with values : 1,2

Variable Z has 4 possible categories with values : 1,2,3,4

Variable T has 2 possible categories with values : a,b

 

I want to add a  new column that will have values of categories description.

I have proc formats where I can find for each variable+category what  is the category description.

Please find sas code that perform the task correctly.

My question- Is there a better  ,shorter  and more useful code to perform this task?

Please note that in real word there are more variables and using  the code I used can be not comfortable.

 

Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3 
Z 4
T a
T b
;
Run;

proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;

Data part1;
set have(Where=(Var_name='X'));
category_description=put(category,$X_fmt.);
Run;

Data part2;
set have(Where=(Var_name='W'));
category_description=put(category,$W_fmt.);
Run;

/**For vairbale Z there is no format**/
Data part3;
set have(Where=(Var_name='Z'));
category_description=category;
Run;
Data part4;
set have(Where=(Var_name='T'));
category_description=put(category,$T_fmt.);
Run;

Data want;
retain  Var_name category category_description;
length category_description $10.;
SET part1  part2  part3  part4;
Run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Or try this shorter one .

 

Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3 
Z 4
T a
T b
;
Run;

proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;

options error=0;
Data want;
SET have;
category_description=coalescec(putc(category,cats('$',Var_name,'_fmt.')),category);
Run;

View solution in original post

10 REPLIES 10
Ronein
Meteorite | Level 14

Okay,

I found the better solution

Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3 
Z 4
T a
T b
;
Run;

proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;

data want;
set have;
IF  Var_name='X' then category_description=put(category,$X_fmt.);
Else IF  Var_name='W' then category_description=put(category,$W_fmt.);
Else IF  Var_name='T' then category_description=put(category,$T_fmt.);
Else IF  Var_name='Z' then category_description=category;
Run;

PaigeMiller
Diamond | Level 26

If you are going to create a report via PROC REPORT, you can have different rows shown with different formats. Otherwise, I think your solution is the only way. https://communities.sas.com/t5/ODS-and-Base-Reporting/PROC-REPORT-different-formats-for-different-ro...

 

But it seems strange to me that you have a column where sometimes the variable has a value 'North', sometimes it has a value 'Yes' and sometimes it has a value 'High'. Seems like these shouldn't be in one column, as the meaning changes from row to row.

--
Paige Miller
ballardw
Super User

@PaigeMiller wrote:

If you are going to create a report via PROC REPORT, you can have different rows shown with different formats. Otherwise, I think your solution is the only way. https://communities.sas.com/t5/ODS-and-Base-Reporting/PROC-REPORT-different-formats-for-different-ro...

 

But it seems strange to me that you have a column where sometimes the variable has a value 'North', sometimes it has a value 'Yes' and sometimes it has a value 'High'. Seems like these shouldn't be in one column, as the meaning changes from row to row.


Probably because OP is using the silly requested data structure from this post: https://communities.sas.com/t5/SAS-Programming/Freq-table-for-all-variables-that-have-less-then-100-...

which specifically placed all the categories from multiple variables into the same variable.

 

Now, WHERE are you going with this stuff @Ronein ? Describe what the END should look like. Very likely you are creating a lot of garbage because you "picked" a solution path without having any idea of the details.

Like possibly applying formats before the steps in the "100" categories post? That approach I suggested uses the FORMATTED value of variables. So if the format is applied, either in Proc Freq or before I think you should get what this step is kludging. But then you didn't bother to provide any example data or actual desired result in the other thread.

Patrick
Opal | Level 21

I feel there is something missing in your description of the problem. Just based on what you share it looks to me like your data have is some sort of a control table that defines "formats" for variables. 

If I'm right then please share the full story for us to provide solution approaches.

Ronein
Meteorite | Level 14

Hello,

This question is continue of my previous question.

I have a very big data set and I need to list the all possible categories for each variable in the data set.

It would be better to show also for each category value what is the value description.

Wanted data set will look like that:

Ronein_0-1683375158006.png

 

ballardw
Super User

@Ronein wrote:

Hello,

This question is continue of my previous question.

I have a very big data set and I need to list the all possible categories for each variable in the data set.

 


Time to provide an example data set from before reducingit. It does not have to have a many rows as my demonstration with SASHELP.CLASS shows or variables but it has to have some bearing to the actual problem. Such as value ranges.

 

The steps I use where they are using F_ named variables are the formatted values. That is only way to have a chance of mixing numeric and character variables, create character values. If the format is applied in Proc Freq (or before) the resulting F_ variables should be as needed.

 

Tom
Super User Tom
Super User

@Ronein wrote:

Hello,

This question is continue of my previous question.

I have a very big data set and I need to list the all possible categories for each variable in the data set.

It would be better to show also for each category value what is the value description.

Wanted data set will look like that:

Ronein_0-1683375158006.png

 


Convert your original datasets into a "TALL" dataset and summarize that. The TALL dataset could be a VIEW so you don't have store it on disk.

 

Example that prints the top 3 values for each variable.

data tall / view=tall;
   set have ;
   array _numbers _numeric_;
   array _chars _character_;
   length varname $32 category $32 category_description $50 ;
   do over _numbers;
      varname=vname(_numbers);
      category=cats(_numbers);
      category_description = vvalue(_numbers);
      output;
   end;
   do over _chars ;
      varname=vname(_chars );
      category=cats(_chars );
      category_description = vvalue(_chars );
      output;
   end;
   keep varname category category_description ;
run;

proc freq data=tall order=freq;
  tables varname*category*category_description / noprint out=counts;
run;

data want;
  set counts;
  by varname ;
  if first.varname then order=0;
  order+1;
  if order <= 3;
run;

If I run it on SASHELP.CLASS I get this output.

Tom_0-1683391467232.png

 

 

Tom
Super User Tom
Super User

There is a macro available that does essentially what you are asking for.

https://github.com/sasutils/macros/blob/master/dbcon.sas

 

Note it uses these other macros that are available from that same site:  parmv.sas qlist.sas contentv.sas nobs.sas

 

Example:

* Load macros from GITHUB ;
data _null_;
  length macro $32 ;
  do macro='dbcon','parmv','nobs','qlist','contentv' ;
    call execute(catx(' '
      ,'filename url url'
      ,quote(cats('https://raw.githubusercontent.com/sasutils/macros/master/',macro,'.sas'))
      ,'; %include url;'
    ));
  end;
run;

* Run DBCON on SASHELP.CARS direct print out to PRINT ;
%dbcon(sashelp.cars,fname=print,printn=yes);

Results

The SAS System                                                                                       13:54 Saturday, May 6, 2023   1
~=====================================================================
CARS NOBS=428 [Display limited to 10 values]
======================================================================
  N CYLINDERS LEN=8 nval=8
----------------------------------------------------------------------
  2            .
  1            3
136            4
  7            5
190            6
 87            8
  2           10
  3           12
______________________________________________________________________
  N DRIVETRAIN LEN=$5 nval=3 maxlen=5
----------------------------------------------------------------------
 92 All
226 Front
110 Rear
______________________________________________________________________
  N ENGINESIZE LEN=8 nval=43 label=Engine Size (L)
----------------------------------------------------------------------
  2          1.3
  1          1.4
  6          1.5
 10          1.6
  4          1.7
................
  2          5.6
  3          5.7
  6            6
  1          6.8
  1          8.3
______________________________________________________________________
  N HORSEPOWER LEN=8 nval=110
----------------------------------------------------------------------
  1           73
  1           93
  1          100
  5          103
  3          104
................
  1          420
  1          450
  1          477
  3          493
  1          500
______________________________________________________________________
  N INVOICE LEN=8 nval=425 format=DOLLAR8.
----------------------------------------------------------------------
  1         9875 $9,875
  1        10107 $10,107
  1        10144 $10,144
  1        10319 $10,319
  1        10642 $10,642
................
  1        88324 $88,324
  1       113388 $113,388
  1       117854 $117,854

The SAS System                                                                                       13:54 Saturday, May 6, 2023   2
  1       119600 $119,600
  1       173560 $173,560
______________________________________________________________________
  N LENGTH LEN=8 nval=67 label=Length (IN)
----------------------------------------------------------------------
  1          143
  1          144
  1          150
  2          153
  1          154
................
  2          222
  1          224
  1          227
  1          230
  1          238
______________________________________________________________________
  N MAKE LEN=$13 nval=38 maxlen=13
----------------------------------------------------------------------
  7 Acura
 19 Audi
 20 BMW
  9 Buick
  8 Cadillac
................
 11 Subaru
  8 Suzuki
 28 Toyota
 15 Volkswagen
 12 Volvo
______________________________________________________________________
  N MODEL LEN=$40 nval=425 maxlen=40
----------------------------------------------------------------------
  1 3.5 RL 4dr
  1 3.5 RL w/Navigation 4dr
  1 300M 4dr
  1 300M Special Edition 4dr
  1 325Ci 2dr
................
  1 lon2 quad coupe 2dr
  1 lon3 4dr
  1 lon3 quad coupe 2dr
  1 xA 4dr hatch
  1 xB
______________________________________________________________________
  N MPG_CITY LEN=8 nval=28 label=MPG (City)
----------------------------------------------------------------------
  2           10
  4           12
 12           13
 13           14
 17           15
................
  1           36
  1           38
  1           46
  1           59
  1           60
______________________________________________________________________

The SAS System                                                                                       13:54 Saturday, May 6, 2023   3
  N MPG_HIGHWAY LEN=8 nval=33 label=MPG (Highway)
----------------------------------------------------------------------
  1           12
  1           13
  1           14
  2           16
  9           17
................
  2           43
  1           44
  1           46
  2           51
  1           66
______________________________________________________________________
  N MSRP LEN=8 nval=410 format=DOLLAR8.
----------------------------------------------------------------------
  1        10280 $10,280
  1        10539 $10,539
  1        10760 $10,760
  1        10995 $10,995
  1        11155 $11,155
................
  1        94820 $94,820
  1       121770 $121,770
  1       126670 $126,670
  1       128420 $128,420
  1       192465 $192,465
______________________________________________________________________
  N ORIGIN LEN=$6 nval=3 maxlen=6
----------------------------------------------------------------------
158 Asia
123 Europe
147 USA
______________________________________________________________________
  N TYPE LEN=$8 nval=6 maxlen=6
----------------------------------------------------------------------
  3 Hybrid
 60 SUV
262 Sedan
 49 Sports
 24 Truck
 30 Wagon
______________________________________________________________________
  N WEIGHT LEN=8 nval=348 label=Weight (LBS)
----------------------------------------------------------------------
  1         1850
  1         2035
  1         2055
  1         2085
  1         2195
................
  1         5879
  1         5969
  1         6133
  1         6400
  1         7190
______________________________________________________________________
  N WHEELBASE LEN=8 nval=40 label=Wheelbase (IN)
----------------------------------------------------------------------

The SAS System                                                                                       13:54 Saturday, May 6, 2023   4
  2           89
  9           93
 11           95
  5           96
  3           97
................
  1          131
  2          133
  1          137
  1          140
  2          144
______________________________________________________________________
Ksharp
Super User
Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3 
Z 4
T a
T b
;
Run;

proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;

Data want;
SET have;
if Var_name='Z' then category_description=category ;
 else category_description=putc(category,cats('$',Var_name,'_fmt.'));
Run;
Ksharp
Super User

Or try this shorter one .

 

Data have;
input Var_name $ category $;
cards;
X 1
X 2
X 3
W 1
W 2
Z 1
Z 2
Z 3 
Z 4
T a
T b
;
Run;

proc format;
value $X_Fmt
'1'='North'
'2'='South'
'3'='Centre'
;
value $W_Fmt
'1'='Yes'
'2'='No'
;
value $T_Fmt
'a'='High'
'b'='Low'
;
Run;

options error=0;
Data want;
SET have;
category_description=coalescec(putc(category,cats('$',Var_name,'_fmt.')),category);
Run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1656 views
  • 4 likes
  • 6 in conversation