BookmarkSubscribeRSS Feed
anaisrossetto
Calcite | Level 5

Hi all,

I have data in a table and I wish to exctract unique modality per variable for every variables in my table (exclude the ID).

The idea is to add a column to the output I get from proc print (variable name, type, len, pos, format, informat, Label). The column would be unique value.

For example for the variable sexe, it will be men, women;

for smoker, it will be : yes, no;

size: small, medium, large, ....

I can have the information thanks to proc freq but I have to write the result manually.

I have the information for one variable with :

proc SQL;
select DISTINCT variable_name
FROM lib.table;
quit;

But only one variable at a time

Anyone to help me please 🙂

 

12 REPLIES 12
FreelanceReinh
Jade | Level 19

Hello @anaisrossetto and welcome to the SAS Support Communities!

 


@anaisrossetto wrote:

proc SQL;
select DISTINCT variable_name
FROM lib.table;
quit;

But only one variable at a time


With DOSUBL you can call PROC SQL during DATA step execution. For large datasets this is not really efficient, but for datasets of small to moderate size it can be done:

/* Create test data for demonstration */

data have;
set sashelp.heart;
keep smoking _char_;
run;

/* Retrieve selected metadata */

proc contents data=have out=cnt(keep=name type length varnum /* format informat */ label) noprint;
run;

/* Add lists of unique variable values */

data want;
set cnt;
length VALUES $300;
_n_=dosubl('proc sql noprint; select distinct '||name||' into :vlist separated by ", " from have; quit;');
values=symget('vlist');
run;

proc print data=want noobs;
run;

Result:

NAME              TYPE    LENGTH    VARNUM    LABEL                    VALUES

BP_Status           2        7         6      Blood Pressure Status    High, Normal, Optimal
Chol_Status         2       10         5      Cholesterol Status       , Borderline, Desirable, High
DeathCause          2       26         2      Cause of Death           , Cancer, Cerebral Vascular Disease, Coronary Heart Disease, Other, Unknown
Sex                 2        6         3                               Female, Male
Smoking             1        8         4                               ., 0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60
Smoking_Status      2       17         8      Smoking Status           , Heavy (16-25), Light (1-5), Moderate (6-15), Non-smoker, Very Heavy (> 25)
Status              2        5         1                               Alive, Dead
Weight_Status       2       11         7      Weight Status            , Normal, Overweight, Underweight

(Note that a leading comma in variable VALUES indicates a missing value of the corresponding [character] variable in dataset HAVE.)

anaisrossetto
Calcite | Level 5

Excellent! It is exactly what I am looking for, thank you very much 🙂

Do you know if there is a possibility to exclude 2 variables (i have the id and another variable with too many modalities. I don't want to list every modalities for these variables) 

By this I mean excluding them in the variable list we want to get the modality but still having them on the table?

I've done it in multiple step, so I wonder if a shorter solution exists. 

 

Another question, if you can give me some advice please. I am a beginner in SAS and I've tried to read the documentation of dosubl function but it is still difficult to understand. 

Any good advice from your experience to learn SAS? 

Thanks a lot 🙂 

 

FreelanceReinh
Jade | Level 19

Glad to read that my suggestion helped you.

 

A simple way to exclude a few variables (from the listing of distinct values) is to name them in an IF condition. For example, suppose that Height and Weight (from SASHELP.HEART) were included in our dataset HAVE. Then we could write:

data want;
set cnt;
length VALUES $300;
if upcase(name) not in ('HEIGHT' 'WEIGHT') then do;
  _n_=dosubl('proc sql noprint; select distinct '||name||' into :vlist separated by ", " from have; quit;');
  values=symget('vlist');
end;
else values='[too many to list]';
run;

A more elegant way, avoiding hardcoded variable names, would be to determine the number of distinct values for each variable in a preliminary step and then decide based on these numbers whether or not the distinct values of a variable should be listed. This can be done with the NLEVELS option of PROC FREQ:

/* Create test data for demonstration */

data have;
set sashelp.heart;
keep height weight smoking _char_;
run;

/* Retrieve selected metadata */

proc contents data=have out=cnt(keep=name type length varnum /* format informat */ label) noprint;
run;

/* Count distinct values of each variable */

ods select none;
ods output nlevels=nlev;
proc freq data=have nlevels;
tables _all_;
run;
ods select all;

/* Add these counts to the metadata */

proc sql;
create table cnt2 as
select c.*, NLevels from cnt c left join nlev f
on c.name=f.TableVar
order by name;
quit;

/* Add lists of unique variable values or, if there are "too many" (here: >10), just their number */

data want(drop=nlevels);
set cnt2;
length VALUES $300;
if nlevels<=10 /* or some other threshold */ then do;
  _n_=dosubl('proc sql noprint; select distinct '||name||' into :vlist separated by ", " from have; quit;');
  values=symget('vlist');
end;
else values=catx(' ', '[', nlevels,'distinct values ]');
run;

proc print data=want noobs;
run;

Result:

NAME              TYPE    LENGTH    VARNUM    LABEL                    VALUES

BP_Status           2        7         8      Blood Pressure Status    High, Normal, Optimal
Chol_Status         2       10         7      Cholesterol Status       , Borderline, Desirable, High
DeathCause          2       26         2      Cause of Death           , Cancer, Cerebral Vascular Disease, Coronary Heart Disease, Other, Unknown
Height              1        8         4                               [ 88 distinct values ]
Sex                 2        6         3                               Female, Male
Smoking             1        8         6                               [ 15 distinct values ]
Smoking_Status      2       17        10      Smoking Status           , Heavy (16-25), Light (1-5), Moderate (6-15), Non-smoker, Very Heavy (> 25)
Status              2        5         1                               Alive, Dead
Weight              1        8         5                               [ 178 distinct values ]
Weight_Status       2       11         9      Weight Status            , Normal, Overweight, Underweight

 

The DOSUBL function is relatively new. It was introduced in SAS 9.4 and the original documentation (even up to release 9.4M5) was indeed difficult to grasp. Apparently it has been rewritten and extended (with more examples) recently. In practice I have basically used DOSUBL in just one project where the calculation of a mathematical function involved several DATA and PROC steps and all these steps could be executed "on the side" by means of the DOSUBL function. Thus I was able to call the complicated mathematical function in a DATA step like any other function.

 

Similarly, in your example the PROC SQL step creating macro variable vlist is executed "on the side" (like in a separate SAS session) in each iteration of the DATA step calling the DOSUBL function.

 

An introduction to the DOSUBL function with more details can be found in the SAS Global Forum 2013 paper Submitting SAS® Code On The Side by Rick Langston. Many more conference papers discussing the DOSUBL function are available in the search results https://www.lexjansen.com/search/searchresults.php?q=DOSUBL on lexjansen.com.

 


@anaisrossetto wrote:

Any good advice from your experience to learn SAS? 


You mean: learn programming in SAS? Once you've learned the basics (e.g., in one or more of the free courses such as SAS® Programming 1: Essentials) I think the best you can do is to practice programming in SAS. Play around with DATA step statements, functions, formats, procedure options, etc. using small datasets (such as SASHELP.CLASS, SASHELP.CARS, ... or datasets you created on your own in a DATA step) and examine the logs, outputs and (using procedures like PROC PRINT and PROC CONTENTS) resulting datasets. Read the documentation of the language elements you are using and be curious to see what other functions, formats, etc. exist and what they could be used for. Personally, I never attended an introductory SAS course, but learned it (starting in 1997) on the job and from the documentation and I'm still learning by doing as well as from the documentation, from the Knowledge Base, conference papers, blogs, other online resources, books -- and by reading and writing posts in this wonderful forum.

anaisrossetto
Calcite | Level 5

Thank you very much for all this information!! 

I've learnt many things thanks to you  

plus I am able to do what i am looking for 🙂 

I've tried to insert the code in a macro, but something goes wrong and I think it is because of the dosubl. the table (have) passed has ab argument has to be called in a different manner than the normal way. I searched in the doc but I am limited in my understanding on how sas is working. 

maybe in few months ...  🙂 

 

FreelanceReinh
Jade | Level 19

You're welcome. If you share your macro code or the log (assuming that it shows that "something goes wrong," as you say), I can have a look and suggest improvements. (Please use the text box opened with the "</>" button to post the code or log.)

anaisrossetto
Calcite | Level 5

oh nice 🙂

I use the exact same code and data than you give and just insert it in a macro.

The problem in the output is for the column values of the want table, every row gets the value of the first variable instead of its own. I guess it is something related to the environement and the way dosubl work. 

here is the coresponding code :

 

%macro metadata(in, out, nblevel);
	/* Retrieve selected metadata */
	proc contents data=&in. out=cnt(keep=name type length varnum /* format informat */ label) noprint;
	run;

	/* Count distinct values of each variable */

	ods select none;
	ods output nlevels=nlev;
	proc freq data=&in. nlevels;
	tables _all_;
	run;
	ods select all;

	/* Add these counts to the metadata */

	proc sql;
	create table cnt2 as
	select c.*, NLevels from cnt c left join nlev f
	on c.name=f.TableVar
	order by name;
	quit;

	/* Add lists of unique variable values or, if there are "too many" (here: >10), just their number */
	data &out.(drop=nlevels);
	set cnt2;
	length VALUES $300;
	if nlevels<=&nblevel. /* or some other threshold */ then do;
	  _n_=dosubl('proc sql noprint; select distinct '||name||' into :vlist separated by ", " from &in.; quit;');
	  values=symget('vlist');
	end;
	else values=catx(' ', '[', nlevels,'distinct values ]');
	run;
%mend ;

/* call*/
%metadata(in=have, out = work.want, nblevel =10);
FreelanceReinh
Jade | Level 19

I applied your macro definition and macro call to the input dataset HAVE that we used earlier ...

data have;
set sashelp.heart;
keep height weight smoking _char_;
run;

... and the output was exactly the same as shown in my earlier post. So your macro works as expected on this test dataset. Do you get a different output for the above sample data?

anaisrossetto
Calcite | Level 5

 

here is the output using the macro : 

Capture_dataset_diff.PNG

 

instead of the code not in a macro:

normal.PNG

 

I am using SAS 9.4

FreelanceReinh
Jade | Level 19

This is strange and does not occur with my SAS 9.4M5. What is your release number as shown by automatic macro variable SYSVLONG?

%put &sysvlong;
anaisrossetto
Calcite | Level 5

I get 

 %put &sysvlong;
9.04.01M3P062415

 

FreelanceReinh
Jade | Level 19

@anaisrossetto wrote:

I get 

 %put &sysvlong;
9.04.01M3P062415

 


Thanks. Unfortunately, this is the release for which at least one bug with DOSUBL has been discussed (and reported to SAS Technical Support) years ago:

So I think your choices (sorted by increasing complexity) are:

  • upgrade to a more recent SAS release (like 9.4M7)
  • apply PROC SQL (select distinct ...) to one variable at a time and combine the results
  • use a different technique (e.g. hash objects) to create the desired lists of distinct values.

Do you think one of these options would be feasible?

FreelanceReinh
Jade | Level 19

@FreelanceReinh wrote:
  • apply PROC SQL (select distinct ...) to one variable at a time and combine the results

Here is a simple implementation of the above suggestion:

 

Replace the last DATA step in your macro by these two DATA steps:

data _null_;
set cnt2;
if nlevels<=&nblevel.
  then call execute('proc sql noprint; select distinct '||name||' into :'||name||' separated by ", " from &in.; quit;');
run;

data &out.(drop=nlevels);
set cnt2;
length VALUES $300;
if nlevels<=&nblevel. then values=symget(name);
else values=catx(' ', '[', nlevels,'distinct values ]');
run;

The first DATA step creates a macro variable containing the list of distinct values for each variable with at most &nblevel distinct values in dataset &in. The name of the macro variable is the same as the name of the dataset variable. The second DATA step puts these lists into variable VALUES where appropriate.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 1771 views
  • 0 likes
  • 2 in conversation