BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi all, 

 

Today I want to list all distinct values in one column in one dataset. I did a search but mainly other discussions are about how to count the frequency of unique value of one variable by using sql.

 

For example, I use the dataset sashelp.class

Name	Sex	Age	Height	Weight
Alfred	M	14	69	    112.5
Alice	F	13	56.5	84
Barbara	F	13	65.3	98
Carol	F	14	62.8	102.5
Henry	M	14	63.5	102.5
James	M	12	57.3	83
Jane	F	12	59.8	84.5
Janet	F	15	62.5	112.5
Jeffrey	M	13	62.5	84
John	M	12	59	    99.5
Joyce	F	11	51.3	50.5
Judy	F	14	64.3	90
Louise	F	12	56.3	77
Mary	F	15	66.5	112
Philip	M	16	72	    150
Robert	M	12	64.8	128
Ronald	M	15	67	    133
Thomas	M	11	57.5	85
William	M	15	66.5	112

What I want is how to code to get

Sex 
M
F

or

Age
14
13
12
15
16
11

(neither the ascending nor descending order is not a concern here).

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

From the array statement documentation

array x{1:5,1:3} score1-score15;

{*}

specifies that SAS is to determine the subscript by counting the variables in the array. When you specify the asterisk, also include array-elements.

Restriction You cannot use the asterisk with _TEMPORARY_ arrays or when you define a multidimensional array.

 

So SAS counts the number of variable names provided to set the size of the array.

Your example with a macro variable actually hides a potential issue. IF you provide an index and it does not match the number of variables provided you get an error:

54   data junk;
55      array v (2) var1 var2 var3;
ERROR: Too many variables defined for the dimension(s) specified for the array v.
56      do i = 1 to dim(v);
57        v[i]=i;
58      end;
59   run;

So: Array v(2) &vars.; is not necessarily equivalent to Array v(*) &vars.;

The same error occurs if the specified index is greater than the number of variables provided for the same reason.

 

As stated there are restrictions. Also if you are using Array var can use the index to determine the variables created as

Array var (3);

or use

Array var(*) var1-var3;

 

Ways specifies how many class variables are used in the combinations of Proc Means/summary.

When you use Ways 1; then the "combinations" consist of single variables.

Run this code with different numbers for the Ways:

 

Proc summary data=sashelp.class;

   class _all_;

  ways (number between 1 and 5 as that is how many variables are in the data set);

   output out=work.summary;

run;

and examine the results. Should get pretty clear quickly.

 

By Default if no Ways or Types statement or the option NWAY  isn't used you get summaries overall (type=0) then the single levels of class variables, the paired levels of class variables , the triplets of class variables etc until all the possible combinations of the number of class variables is exhausted.

Again look at the output for something like

 

Proc summary data=sashelp.class;

   class _all_;

  output out=work.summary2;

run;

which will generate 31 combinations of the class variables. With type 31 basically being the input data set with _type_=31 and _freq_=1 in this case.

View solution in original post

12 REPLIES 12
Shmuel
Garnet | Level 18
proc sql;
   create table want as
   select distinct <variable(s)>   /* SEX or AGE */
   from have;
quit;

or 

proc sort data=have(keep=<variable>
                out=want NODUPKEY;
    by <variable>;
run;
Phil_NZ
Barite | Level 11

Hi @Shmuel 

 

Thank you for your suggestion, I just have a small curiosity about the first code

proc sql;
   create table want as
   select distinct <variable(s)>   /* SEX or AGE */
   from have;
quit;

Associated log

28         proc sql;
29            create table want as
30            select distinct age sex /* SEX or AGE */
                                  ___
                                  22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS, 
              CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.  

31            from sashelp.class;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
32         quit;

 

Because you wrote down that <variable(s)> so I expect that I can list all unique values of 2 or more variables at one time but when running in SAS, I get error.

Other than that, these two solutions work well with one variable at one time.

 

Please let me know if I fell into any fallacy when interpreting your code.

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
PaigeMiller
Diamond | Level 26

@Phil_NZ wrote:

Hi @Shmuel 

 

Thank you for your suggestion, I just have a small curiosity about the first code

proc sql;
   create table want as
   select distinct <variable(s)>   /* SEX or AGE */
   from have;
quit;

Associated log

28         proc sql;
29            create table want as
30            select distinct age sex /* SEX or AGE */
                                  ___
                                  22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, AS, 
              CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.  

31            from sashelp.class;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
32         quit;

 

Because you wrote down that <variable(s)> so I expect that I can list all unique values of 2 or more variables at one time but when running in SAS, I get error.

Other than that, these two solutions work well with one variable at one time.

 

Please let me know if I fell into any fallacy when interpreting your code.

 

Warm regards.


The SAS SQL syntax always separates variable names in a list using a comma, not a space.

--
Paige Miller
Phil_NZ
Barite | Level 11

Thank you!

If I change from the space to ',', it leads to an interaction between variables, is there any way to adjust this proc SQL to list unique values of two variables at one time?

 

My97_0-1614727672261.png

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Shmuel
Garnet | Level 18

@Phil_NZ relating to your's 

"Because you wrote down that <variable(s)> so I expect that I can list all unique values of 2 or more variables at one time but when running in SAS, I get error.

Other than that, these two solutions work well with one variable at one time."

1) As already said, any list used in sql need be separated by comma.

2) The meaning of select distinct <var1>,<var2> [,...,varn] is to get

     the distinct combinations of the variables in the list.

3) To get distinct value of each variable in a separate output dataset you can use:

proc sql;
   create table sex as select distinct sex from <dataset>;
   create table age as select distinct age from <dataset>;
quit;
data_null__
Jade | Level 19
proc summary data=sashelp.class missing;
   class sex age;
   ways 1;
   output out=UNI;
   run;
proc print;
   run;

Capture.PNG

ballardw
Super User

Multiple variables in one set or one variable in one set? Quite different especially if mixing numeric an character variables (hint: Character variables cannot appear in a numeric column)

 

proc sql;

   create table sex as

   select distinct sex

   from sashelp.class

   ;

quit;

 

is one way to get single variable into a single data set.

 

May take awhile if you are still working with that 120Gbyte data set.

Phil_NZ
Barite | Level 11

Hi @ballardw 

 

Thank you very much for your help so far. It is exactly a 1,2TB dataset with more than 100 million observations (daily frequency). Now I am chopping the dataset to small pieces and deal with it step by step and only keep some inevitable variables following your courtesy suggestion in previous topics.

 

I am confident that I can deal with this project with the help from the forum, even I cannot finish it, it is also a great chance for me to boost my SAS skills, just try my best, worrying brings almost nothing.

 

Sorry, it is quite a bit offtrack, regarding your confusion because of my ambiguous introduction, it is " get unique values of one variable in one dataset containing many variables" (I just put two samples for a clear description).

Best regards,

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
ballardw
Super User

@Phil_NZ wrote:

Hi @ballardw 

 

Thank you very much for your help so far. It is exactly a 1,2TB dataset with more than 100 million observations (daily frequency). Now I am chopping the dataset to small pieces and deal with it step by step and only keep some inevitable variables following your courtesy suggestion in previous topics.

 

I am confident that I can deal with this project with the help from the forum, even I cannot finish it, it is also a great chance for me to boost my SAS skills, just try my best, worrying brings almost nothing.

 

Sorry, it is quite a bit offtrack, regarding your confusion because of my ambiguous introduction, it is " get unique values of one variable in one dataset containing many variables" (I just put two samples for a clear description).

Best regards,

 


Clear as mud. Your "example" shows exactly one variable and the appearance seems like one data set.

Is there supposed to be a variable name in the column? If so, how are you going to tell a variable name from a value if you have multiple variables in a single column?

 

A modification of @data_null__ 's approach:

proc summary data=sashelp.class missing;
   class _all_;
   ways 1;
   output out=work.temp;
run;


options missing=' ';
data want; 
   set work.temp (drop=_type_ _freq_);
   length value $ 100;
   value = cats( of _all_);
   keep value;
run;

The option missing of a blank prevents . from getting concatenated and the structure of the output from summary with those two variables dropped on leaves only one non-missing value per row which the concatenation leaves as a single value.

Set the length to longest expected single value.

 

And perhaps it may be time to discuss what this single column data set is going to be used for and how.

data_null__
Jade | Level 19

Maybe something like this using CLASS statement option MLF so numeric class variables are converted to character.

 

%let vars=sex age;
proc summary data=sashelp.class missing descendtypes chartype;;
   class sex age / mlf; /*MLF converts numeric class to character*/
   ways 1;
   output out=UNI;
   run;
proc print;
   run;

data uni;
   set uni;
   array _cls[*] &vars;
   i = index(_type_,'1');
   vname = vname(_cls[i]);
   length value $64; /*or more*/
   value = _cls[i];
   drop &vars;
   run;
proc print;
   run;
   

Capture.PNG

Phil_NZ
Barite | Level 11

Hi @data_null__ 

 

Thank you for a very nice code, it is over my expectation already, just two novice questions about the way you use your code:

 

1. Your array code is as below

array _cls[*] &vars;

From what I know about array code, inside [ ] should be the number of the elements of this array, that's why when I adjust this code to

array _cls[2] &vars;

 The result is still the same. 

Can I ask when you are allowed to use the asterisk like that inside the bracket [ ], I deem it is only available if the elements of the array being listed by a macro statement as in your example.

 

2. About the line of code

 ways 1;

I did a search to understand this statement and I found this link.  But I still feel ambiguous about this statement, could you please explain what does it play in your code?

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
ballardw
Super User

From the array statement documentation

array x{1:5,1:3} score1-score15;

{*}

specifies that SAS is to determine the subscript by counting the variables in the array. When you specify the asterisk, also include array-elements.

Restriction You cannot use the asterisk with _TEMPORARY_ arrays or when you define a multidimensional array.

 

So SAS counts the number of variable names provided to set the size of the array.

Your example with a macro variable actually hides a potential issue. IF you provide an index and it does not match the number of variables provided you get an error:

54   data junk;
55      array v (2) var1 var2 var3;
ERROR: Too many variables defined for the dimension(s) specified for the array v.
56      do i = 1 to dim(v);
57        v[i]=i;
58      end;
59   run;

So: Array v(2) &vars.; is not necessarily equivalent to Array v(*) &vars.;

The same error occurs if the specified index is greater than the number of variables provided for the same reason.

 

As stated there are restrictions. Also if you are using Array var can use the index to determine the variables created as

Array var (3);

or use

Array var(*) var1-var3;

 

Ways specifies how many class variables are used in the combinations of Proc Means/summary.

When you use Ways 1; then the "combinations" consist of single variables.

Run this code with different numbers for the Ways:

 

Proc summary data=sashelp.class;

   class _all_;

  ways (number between 1 and 5 as that is how many variables are in the data set);

   output out=work.summary;

run;

and examine the results. Should get pretty clear quickly.

 

By Default if no Ways or Types statement or the option NWAY  isn't used you get summaries overall (type=0) then the single levels of class variables, the paired levels of class variables , the triplets of class variables etc until all the possible combinations of the number of class variables is exhausted.

Again look at the output for something like

 

Proc summary data=sashelp.class;

   class _all_;

  output out=work.summary2;

run;

which will generate 31 combinations of the class variables. With type 31 basically being the input data set with _type_=31 and _freq_=1 in this case.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 31875 views
  • 9 likes
  • 5 in conversation