- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello:
I have a sample dataset list below. I would like to keep all the text in column "Class", "Component", and "Product" over (>=) 3 times and keep them. The rest would delete. Please help, thank you.
data sample;
infile datalines dsd;
input Class : $20. Component : $20. Product : $20.;
datalines;
ANTI/INSULIN, , ,
ANTI/INSULIN, , ,
ANTI/VITA, , ,
PHENAZO, , ,
Large, , ,
ANTI/VITA, , ,
ANTI/VITA, , ,
BETA-ADRENERGIC, , ,
FLOWER, , ,
PHENAZO, , ,
PHENAZO, , ,
ANTI/INSULIN, , ,
ANTI/INSULIN, , ,
, ANTI/INSULIN, ,
, BETA-ADRENERGIC, ,
, ANTI/INSULIN, ,
, PSYCH, ,
, HERBS, ,
, PHENAZO, ,
, ANTI/VITA, ,
, ANTI/INSULIN, ,
, ANTI/INSULIN, ,
, HERBS, ,
, PSYCH, ,
, PSYCH, ,
, HERBS, ,
, HERBS, ,
, BETA-ADRENERGIC, ,
, BETA-ADRENERGIC, ,
, , BANZOCAINE,
, , BANZOCAINE,
, , FLOWER,
, , ANTI/INSULIN,
, , FLOWER,
, , ANTI/INSULIN,
, , ANTI/INSULIN,
, , ANTI/INSULIN,
, , HERBS,
, , BETA-ADRENERGIC,
, , SKIN,
, , BETA-ADRENERGIC,
, , PHENAZO,
, , PHENAZO,
, , FLOWER,
;
proc sort data=sample; by class component product; run;
data frequencyover3;
infile datalines dsd;
input Class : $20. Component : $20. Product : $20.;
datalines;
ANTI/INSULIN, , ,
ANTI/INSULIN, , ,
ANTI/VITA, , ,
PHENAZO, , ,
ANTI/VITA, , ,
ANTI/VITA, , ,
PHENAZO, , ,
PHENAZO, , ,
ANTI/INSULIN, , ,
ANTI/INSULIN, , ,
, ANTI/INSULIN, ,
, BETA-ADRENERGIC, ,
, ANTI/INSULIN, ,
, PSYCH, ,
, HERBS, ,
, ANTI/INSULIN, ,
, ANTI/INSULIN, ,
, HERBS, ,
, PSYCH, ,
, PSYCH, ,
, HERBS, ,
, HERBS, ,
, BETA-ADRENERGIC, ,
, BETA-ADRENERGIC, ,
, , FLOWER,
, , ANTI/INSULIN,
, , FLOWER,
, , ANTI/INSULIN,
, , ANTI/INSULIN,
, , ANTI/INSULIN,
, , FLOWER,
;
proc sort data=final; by class component product; run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
use HAVING in a SQL clause.
select *
from have
group by class
having count(class) >=3;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
use HAVING in a SQL clause.
select *
from have
group by class
having count(class) >=3;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello:
I modified Reeza's code below. However, I got an error message from Log. Please help. Thanks.
data sample;
infile datalines dsd;
input Class : $20. Component : $20. Product : $20.;
datalines;
ANTI/INSULIN, PSYCH, BANZOCAINE,
ANTI/INSULIN, HERBS, BANZOCAINE,
ANTI/VITA, PHENAZO, FLOWER,
PHENAZO, ANTI/VITA, ANTI/INSULIN,
PHENAZO, ANTI/INSULIN, FLOWER,
Large, ANTI/INSULIN, ANTI/INSULIN,
ANTI/VITA, HERBS, ANTI/INSULIN,
ANTI/VITA, PSYCH, ANTI/INSULIN,
BETA-ADRENERGIC, PSYCH, HERBS,
FLOWER, SKIN, BETA-ADRENERGIC,
PHENAZO, HERBS, SKIN,
PHENAZO, HERBS, BETA-ADRENERGIC,
ANTI/INSULIN, BETA-ADRENERGIC, PHENAZO,
ANTI/INSULIN, BETA-ADRENERGIC, PHENAZO,
SUBST, ANTI/INSULIN, FLOWER,
BANZOCAINE, BETA-ADRENERGIC, SUBST,
PHENAZO, ANTI/INSULIN, BANZOCAINE,
;
proc sort data=sample; by class component product; run;
proc sql;
create table frequencyover3
select *
from sample
group by Class Component product
having count(Class or Component or product ) >=3;
quit;
147 proc sql;
148 create table frequencyover3
149 select *
------
73
ERROR 73-322: Expecting an AS.
150 from sample
151 group by Class Component product
---------
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -,
'.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET,
GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,
OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
152 having count(Class or Component or product ) >=5;
153 quit;
NOTE: The SAS System stopped processing this step because of errors.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like your log says: you're missing the string "as". i.e., your line should have been:
create table frequencyover3 as
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Before you spend more time fixing the syntax (and logic) of your proc sql statements, why not provide an example of what you expect the output to look like?
It's easy when you show what it looks like dealing with one column.
But you specified three variables for which you want to filter for values with frequency >= 3. Does that mean you want all the variables for every observation in which at least one of the variables has a value seen 3+ times? (even if the other 2 variables are seen fewer time)?
Or do you really want 3 tables, one per variable, each listed in alphabetic order, per your example?
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I apologize that my sample dataset was post right at the first beginning. I correct it now. Please see the first message. I have edited it, and I add the final version I would like to create. However, I continue receiving error message when I add "as" in Proc SQL. Please help.
proc sql;
create table frequencyover5 as
select *
from sample
group by Class Component product
having count(Class or Component or product ) >=3;
quit;
243 proc sql;
244 create table frequencyover3 as
245 select *
246 from sample
247 group by Class Component product
---------
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -,
'.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET,
GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,
OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
248 having count(Class or Component or product ) >=3;
249 quit;
NOTE: The SAS System stopped processing this step because of errors.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The variables in the GROUP BY statement in SQL need to be comma separated.
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -,
'.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET,
GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,
OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
That's what the error is telling you, it expects comma's between the variables in the line indicated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much for everyone's great suggestion. I got it worked! ![]()