Hi, I am new to SAS and hope for your help.
I am currently using a query which yields a table similar to this:
Attribute1 Attribute2 Attribute3 Attribute4 Count
A x k num1 1
A y k num4 1
A z m num4 1
B x k num2 2
B z k num9 2
B a m num4 2
B b m num7 2
C y k num2 2
...
The current query is approx. like this (I am not allowed to use original data, therefore the strange naming...):
proc sql;
create table .... as
select Attr1, Attr2, Attr3, Attr4, count(*) from Sourcetable
where (Attr4 between ... and ... or Attr 4 between ... and ...)
and Attr1 in (...)
group by Attr1, Attr2, Attr3, Attr4
order by Attr1, Attr2, Attr3, Attr4;
quit;
The Sourcetable also contains, amongst others, Attribute 5.
I would like to include Attribute 5, which is not necessarily a number, after Attribute 4 or after count(*). I do not want to group by Attribute 5 itself. In the grouping I want to see the value of Attribute 5 for which Attribute 4 (which is a number) is highest (and Attribute 1, Attribute 2, Attribute 3 are identical withing the grouping).
I tried to adapt the above query using row_number and partition by, but I learned that this is not possible in SAS:
I make up a specific example:
Let Sourcetable be as follows:
COLOR HOUSE TYPE DAY NAME NOMATTER1 NOMATTER2
2 3 garage 70 mickey blabla k
2 3 garage 71 minni ozk k
2 7 bungalow 15 nice -- l
2 7 bungalow 16 bad 1,1 l
2 8 bungalow 3 cold x l
5 2 garage 7 10 NULL r
5 80 bungalow 8 15 Hi k
5 80 garage 9 8 dog s
5 100 school 1 2 tree k
...
I then want to modify my above proc sql query in a way to obtain from it the following set of rows:
COLOR HOUSE TYPE DAY NAME COUNT
2 3 garage 71 minni 2
2 7 bungalow 16 bad 2
2 8 bungalow 3 cold 1
5 2 garage 7 10 1
5 80 bungalow 8 15 1
5 80 garage 9 8 1
...
Sorry for being complicated or even unclear, only just beginning.
Thanks a lot for any help.
First, since you are new, some advice on your question.
It's great that you included sample data. When you do that, it's better to share the data as code, so that people can easily create your data. And when you share code, you should use the code format (see the running man icon). So to share your example data, you could share it like:
data have ;
input Color House Type : $12. Day Name :$8. ;
cards ;
2 3 garage 70 mickey
2 3 garage 71 minni
2 7 bungalow 15 nice
2 7 bungalow 16 bad
2 8 bungalow 3 cold
5 2 garage 7 10
5 80 bungalow 8 15
5 80 garage 9 8
;
Now, since you kindly went to the trouble of making sample data you have, and the example of the data you want, your question would be clearer if the example SQL code you posted actually used that sample data.
That's all tips for future questions.
As for my suggestion, I think you actually only need to group by 3 columns, not 4.
I think in most SQL implementations to get what you want would require joining together two subqueries, one to calculate the counts, and one to select the record with the max(Day). But because of an oddity in SAS SQL (where it will "remerge" data if you include a non-aggregated column in a grouped query), I think you can get what you want in one step:
proc sql ;
create table want as
select Color,House,Type,Day,Name,count(*) as Count
from have
group by Color,House,Type
having Day=max(day)
;
quit ;
If there are ties for max(day) that will return multiple rows.
Looking at the data how do we know which is the "latest" entry?
Proc SQL in general doesn't normally do anything in the order of the data set so values of variables need to be defined to determine what "latest" might be.
If you need to process data in the order it appears in a data set then the DATA step does that.
Hint: code that does not do what you need is not sufficient description of the process. You need to include a bit more narrative as to what is required referencing variables and values.
Hi, thank you for your reply.
I wrote "for which Attribute 4 (which is a number) is highest ", so this is how I want to determine which is "latest". Is it clearer like this?
I only included "code that does not (fully) do what I want" since I am aiming at improving this existing code.
I also gave an example with real data and was hoping it became clear from it what I aim at.
First, since you are new, some advice on your question.
It's great that you included sample data. When you do that, it's better to share the data as code, so that people can easily create your data. And when you share code, you should use the code format (see the running man icon). So to share your example data, you could share it like:
data have ;
input Color House Type : $12. Day Name :$8. ;
cards ;
2 3 garage 70 mickey
2 3 garage 71 minni
2 7 bungalow 15 nice
2 7 bungalow 16 bad
2 8 bungalow 3 cold
5 2 garage 7 10
5 80 bungalow 8 15
5 80 garage 9 8
;
Now, since you kindly went to the trouble of making sample data you have, and the example of the data you want, your question would be clearer if the example SQL code you posted actually used that sample data.
That's all tips for future questions.
As for my suggestion, I think you actually only need to group by 3 columns, not 4.
I think in most SQL implementations to get what you want would require joining together two subqueries, one to calculate the counts, and one to select the record with the max(Day). But because of an oddity in SAS SQL (where it will "remerge" data if you include a non-aggregated column in a grouped query), I think you can get what you want in one step:
proc sql ;
create table want as
select Color,House,Type,Day,Name,count(*) as Count
from have
group by Color,House,Type
having Day=max(day)
;
quit ;
If there are ties for max(day) that will return multiple rows.
Dear Quentin,
thanks a lot for your reply and the advice for future questions. (It was a bit complicated for me, since I have to use two seperate PCs due to security issues 😕 I will try and follow your advice in future questions, though.)
As for your suggested solution: I wonder if this way I would not lose contributions to "count(*)"...?
Maybe I am wrong and I had a false understanding on "having...". I have understood it so far like a "where"-condition that throws away the other data.
I will try and find out if this works tomorrow, haven't got access to the data right now.
@k_k_ wrote:
Maybe I am wrong and I had a false understanding on "having...". I have understood it so far like a "where"-condition that throws away the other data.
I will try and find out if this works tomorrow, haven't got access to the data right now.
I'm not really a big SQL guy... the HAVING clause is similar to WHERE clause, but HAVING filters data after it has been aggregated by the group by. So the counts are still based on all the data. See the docs: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/sqlproc/p1n8ntmltj8ct6n1hso7x2ys6sbw.htm
The code I posted returns your desired WANT dataset:
Obs Color House Type Day Name Count 1 2 3 garage 71 minni 2 2 2 7 bungalow 16 bad 2 3 2 8 bungalow 3 cold 1 4 5 2 garage 7 10 1 5 5 80 bungalow 8 15 1 6 5 80 garage 9 8 1
So cool, thank you! I will retest during weekend and confirm, but I am hopeful now! 🙂
Whenever sequence comes into play, the DATA step is the tool of choice.
The code is much simpler and easier to read, and usually performs better.
data have;
input color house type $ day name $ nomatter1 $ nomatter2 $;
datalines;
2 3 garage 70 mickey blabla k
2 3 garage 71 minni ozk k
2 7 bungalow 15 nice -- l
2 7 bungalow 16 bad 1,1 l
2 8 bungalow 3 cold x l
5 2 garage 7 10 NULL r
5 80 bungalow 8 15 Hi k
5 80 garage 9 8 dog s
5 100 school 1 2 tree k
;
data want;
set have;
drop nomatter:;
by color house type;
if first.type
then count = 1;
else count + 1;
if last.type;
run;
Feel free to ask if there are elements in there which you do not yet understand.
Thank you for your solution, aswell! I am still a total beginner and felt more comfortable with proc sql to start with 😉 However I will next understand your solution, too. Thanks for your effort and help!
@k_k_ wrote:
Thank you for your solution, aswell! I am still a total beginner and felt more comfortable with proc sql to start with 😉 However I will next understand your solution, too. Thanks for your effort and help!
Any time you spend mastering the DATA step will improve your SAS life greatly. I can guarantee that. The DATA step is the "Swiss Army Knife" of the SAS language.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.