hello,
i am new to the proc sql in sas. my dataset like this:
| Obs | id | area | year |
|---|---|---|---|
| 5818 | 1 | NEW YORK | 2020 |
| 5819 | 1 | NEW YORK | 2021 |
| 5820 | 1 | NEWARK | 2020 |
| 5821 | 1 | NEWARK | 2021 |
i try to group them by id, year and area and i used the following code:
i think i supposed to get only two records but instead, i still got the 4 records. i can't figure out what's wrong with this code.
proc sql;
create table data.clientematga2
as select distinct a.id, a.area, a.year
from data.clientematga a
group by euci42, ematga , a.year;
quit;
What does your log say?
Hard to say without seeing representative data. You have columns in your group by clause that do not appear in your posted data.
i try to group them by id, year and area and i used the following code:
The data is already grouped by id, area and year, but you want id, year and area? Is that it? This is simply a sort of the data
proc sort data=have;
by id year area;
run;
Good evening!
The "select distinct" refers to uniqueness in all columns given as an argument, in your case id, year, area ...
Now as I see four different (distinct) combinations in the sample provided by you, why do you expect 2 rows?
--fja
group by euci42, ematga , a.year
What is contained in these variables? Why do you use GROUP BY without a summary function (SAS SQL will automatically convert this to ORDER BY)? Why do you use variables in the GROUP BY which are not contained in your SELECT?
GROUP BY is used to calculate summary statistics for the groups, not for simple re-ordering.
Please post usable example data in a data step with datalines, and the expected result.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.