Hi,
I am attempting to write a query that pulls data from one column in a table. The query contains two case statements that each pull from the same column based on the year. I want my data to look like:
ID  Star2019  Star2020
1   5         5
2   2         3
3   4         2
Instead my results look like:
ID  Star2019  Star2020
1             5
1   5
2             3
2   2
3             2
3   4
The problem I am running into is that the first case statement creates a row, and the second case statement creates a duplicate row. Each of the two rows contains one value and one blank. I keep thinking that if I can somehow create the variables using case statements and two join steps, I may be able to avoid the duplicates.
Here is a simplified version of the code:
proc sql;
  create table Stars_Compare as
  Select distinct
    A.ID,
    Other_Variables,
    etc,
    Case
      When YYYY = '2019' and Overall like ('X%') Then 'N/A'
      When YYYY = '2019' and Overall like ('Y%') Then 'N/A'
      When YYYY = '2019' Then Overall
    End as Star_2019,
    Case
      When YYYY = '2020' and Overall like ('X%') Then 'N/A'
      When YYYY = '2020' and Overall like ('Y%') Then 'N/A'
      When YYYY = '2020' Then Overall
    End as Star_2020
  From work.Info A
  Left join work.Stars B on
    A.ID=B.ID
  Where YYYY in ('2019','2020')
  Order by ID;
quit;
SAS (and SQL) operates row by row from the original data set (or here, the product of two data sets joined).
If one row has YYYY = 2019 and another row has YYYY = 2020 for the same ID, both rows carry through to the resulting data set, row by row, just as before. On the row where YYYY = 2019, the variable Star2020 can't have a value because none of its case conditions is satisfied, so it is left blank.
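To see this behavior in isolation, here is a minimal sketch. The sample rows are an assumption built from the values in the question, and the join to work.Info and the 'N/A' logic are left out to keep it short:

```sas
/* Hypothetical sample data: one row per ID per year */
data work.Stars;
  input ID YYYY $ Overall $;
  datalines;
1 2019 5
1 2020 5
2 2019 2
2 2020 3
3 2019 4
3 2020 2
;
run;

proc sql;
  /* Each CASE is evaluated once per input row, so a 2019 row can  */
  /* only fill Star_2019 and a 2020 row can only fill Star_2020.   */
  select ID,
         case when YYYY = '2019' then Overall end as Star_2019,
         case when YYYY = '2020' then Overall end as Star_2020
  from work.Stars
  order by ID;
quit;
```

Every ID comes back as two rows, each with one populated star column and one blank. Select distinct cannot merge them because the two rows are not identical.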
In order to produce the result you showed, you would need at least one more step in SQL, or approach it differently from the beginning.
One way to get this result, following the SQL you showed, is to "group by" all the variables that are identical across the rows at the level you want, then apply the max() function to each variable whose value you want to keep instead of the blank / missing. In PROC SQL, max() works on character columns as well as numeric ones, and a blank sorts lowest, so the non-blank value wins.
proc sql;
  create table Stars_Compare2 as
  select ID,
         <Other Variables>,
         /* the columns created by your query are named Star_2019 and Star_2020 */
         max(Star_2019) as Star2019,
         max(Star_2020) as Star2020
  from Stars_Compare
  group by ID, <Other Variables>;
quit;
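The same idea can also be written as a single query by wrapping each case expression in max() and grouping by ID. This is only a sketch under some assumptions: it uses the table and column names from the question, leaves out the other variables, and assumes each ID has at most one row per year.

```sas
proc sql;
  create table Stars_Compare as
  select A.ID,
         /* max() keeps the non-blank value from the 2019 row */
         max(case
               when YYYY = '2019' and Overall like 'X%' then 'N/A'
               when YYYY = '2019' and Overall like 'Y%' then 'N/A'
               when YYYY = '2019' then Overall
             end) as Star_2019,
         /* max() keeps the non-blank value from the 2020 row */
         max(case
               when YYYY = '2020' and Overall like 'X%' then 'N/A'
               when YYYY = '2020' and Overall like 'Y%' then 'N/A'
               when YYYY = '2020' then Overall
             end) as Star_2020
  from work.Info A
  left join work.Stars B
    on A.ID = B.ID
  where YYYY in ('2019', '2020')
  group by A.ID
  order by A.ID;
quit;
```

Any other variables you add to the select list would also need to go in the group by clause. If one of them differs between the 2019 and 2020 rows, the ID will split into two rows again.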
I would try:
proc sql;
  create table Stars_Compare as
  select *
  from
    /* 2019 rows only */
    (select
       A.ID,
       <Other_Variables>,
       case when first(Overall) in ("X", "Y") then 'N/A'
            else Overall end as star_2019
     from work.Info A
     left join work.Stars B
       on A.ID = B.ID
     where YYYY = '2019')
  natural full join
    /* 2020 rows only */
    (select
       A.ID,
       <Other_Variables>,
       case when first(Overall) in ("X", "Y") then 'N/A'
            else Overall end as star_2020
     from work.Info A
     left join work.Stars B
       on A.ID = B.ID
     where YYYY = '2020');
quit;
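Just make sure <Other_Variables> doesn't include YYYY. A natural join matches on every column name the two subqueries share, so if YYYY were included, the 2019 and 2020 rows would never match each other.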