DGBK
Obsidian | Level 7

Hi,

 

I am trying to write a query that pulls data from one column in a table. The query contains two CASE expressions, each reading the same column but for a different year. I want my data to look like this:

 

ID     Star2019      Star2020
1      5             5
2      2             3
3      4             2

 

Instead my results look like:

ID     Star2019      Star2020
1                    5
1      5
2                    3
2      2
3                    2
3      4

 

 

The problem I am running into is that each ID comes back twice: one row carries the 2019 value with a blank for 2020, and the other carries the 2020 value with a blank for 2019. I keep thinking that if I could create the variables with CASE expressions and two join steps, I might be able to avoid the duplicates.

 

Here is a simplified version of the code:

proc sql;
create table Stars_Compare as
Select distinct
	A.ID,
	Other_Variables,
	etc,

	Case
		When YYYY = '2019' and Overall like ('X%') Then 'N/A'
		When YYYY = '2019' and Overall like ('Y%') Then 'N/A'
		When YYYY = '2019' Then Overall
	End as Star_2019,

	Case
		When YYYY = '2020' and Overall like ('X%') Then 'N/A'
		When YYYY = '2020' and Overall like ('Y%') Then 'N/A'
		When YYYY = '2020' Then Overall
	End as Star_2020

From work.Info A
	Left join work.Stars B on
	A.ID=B.ID
Where YYYY in ('2019','2020')
Order by ID;
quit;

 

 

 

 

 

2 REPLIES
heather_g
Fluorite | Level 6

SAS (and SQL) evaluates a CASE expression row by row against the source data (here, the result of joining the two data sets).

If an ID has YYYY = '2019' on one row and YYYY = '2020' on another, both rows come through to the result, just as they were in the input. On the row where YYYY = '2019', Star_2020 can't have a value because none of its WHEN conditions is satisfied, and the same holds for Star_2019 on the 2020 row.

 

In order to produce the result you showed, you would need at least one more step in SQL, or approach it differently from the beginning.

 

One way to get this result, starting from the table your SQL already creates, is to "group by" all the variables that are identical across the duplicate rows, then use the max() function on each star column so the populated value is kept instead of the blank / missing one. In PROC SQL, max() used as a summary function works on character columns as well as numeric ones, and a blank sorts below any non-blank value.

 

proc sql;
create table Stars_Compare2 as
select ID,
          <Other Variables>,
          /* Star_2019 and Star_2020 are the column names created in Stars_Compare */
          max(Star_2019) as Star2019,
          max(Star_2020) as Star2020
from Stars_Compare
group by ID, <Other Variables>;
quit;
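
If a second step is not wanted, the same idea can probably be folded into a single query by wrapping each CASE expression in max() and grouping by ID. This is only a minimal sketch: work.Stars_demo is a hypothetical stand-in for work.Stars with character columns YYYY and Overall, and the join to work.Info and the other variables are left out.

```sas
/* Hypothetical sample data standing in for work.Stars */
data work.Stars_demo;
    input ID YYYY $ Overall $;
    datalines;
1 2019 5
1 2020 5
2 2019 2
2 2020 3
3 2019 4
3 2020 2
;
run;

proc sql;
    create table work.Stars_wide_demo as
    select ID,
           /* max() keeps the one non-blank value per ID and year */
           max(case
                   when YYYY = '2019' and (Overall like 'X%' or Overall like 'Y%') then 'N/A'
                   when YYYY = '2019' then Overall
               end) as Star_2019,
           max(case
                   when YYYY = '2020' and (Overall like 'X%' or Overall like 'Y%') then 'N/A'
                   when YYYY = '2020' then Overall
               end) as Star_2020
    from work.Stars_demo
    group by ID
    order by ID;
quit;
```

With this sample data the result has one row per ID: 1 with 5 and 5, 2 with 2 and 3, 3 with 4 and 2. Any other variables you select would also need to appear in the GROUP BY, otherwise PROC SQL remerges the summary values back onto every input row.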

       

 

PGStats
Opal | Level 21

I would try:

 

proc sql;
create table Stars_Compare as
select * from
(Select
	A.ID,
	<Other_Variables>,
	case when first(Overall) in ("X", "Y") Then 'N/A'
		else Overall end as star_2019
 From work.Info A
	Left join work.Stars B on
	A.ID=B.ID
 where YYYY = '2019')
natural full join
(Select
	A.ID,
	<Other_Variables>,
	case when first(Overall) in ("X", "Y") Then 'N/A'
		else Overall end as star_2020
 From work.Info A
	Left join work.Stars B on
	A.ID=B.ID
 where YYYY = '2020');
quit;

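Just make sure <Other_Variables> doesn't include YYYY. A natural join matches on every column the two subqueries have in common, so a shared YYYY column would become a join key and the 2019 rows would never match the 2020 rows.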
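
To illustrate the point about YYYY, here is a small sketch with two hypothetical tables (work.y2019_demo and work.y2020_demo are made up for this example and are not part of the original query). It compares a natural full join with and without YYYY among the common columns.

```sas
/* Hypothetical per-year tables; names and values are for illustration only */
data work.y2019_demo;
    input ID YYYY $ star_2019 $;
    datalines;
1 2019 5
2 2019 2
;
run;

data work.y2020_demo;
    input ID YYYY $ star_2020 $;
    datalines;
1 2020 5
2 2020 3
;
run;

proc sql;
    /* ID and YYYY are both common columns, so both become join keys.
       No 2019 row matches a 2020 row, and the full join returns 4 rows. */
    select *
    from work.y2019_demo natural full join work.y2020_demo;

    /* With YYYY dropped, ID is the only common column and the rows pair up: 2 rows. */
    select *
    from work.y2019_demo(drop=YYYY) natural full join work.y2020_demo(drop=YYYY);
quit;
```

The first query reproduces the duplicated-row pattern from the question, while the second gives one row per ID.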

PG
