BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi all, I am looking for a better way to code in SAS for the following situation. Data I have is city, temperature and maximum temperature. What I want to code is for each city if the temp is greater than max_temp, I want to ignore the rest of the datalines and calculate average temperature for each city. For example, for city DC, I want to consider 89, 91, 98 and ignore rest of the data  like 101 & 105 since these temp are greater than max_temp. I know how to code in cobol but I want code better way(may be sas function which I may not know) in sas, please advice. thanks in advance.

 

Neal.

data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
ScottBass
Rhodochrosite | Level 12

I get an error when I run this:

 

data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

Log:

 

NOTE: Invalid data for city in line 30 1-2.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0         
30         DC 89 100
city=. temp=89 max_temp=100 _ERROR_=1 _N_=1
NOTE: Invalid data for city in line 31 1-2.
31         DC 91 100
city=. temp=91 max_temp=100 _ERROR_=1 _N_=2
NOTE: Invalid data for city in line 32 1-2.
32         DC 98 100
city=. temp=98 max_temp=100 _ERROR_=1 _N_=3
NOTE: Invalid data for city in line 33 1-2.
33         DC 101 100
city=. temp=101 max_temp=100 _ERROR_=1 _N_=4
NOTE: Invalid data for city in line 34 1-2.
34         DC 105 100
city=. temp=105 max_temp=100 _ERROR_=1 _N_=5

Edit your original post with valid SAS code.  Hint:  city is a character variable.

Put as much effort into your post as you expect of others in their reply (I'm not trolling, perhaps that's a harsh way of saying test your code before you post).

 

Edit:

 

Ok, this is what I tried, but I'm not going to put further effort into it.  Perhaps you can run with this and get it to work.

 

data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

proc summary data=have (where=(temp <= max_temp)) nway;
   by city notsorted;
   var temp;
   output out=avg mean=avg;
run;

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

View solution in original post

6 REPLIES 6
PGStats
Opal | Level 21

"if the temp is greater than max_temp, I want to ignore the rest of the datalines and calculate average temperature for each city"

Do you mean to ignore all subsequent temperatures, even if inferior to max_temp?

PG
saslovethemost
Quartz | Level 8

Thanks for the reply, the data provided does not have like that(inferior) but I would like to know if it is inferior as well.

ScottBass
Rhodochrosite | Level 12

I get an error when I run this:

 

data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

Log:

 

NOTE: Invalid data for city in line 30 1-2.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0         
30         DC 89 100
city=. temp=89 max_temp=100 _ERROR_=1 _N_=1
NOTE: Invalid data for city in line 31 1-2.
31         DC 91 100
city=. temp=91 max_temp=100 _ERROR_=1 _N_=2
NOTE: Invalid data for city in line 32 1-2.
32         DC 98 100
city=. temp=98 max_temp=100 _ERROR_=1 _N_=3
NOTE: Invalid data for city in line 33 1-2.
33         DC 101 100
city=. temp=101 max_temp=100 _ERROR_=1 _N_=4
NOTE: Invalid data for city in line 34 1-2.
34         DC 105 100
city=. temp=105 max_temp=100 _ERROR_=1 _N_=5

Edit your original post with valid SAS code.  Hint:  city is a character variable.

Put as much effort into your post as you expect of others in their reply (I'm not trolling, perhaps that's a harsh way of saying test your code before you post).

 

Edit:

 

Ok, this is what I tried, but I'm not going to put further effort into it.  Perhaps you can run with this and get it to work.

 

data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

proc summary data=have (where=(temp <= max_temp)) nway;
   by city notsorted;
   var temp;
   output out=avg mean=avg;
run;

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
saslovethemost
Quartz | Level 8

You are absolutely right, I could have spent some more time in posting the data and verified the code myself. It has been while I logged in SAS communities and almost forgot the rules. thanks for reminder and for the code.

mkeintz
PROC Star

@saslovethemost 

 

Once you solved the problem of generating data set have, consider the following code:

 

data want;
  set have;
  by city;
  retain high_temp_found 0;
  if first.city then high_temp_found=0;
  if high_temp_found=0 and temp>max_temp then high_temp_found=1;
  if high_temp_found=0 then output;
run;

This code generate a new variable HIGH_TEMP_FOUND whose value will be retained.  I.e., it will not be automatically reset to missing with every incoming observation.  Set it to zero at the start of each city.  For each observation that it is 0, see if it should be changed to 1 (if temp>max_temp).  This value of 1 will be retained to the end of the city, so no testing of subsequent records is needed.  Finaly, only if the record-in-hand still has a value of 0 then output.

 

Note this will even delete records with temp<=max_temp, if they follow the first observation having temp>max_temp. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jimbarbour
Meteorite | Level 14

@saslovethemost,

 

You're missing a $ sign on you Input statement for City.  Otherwise your Data step to read the data in is essentially fine.

 

Here's my approach for what your asking for, below.  

 

data have;
	input	city $
			temp 
			max_temp;
	IF	Temp	<	Max_Temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;

PROC	SQL;
	SELECT	City
			,AVG(Temp)		AS	Avg_Temp	FORMAT	8.1
		FROM	Have
		GROUP	BY	City
		;
QUIT;

 

If you look in the Data step, you'll see:

IF	Temp	<	Max_Temp;

This is called a "subsetting IF".  Only records meeting the criteria of the IF statement will be allowed to continue in the processing.  With this one concise statement, I can filter out all records that exceed the maximum temperature to be considered.

 

Next, I use SQL to get the average.  The group by clause aggregates the data by city, and then I use an aggregate function, AVG, to compute the average.  

 

Results:

jimbarbour_0-1602132244301.png

 

Jim

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1218 views
  • 4 likes
  • 5 in conversation