Hi all, I am looking for a better way to code in SAS for the following situation. Data I have is city, temperature and maximum temperature. What I want to code is for each city if the temp is greater than max_temp, I want to ignore the rest of the datalines and calculate average temperature for each city. For example, for city DC, I want to consider 89, 91, 98 and ignore rest of the data like 101 & 105 since these temp are greater than max_temp. I know how to code in cobol but I want code better way(may be sas function which I may not know) in sas, please advice. thanks in advance.
Neal.
data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
I get an error when I run this:
data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
Log:
NOTE: Invalid data for city in line 30 1-2.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
30 DC 89 100
city=. temp=89 max_temp=100 _ERROR_=1 _N_=1
NOTE: Invalid data for city in line 31 1-2.
31 DC 91 100
city=. temp=91 max_temp=100 _ERROR_=1 _N_=2
NOTE: Invalid data for city in line 32 1-2.
32 DC 98 100
city=. temp=98 max_temp=100 _ERROR_=1 _N_=3
NOTE: Invalid data for city in line 33 1-2.
33 DC 101 100
city=. temp=101 max_temp=100 _ERROR_=1 _N_=4
NOTE: Invalid data for city in line 34 1-2.
34 DC 105 100
city=. temp=105 max_temp=100 _ERROR_=1 _N_=5
Edit your original post with valid SAS code. Hint: city is a character variable.
Put as much effort into your post as you expect of others in their reply (I'm not trolling, perhaps that's a harsh way of saying test your code before you post).
Edit:
Ok, this is what I tried, but I'm not going to put further effort into it. Perhaps you can run with this and get it to work.
data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
proc summary data=have (where=(temp <= max_temp)) nway;
by city notsorted;
var temp;
output out=avg mean=avg;
run;
"if the temp is greater than max_temp, I want to ignore the rest of the datalines and calculate average temperature for each city"
Do you mean to ignore all subsequent temperatures, even if inferior to max_temp?
Thanks for the reply, the data provided does not have like that(inferior) but I would like to know if it is inferior as well.
I get an error when I run this:
data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
Log:
NOTE: Invalid data for city in line 30 1-2.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
30 DC 89 100
city=. temp=89 max_temp=100 _ERROR_=1 _N_=1
NOTE: Invalid data for city in line 31 1-2.
31 DC 91 100
city=. temp=91 max_temp=100 _ERROR_=1 _N_=2
NOTE: Invalid data for city in line 32 1-2.
32 DC 98 100
city=. temp=98 max_temp=100 _ERROR_=1 _N_=3
NOTE: Invalid data for city in line 33 1-2.
33 DC 101 100
city=. temp=101 max_temp=100 _ERROR_=1 _N_=4
NOTE: Invalid data for city in line 34 1-2.
34 DC 105 100
city=. temp=105 max_temp=100 _ERROR_=1 _N_=5
Edit your original post with valid SAS code. Hint: city is a character variable.
Put as much effort into your post as you expect of others in their reply (I'm not trolling, perhaps that's a harsh way of saying test your code before you post).
Edit:
Ok, this is what I tried, but I'm not going to put further effort into it. Perhaps you can run with this and get it to work.
data have;
input city temp max_temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
proc summary data=have (where=(temp <= max_temp)) nway;
by city notsorted;
var temp;
output out=avg mean=avg;
run;
You are absolutely right, I could have spent some more time in posting the data and verified the code myself. It has been while I logged in SAS communities and almost forgot the rules. thanks for reminder and for the code.
Once you solved the problem of generating data set have, consider the following code:
data want;
set have;
by city;
retain high_temp_found 0;
if first.city then high_temp_found=0;
if high_temp_found=0 and temp>max_temp then high_temp_found=1;
if high_temp_found=0 then output;
run;
This code generate a new variable HIGH_TEMP_FOUND whose value will be retained. I.e., it will not be automatically reset to missing with every incoming observation. Set it to zero at the start of each city. For each observation that it is 0, see if it should be changed to 1 (if temp>max_temp). This value of 1 will be retained to the end of the city, so no testing of subsequent records is needed. Finaly, only if the record-in-hand still has a value of 0 then output.
Note this will even delete records with temp<=max_temp, if they follow the first observation having temp>max_temp.
You're missing a $ sign on you Input statement for City. Otherwise your Data step to read the data in is essentially fine.
Here's my approach for what your asking for, below.
data have;
input city $
temp
max_temp;
IF Temp < Max_Temp;
datalines;
DC 89 100
DC 91 100
DC 98 100
DC 101 100
DC 105 100
NY 86 100
NY 98 100
LA 78 100
LA 89 100
LA 106 100
LA 107 100
;
run;
PROC SQL;
SELECT City
,AVG(Temp) AS Avg_Temp FORMAT 8.1
FROM Have
GROUP BY City
;
QUIT;
If you look in the Data step, you'll see:
IF Temp < Max_Temp;
This is called a "subsetting IF". Only records meeting the criteria of the IF statement will be allowed to continue in the processing. With this one concise statement, I can filter out all records that exceed the maximum temperature to be considered.
Next, I use SQL to get the average. The group by clause aggregates the data by city, and then I use an aggregate function, AVG, to compute the average.
Results:
Jim
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.