BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cthcon84
Calcite | Level 5
Hello, i created a table, where i have 5 variables and 10000 observations (showing here only 10 of them). Now i want to have  the number of people whose turnover is above average.
 data WORK.TABELLE;
   infile datalines dsd truncover;
   input name:$30. prename:$30. plz:$5. birthday:DATE9. tunrover:32.;
   format birthday DATE9.;
 datalines;
 Schmidt Emma 48951 30SEP1966 44805
 Schmidt Emma 84453 17SEP1953 58865
 Schmidt Emma 76142 08DEC1988 10980
 Schmidt Emma 77190 07OCT1990 83461
 Schmidt Emma 31853 28APR1997 63172
 Schmidt Emma 54647 08MAY1977 72697
 Schmidt Emma 90476 01JAN1962 1924
 Schmidt Emma 31518 25AUG1996 24680
 Schmidt Emma 91016 08FEB1971 37587
 Schmidt Emma 22758 25JUN1996 67124
 
I tried it with proc means mean and then a data step with if turnover>average. But the answer is 10000, which is obviously wrong.
Thank you for help!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Please test your data step a bit more carefully next time. When I run it as written all the data ends up as text in the Nave variable because of DSD.

Also look at your spelling for the last variable.

 

One way:

data WORK.TABELLE;
   infile datalines  truncover;
   input name:$30. prename:$30. plz:$5. birthday:DATE9. turnover:32.;
   format birthday DATE9.;
 datalines;
 Schmidt Emma 48951 30SEP1966 44805
 Schmidt Emma 84453 17SEP1953 58865
 Schmidt Emma 76142 08DEC1988 10980
 Schmidt Emma 77190 07OCT1990 83461
 Schmidt Emma 31853 28APR1997 63172
 Schmidt Emma 54647 08MAY1977 72697
 Schmidt Emma 90476 01JAN1962 1924
 Schmidt Emma 31518 25AUG1996 24680
 Schmidt Emma 91016 08FEB1971 37587
 Schmidt Emma 22758 25JUN1996 67124
 ;

proc sql;
   create table want as 
   select *, (turnover > mean(turnover)) as aboveaverage
   from work.tabelle
   ;
quit;

The SQL calculates the overall mean and compares each value to that mean. The result is 1 (is above average) or 0 (not above average)

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

Use PROC STDIZE. This will subtract the mean from every observation, then you can count the number that are positive (above average).

 

proc stdize data=tabelle method=mean out=tabelle1;
    var tunrover;
run;

proc format;
     value highlow low-<0='Below Avg' 0='Avg' 0<-high='Above Avg';
run;

proc freq data=tabelle1;
     table tunrover;
     format tunrover highlow.;
run;

 

 

For your future benefit, saying something didn't work and not showing us the code you used, is not helpful. For us to help you, we need to see the code that didn't work. (We don't need to see the code that doesn't work here, only because I have shown code that uses a different approach)

 

Another comment: don't try to write your own DATA step code that does something that SAS has already included in a PROC. Why? Because the PROC has been thoroughly tested and debugged by SAS, so you don't have to do those things. You know the PROC works properly, whereas if you write your own DATA step code, you can get it wrong, as you have seen in this case. 

--
Paige Miller
ballardw
Super User

Please test your data step a bit more carefully next time. When I run it as written all the data ends up as text in the Nave variable because of DSD.

Also look at your spelling for the last variable.

 

One way:

data WORK.TABELLE;
   infile datalines  truncover;
   input name:$30. prename:$30. plz:$5. birthday:DATE9. turnover:32.;
   format birthday DATE9.;
 datalines;
 Schmidt Emma 48951 30SEP1966 44805
 Schmidt Emma 84453 17SEP1953 58865
 Schmidt Emma 76142 08DEC1988 10980
 Schmidt Emma 77190 07OCT1990 83461
 Schmidt Emma 31853 28APR1997 63172
 Schmidt Emma 54647 08MAY1977 72697
 Schmidt Emma 90476 01JAN1962 1924
 Schmidt Emma 31518 25AUG1996 24680
 Schmidt Emma 91016 08FEB1971 37587
 Schmidt Emma 22758 25JUN1996 67124
 ;

proc sql;
   create table want as 
   select *, (turnover > mean(turnover)) as aboveaverage
   from work.tabelle
   ;
quit;

The SQL calculates the overall mean and compares each value to that mean. The result is 1 (is above average) or 0 (not above average)

PaigeMiller
Diamond | Level 26

Although SQL is a good choice if you have to do this for only one variable, if you have many variables, PROC STDIZE and PROC FREQ together will be a much better choice than SQL. In SQL, you have to type the formula (turnover > mean(turnover)) as aboveaverage for each variable, so its a lot more typing than PROC STDIZE/PROC FREQ, and a lot more opportunities to make a typographical error or outright mistakes.

 

I think people should think not of SQL but the appropriate SAS PROCs when they need to do statistics on many variables. I encourage people to learn how to do this without SQL even in the case where there is only one variable; learning how to do this without using SQL is a good habit to get into (in my opinion).

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 2964 views
  • 0 likes
  • 3 in conversation