sharvey8
Calcite | Level 5

I am trying to run this code, but it won't recognize the percentages, so every row comes out as Yes for Selective.

DATA University;
INFILE 'c:/Users/sharvey8/Desktop/University(1).txt' DLM='09'X DSD FIRSTOBS=2;
INPUT School :$40. State $ Location :$15. GradRate $ Cost $ Acceptance $ Undergrad $;
RUN;

PROC PRINT DATA=University;
RUN;

DATA University;
set University;
IF Acceptance <= 25 THEN Selective='Yes';
  ELSE IF Acceptance > 25 THEN Selective='No';
RUN;

PROC PRINT DATA=University;
RUN;

11 REPLIES
Reeza
Super User
  1. The variable acceptance was read in as a character variable. Numeric comparisons, math, and aggregate calculations need a numeric variable. Either change that in the INPUT statement based on your input data (not shown, so we don't know) or convert it after the fact in the data step.
  2. Percentages are often represented as between 0 and 1 numerically, where 25% is really 0.25. Check the formatted and underlying value of the variable you want to analyze.
  3. Do not code using the same data set name in the DATA and SET statement. This overwrites your data and it makes it harder to debug and find issues.
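A minimal sketch of why point 1 can turn every row into Yes, assuming a one-row data set with the value `77%` (the format of the Acceptance values shown in the log later in this thread). Comparing a character variable with a number makes SAS attempt an automatic character-to-numeric conversion; `77%` is not a standard numeric value, so it converts to missing, and a missing numeric value is smaller than any number, so `<= 25` is true. The data set and variable names here are hypothetical.

```sas
/* Hypothetical one-row data set: Acceptance is character, as in the question */
data demo_char_compare;
  length Acceptance $8 Selective $3;
  Acceptance = '77%';

  /* Comparing a character variable with a number triggers automatic
     character-to-numeric conversion. '77%' is not a standard numeric value,
     so it becomes missing and the log shows an invalid numeric data NOTE.
     A missing value is lower than any number, so the first test is true. */
  if Acceptance <= 25 then Selective = 'Yes';
  else if Acceptance > 25 then Selective = 'No';

  /* The numeric value SAS ends up comparing (?? suppresses the NOTE here) */
  acceptance_as_num = input(Acceptance, ?? best12.);
run;

proc print data=demo_char_compare;
run;
```

If this reasoning holds for the real file, every non-numeric string such as `5%` or `77%` would land in Yes, which matches the symptom in the question.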

You may need to tweak the informat to read in the percentages correctly, but this is likely close to what you need. If possible, I would recommend fixing the data at the import stage rather than converting it after the fact. It's a cleaner method.

*new data set name on the output;
DATA University_Categorized;
set University;

*convert to numeric;
acceptance_num = input(acceptance, percent12.);

*categorize;
IF acceptance_num <= 0.25 THEN Selective='Yes';
ELSE IF acceptance_num > 0.25 THEN Selective='No';

run;

  
PROC PRINT DATA=University_Categorized;
RUN;
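One thing to watch in the step above: if INPUT cannot convert a value, acceptance_num is missing, and a missing value still satisfies `<= 0.25`, so that row would be labeled Yes. A sketch that tests for missing first, assuming a hypothetical three-row sample (the names and values, including the `n/a` row, are invented to show each case) and using the `??` modifier to quiet the invalid-data NOTE:

```sas
/* Hypothetical sample rows; Acceptance is character, as in the thread */
data University_Sample;
  infile datalines dlm=',' dsd truncover;
  input School :$40. Acceptance :$8.;
  datalines;
School A,77%
School B,15%
School C,n/a
;
run;

data University_Sample_Categorized;
  set University_Sample;
  length Selective $3;
  /* ?? suppresses the invalid-data NOTE for values such as n/a */
  acceptance_num = input(Acceptance, ?? percent12.);
  /* Check missing first: a missing value would otherwise satisfy <= 0.25 */
  if missing(acceptance_num) then Selective = ' ';
  else if acceptance_num <= 0.25 then Selective = 'Yes';
  else Selective = 'No';
run;

proc print data=University_Sample_Categorized;
run;
```

Leaving Selective blank for unreadable values is just one choice; the point is that those rows stay visible instead of silently joining the Yes group.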


CurtisMackWSIPP
Lapis Lazuli | Level 10

It would have been more helpful if you had included the data. I am assuming there are "%" signs in the Acceptance variable. Since you are reading it with a "$", it comes in as a string, and this comparison will not work. You need to use a percent informat such as percent5.

So something like this:

DATA University;
INFILE datalines DLM=',' DSD;
/* The INFORMAT statement makes Cost and Acceptance numeric, so they must not
   also be read with $ in the INPUT statement */
informat Acceptance percent5. Cost dollar8.;
INPUT School :$40. State $ Location :$15. GradRate $ Cost Acceptance Undergrad $;
datalines;
MySchool,WA,Seattle,3,$300.23,15%,29
YourSchool,WA,Tacoma,3,$400.23,20%,29
;
RUN;
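In the version above, percent5. is attached through an INFORMAT statement, so Acceptance is still read with list input and SAS stops at the delimiter. Writing `percent5.` directly after the variable in the INPUT statement without a colon is formatted input instead, which reads exactly five columns whether or not a delimiter falls inside them. A small sketch of the difference, assuming a `|` delimiter and the fragment `77%|32991` (the Acceptance and Undergrad values from the log later in this thread); the data set names are hypothetical.

```sas
/* Formatted input: percent5. reads columns 1-5, i.e. "77%|3",
   which is invalid, and Undergrad then starts reading at column 6 */
data demo_formatted;
  infile datalines dlm='|' dsd truncover;
  input Acceptance percent5. Undergrad;
  datalines;
77%|32991
;
run;

/* Modified list input: the colon makes SAS stop at the delimiter */
data demo_list;
  infile datalines dlm='|' dsd truncover;
  input Acceptance :percent5. Undergrad;
  datalines;
77%|32991
;
run;
```

In the first step the log should show an invalid-data NOTE and Undergrad ends up as 2991; in the second, Acceptance is 0.77 and Undergrad is 32991.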
sharvey8
Calcite | Level 5

When I change the percentage informat like this:

DATA University;
INFILE 'c:/Users/sharvey8/Desktop/University(1).txt' DLM='09'X DSD FIRSTOBS=2;
INPUT School :$40. State $ Location :$15. GradRate $ Cost $ Acceptance percent5. Undergrad $;
RUN;

PROC PRINT DATA=University;
RUN;

I get this: [Screen Shot 2020-10-01 at 4.19.48 PM.png]

CurtisMackWSIPP
Lapis Lazuli | Level 10

What does the data in that field look like?

sharvey8
Calcite | Level 5

[Screen Shot 2020-10-01 at 4.29.37 PM.png]

Reeza
Super User
Show your log, it will have the information needed to debug this. You likely need a different informat.
sharvey8
Calcite | Level 5

My log does not show anything unusual or any errors: [Screen Shot 2020-10-01 at 4.31.23 PM.png]

Reeza
Super User

From the data import step? There are no errors there? You just showed the log from the PROC PRINT, not from the data import step.

And please post it as text, not an image.

 


sharvey8
Calcite | Level 5
I got this:

Undergrad=974 _ERROR_=1 _N_=20
NOTE: 50 records were read from the infile 'c:/Users/sharvey8/Desktop/University(1).txt'.
The minimum record length was 44.
The maximum record length was 86.
NOTE: The data set WORK.UNIVERSITY has 50 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.04 seconds
sharvey8
Calcite | Level 5
20 CHAR Indiana University-Bloomington.IN.Bloomington.78%.$13,428.77%.32991 67
ZONE 4666666256676776772466666667660440466666667660332023323330332033333
NUMR 9E491E105E96523949D2CFFD9E74FE99E92CFFD9E74FE97859413C4289775932991
School=Indiana University-Bloomington State=IN Location=Bloomington GradRate=. Cost=13,428
Acceptance=. Undergrad=2991 _ERROR_=1 _N_=19
NOTE: Invalid data for GradRate in line 21 27-31.
NOTE: Invalid data for Acceptance in line 21 39-43

I'm assuming there is something wrong with my GradRate and Acceptance variables.
Reeza
Super User
I would have expected that to have much more in it.
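For what it's worth, the log above looks consistent with formatted input. Both NOTEs cover exactly five columns (27-31 and 39-43), which is what `percent5.` without a colon reads regardless of the tab. For GradRate that would consume `78%`, the tab and the `$`, leaving Cost=13,428; for Acceptance it would consume `77%`, the tab and the `3`, leaving Undergrad=2991. Both match the values printed in the log. A sketch of the import with the colon modifier on every informat, using the one data row from the log: the `|` delimiter, the `percent8.`/`comma12.` widths, TRUNCOVER, and the data set names are assumptions made so the example runs from DATALINES. For the real file, the thread's INFILE statement (DLM='09'X DSD FIRSTOBS=2) would replace the DATALINES one.

```sas
/* Sketch: modified list input (colon on every informat), so SAS always
   stops at the delimiter. The data row is copied from the log in this thread;
   '|' stands in for the tab so the example runs from DATALINES.
   For the real file, use something like:
   infile 'c:/Users/sharvey8/Desktop/University(1).txt' dlm='09'x dsd firstobs=2 truncover; */
data University_Import;
  infile datalines dlm='|' dsd truncover;
  input School :$40. State $ Location :$15.
        GradRate :percent8. Cost :comma12. Acceptance :percent8. Undergrad;
  /* Display the proportions as percentages and Cost as currency */
  format GradRate Acceptance percent8. Cost dollar12.;
  datalines;
Indiana University-Bloomington|IN|Bloomington|78%|$13,428|77%|32991
;
run;

/* Categorize into a new data set, testing for missing values first */
data University_Categorized;
  set University_Import;
  length Selective $3;
  if missing(Acceptance) then Selective = ' ';
  else if Acceptance <= 0.25 then Selective = 'Yes';
  else Selective = 'No';
run;

proc print data=University_Categorized;
run;
```

The COMMA informat strips the dollar sign and comma from `$13,428`, and PERCENT divides by 100, so the 25% cutoff is written as 0.25.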

Discussion stats
  • 11 replies
  • 1070 views
  • 0 likes
  • 3 in conversation