rajhwkeudkslap
Calcite | Level 5

Hello, I have a question about creating a new data set from a raw data set.

 

This is part of my SAS code

 

data newdata;
input name$ age maj$ score
cards;
sara 25 nursing .9
kim 26 dance 1
charlie 21 psychology 4.3
anna 18 dance .45
run;

 

I have to make a new data set using IF/THEN statements and the SET statement, so I tried this:

 

data newdata;
set newdata_2;
if maj nursing psychology then newmaj = sci
if maj dance then newmajo = art
if score <1 then newscore = low
if score <=1 then newscore = mid
if score <=3 then newscore = high;
run;

 

I get errors and it does not work. Any help would be appreciated!

PaigeMiller
Diamond | Level 26

If you get errors in the log, show us the ENTIRE log from this code. Do not show us parts of the log. Please copy the log as text and paste it into the window that appears when you click on the </> icon.


 

If you get incorrect output, show us the incorrect output and explain or show us what the desired output will be.

--
Paige Miller
rajhwkeudkslap
Calcite | Level 5
Okay, sorry, I am new to this website. I posted the log, and I realized there were some errors in the code. I fixed them and will reply with the new code using the code icon.
rajhwkeudkslap
Calcite | Level 5
163  data newdata;
164  input name$ age maj$ score;
165  cards;

NOTE: The data set WORK.NEWDATA has 4 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


170  run;
171  data newdata;
172  set newdata_2;
ERROR: File WORK.NEWDATA_2.DATA does not exist.
173  if maj nursing psychology then newmaj = sci;
            -------
            388
            76
174  if maj dance then newmajo = art;
            -----
            388
            76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

175  if score <1 then newscore = low;
176  if score <=1 then newscore = mid;
177  if score <=3 then newscore = high;
178  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.NEWDATA may be incomplete.  When this step was
         stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.NEWDATA was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


rajhwkeudkslap
Calcite | Level 5
data newdata;
input name$ age maj$ score;
cards;
sara 25 nursing .9
kim 26 dance 1
charlie 21 psychology 4.3
anna 18 dance .45
run; 
data newdata; 
set newdata_2;
if maj nursing psychology then newmaj = sci;
if maj dance then newmajo = art;
if score <1 then newscore = low;
if score <=1 then newscore = mid;
if score <=3 then newscore = high;
run; 
ballardw
Super User

Character values must appear inside quotes, either single or double, and the opening and closing quotes must match. Otherwise SAS thinks that you mean the name of a variable.

If you want to see if a variable is equal to a single value:

if maj='nursing' then <do something>

If you want to see if a variable is one of a list of values, then the operator is IN:

if maj IN ( 'nursing' 'psychology') then newmaj = 'sci';

or instead of

if maj dance then newmajo = art;

I think you want the following (assuming the result should go into the same variable that 'sci' goes into):

if maj= 'dance' then newmaj = 'art';

When you use ranges of numeric values, you quite often want an if/then/else chain and to be specific about the ranges. Every value in your data except 4.3 is <=3, so with your code all of those would end up as 'high'.

The conditions <1 and <=1 differ only in whether 1 is included. Do you want only a value of exactly 1 to be "mid"? If so, you may want the code below. But I doubt it, since nothing assigns a value to newscore for the 4.3 score, so I think you need to describe in words which range goes with which value.

length   newscore $ 4;
if score <1 then newscore = 'low';
else if score =1 then newscore = 'mid';
else if score <=3 then newscore = 'high';

By default, SAS sets the length of a new character variable based on its first use. Since the first assignment to newscore in the code is 'low', it would get a length of 3 characters, which means 'high' would not fit. That is why the LENGTH statement is there.
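To make the length issue concrete, here is a minimal sketch of what happens without the LENGTH statement. The data set name DEMO and the two test scores are made up for illustration.

```sas
/* Hypothetical demo: no LENGTH statement for newscore */
data demo;
  input score;
  /* The first assignment SAS sees is 'low', so newscore gets length 3 */
  if score < 1 then newscore = 'low';
  /* 'high' does not fit in 3 characters and is stored as 'hig' */
  else newscore = 'high';
  datalines;
0.5
2
;
run;

proc print data=demo;
run;
```

Adding `length newscore $4;` before the first assignment avoids the truncation.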

 

Tom
Super User

Let's break down what your code asked SAS to do.

First you have a data step to create a work dataset named NEWDATA.

SAS does not care that you forgot to add a space after the variable names NAME and MAJ. Since $ is not a valid character in a variable name, it was able to work out that you meant NAME and MAJ to be defined as character instead of numeric. Since you did not set a length for NAME and MAJ, they default to a length of $8, which is too short to hold 'psychology'. You also did not end the lines of data with a line containing only a semicolon, but since the RUN; line has a semicolon on it, SAS uses that line to mark the end of the data and just ignores the other characters on it, like "RUN".

 

A more complete data step might look like this instead:

data newdata;
  length name $7 age 8 maj $10 score 8;
  input name age maj score;
cards;
sara 25 nursing .9
kim 26 dance 1
charlie 21 psychology 4.3
anna 18 dance .45
; 

You then tried to run another data step to replace NEWDATA with a new dataset of the same name. That step reads from a dataset named NEWDATA_2 that you never defined before. Does NEWDATA_2 exist? Does it have the same variables as NEWDATA?

 

Perhaps you meant instead to create NEWDATA_2 by reading in NEWDATA?

data newdata_2; 
  set newdata;

The rest of the data step appears to be an attempt to write IF/THEN statements.

 

Let's look at the last 3 first since they at least look like valid SYNTAX.

if score <1 then newscore = low;
if score <=1 then newscore = mid;
if score <=3 then newscore = high;

So when SCORE is less than 1 you set NEWSCORE to the value LOW.  But there is no variable named LOW in the NEWDATA dataset.  So SAS will create a new variable and default its value to missing.  Similarly for the variables MID and HIGH.
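A minimal sketch of that behavior, using a hypothetical one-row data set named CHECK:

```sas
/* Hypothetical demo: an unquoted value is treated as a variable name */
data check;
  score = 0.5;
  /* low has no quotes, so SAS creates a numeric variable LOW (missing) */
  /* and copies that missing value into a numeric NEWSCORE              */
  if score < 1 then newscore = low;
run;

/* Lists SCORE, NEWSCORE and LOW, all numeric */
proc contents data=check;
run;
```

The log for this step should also contain a note that variable low is uninitialized, which is a useful hint that quotes are missing.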

 

Did you want variable NEWSCORE to be character?  If so you should first define it as such.  What length will it need to store the longest possible value?  Perhaps only 4 bytes to store 'high'?

length newscore $4;
if score <1 then newscore = 'low';
if score <=1 then newscore = 'mid';
if score <=3 then newscore = 'high';

Now we need to look at the logic error of these three statements when taken together. If SCORE is less than 1, then NEWSCORE is set to 'low' by the first IF/THEN. It will then be set to 'mid' by the second, and finally it will be set to 'high' by the last.

You should probably use some ELSE statements in there so that when one test succeeds the other tests are skipped.

if score <1 then newscore = 'low';
else if score <=1 then newscore = 'mid';
else if score <=3 then newscore = 'high';

Now the only logic errors are what to do with missing values of SCORE. With this code those will cause NEWSCORE to be set to 'low', since a missing value is smaller than any actual number. And what about values of SCORE that are larger than 3? Currently for those the value of NEWSCORE will be blank, since none of the conditions will be true.

 

Perhaps you meant that values larger than 3 should be HIGH (your example data has a value of 4.3). In that case you might want to do something like this instead.

if score >3 then newscore = 'high';
else if score >= 1 then newscore = 'mid';
else if score >= 0 then newscore = 'low';

Now values larger than 3 will have NEWSCORE='high', values from 1 to 3, inclusive, will have NEWSCORE='mid', values from zero to less than 1 will have NEWSCORE='low', and missing values and negative values will have NEWSCORE=' '.

 

Now back to the other two IF / THEN statements.

if maj nursing psychology then newmaj = sci;
if maj dance then newmajo = art;

These are invalid syntax since you cannot have two or three variables listed in a row without operators between them.

 

I suspect that you only wanted MAJ to be a variable reference and the others to be string literals, so as to test whether the value of MAJ is one of those strings. You can use the IN operator for lists of one or more values. And if the list is only one value, you can use the = (equality test) operator.

 

Also do you want to create two different new variables?  NEWMAJ and NEWMAJO ? Or just one?  And again how long should it be defined to store the longest value it will need to hold?

length newmaj $3 ;
if maj in ('nursing' 'psychology') then newmaj = 'sci';
else if maj='dance' then newmaj = 'art';

The RUN; statement looks right. Since there is no in-line data to end the code for the data step, adding a RUN will let SAS (and other programmers reading your code) know that you have finished defining the data step.
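Putting the pieces together, one possible complete version might look like the sketch below. It assumes you meant to create NEWDATA_2 from NEWDATA, that you want a single NEWMAJ variable, and that scores above 3 count as 'high'. Adjust the ranges if your assignment defines them differently.

```sas
/* Read the raw data; MAJ needs $10 so 'psychology' is not truncated */
data newdata;
  length name $7 age 8 maj $10 score 8;
  input name age maj score;
cards;
sara 25 nursing .9
kim 26 dance 1
charlie 21 psychology 4.3
anna 18 dance .45
;

/* Assumption: NEWDATA_2 is the output and NEWDATA is the input */
data newdata_2;
  set newdata;
  /* Define the new character variables before the first assignment */
  length newmaj $3 newscore $4;

  if maj in ('nursing' 'psychology') then newmaj = 'sci';
  else if maj = 'dance' then newmaj = 'art';

  /* Assumed ranges: above 3 is high, 1 to 3 is mid, 0 to below 1 is low */
  if score > 3 then newscore = 'high';
  else if score >= 1 then newscore = 'mid';
  else if score >= 0 then newscore = 'low';
run;

proc print data=newdata_2;
run;
```

With the four example rows this gives sara sci/low, kim art/mid, charlie sci/high and anna art/low.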

 

Patrick
Opal | Level 21

To add to all the explanations already provided: for recoding values, using SAS formats instead of if/then/else constructs is another, often quite efficient, option.

With formats it's often also not necessary to create new variables because many SAS procedures allow for direct use of formats (see examples below).

data have;
  input name$ age maj :$20. score;
  cards;
sara 25 nursing .9
kim 26 dance 1
charlie 21 psychology 4.3
anna 18 dance .45
;
run;

proc format;
  value $maj
    'nursing','psychology' = 'sci'
    'dance'                = 'art'
    other                  = 'other'
    ;
  value score
    low -< 1 = 'low'
    1 -< 3   = 'mid'
    3 - high = 'high'
    ;
run;

/* create new variable using formats */
data want;
  set have;
  newmaj=put(maj,$maj.);
  newscore=put(score,score.);
run;


/** examples using original values with formats for reporting */
proc print data=have;
  format maj $maj. score score.;
run;
 
proc freq data=have;
  format maj $maj.;
  table maj;
run;
