I am trying to create a new variable that has a value of 'A' for the first 30 observations (1 to 30) and a value of 'B' for the remaining 30 observations (31 to 60). Here is my code and the data set:
DATA draft;
INFILE "/folders/myfolders/2017NBADraft2.txt"
DLM=',' FIRSTOBS=2 DSD MISSOVER;
INPUT LastName $ FirstName $ Team $ Position $ Birthdate :ANYDTDTE10. Height Wingspan Weight College $ Year $;
RUN;
PROC PRINT DATA=draft;
RUN;
DATA draft;
SET draft;
IF Obs < 31 then New = 'A';
ELSE IF Obs > 30 then New = 'B';
RUN;
PROC PRINT DATA=draft;
RUN;
This assigns 'A' to every observation and creates a new variable 'Obs'. How can I fix this code to generate the desired output?
Accepted Solutions
There is no variable OBS in your data, so SAS creates it and assigns it a missing value. A missing value is the lowest possible value and is therefore always less than 31, which is why every row gets 'A'.
Replace OBS with _N_, an automatic variable that counts DATA step iterations, and you'll get what you want.
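To see both points in one place, here is a minimal sketch that does not depend on the draft file. The data set names demo and demo_flagged and the variables id and obs_is_missing are made up for illustration; they are not from the original post. It assumes a plain SET with no subsetting, so _N_ lines up with the row number.

```sas
/* Hypothetical 60-row stand-in for the draft data set */
DATA demo;
  DO id = 1 TO 60;
    OUTPUT;
  END;
RUN;

DATA demo_flagged;
  SET demo;
  /* Obs is not in demo, so SAS creates it as a numeric missing value (.) */
  obs_is_missing = MISSING(Obs);
  /* _N_ counts DATA step iterations; with a plain SET it matches the row number */
  IF _N_ < 31 THEN New = 'A';
  ELSE New = 'B';
RUN;

/* Count the rows per value of New and confirm that Obs was missing */
PROC FREQ DATA=demo_flagged;
  TABLES New obs_is_missing;
RUN;
```

The log should carry a note that Obs is uninitialized, which is the same symptom as in the original code. Note that _N_ is not written to the output data set, so copy it into a regular variable if you need to keep the row number.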
I agree with @Shmuel, but you don't need the second IF condition; a plain ELSE is enough:
DATA draft;
SET draft;
IF _n_ < 31 then New = 'A';
ELSE New = 'B';
RUN;
Art, CEO, AnalystFinder.com