I have the following data below. I would like to choose the obs where
first_year eq 1. I would like my data set to look like the table below
could someone assist me with this?
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
customerid | year | regdate | completedate | first_ssn | last_ssn | first_year | last_year |
100 | 2019 | 4/30/2019 | 1/24/2018 | 1 | 0 | 1 | 0 |
100 | 2019 | 8/31/2019 | 12/16/2019 | 0 | 0 | 1 | 0 |
@luvscandy27 wrote:
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
The IF FIRST.<variable name> syntax returns the first row of a sorted set and not all rows with a specific value.
Try one of the options below for what you want.
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
proc print data=data1(where=(first_year=1));
run;
data want1;
set data1(where=(first_year=1));
run;
data want2;
set data1;
where first_year=1;
run;
data want3;
set data1;
if first_year=1;
run;
Generic form to create a subset of data based on an existing variable for a given value:
Data want;
set have;
where variable=1;
run;
You may not even need to create a new data set as you can use data set options such as
Proc print data=have (where=(variable=1));
run;
to select the records to use as needed.
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
@luvscandy27 wrote:
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
The IF FIRST.<variable name> syntax returns the first row of a sorted set and not all rows with a specific value.
Try one of the options below for what you want.
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
proc print data=data1(where=(first_year=1));
run;
data want1;
set data1(where=(first_year=1));
run;
data want2;
set data1;
where first_year=1;
run;
data want3;
set data1;
if first_year=1;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.