I have the following data below. I would like to choose the obs where
first_year eq 1. I would like my data set to look like the table below
could someone assist me with this?
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
customerid | year | regdate | completedate | first_ssn | last_ssn | first_year | last_year |
100 | 2019 | 4/30/2019 | 1/24/2018 | 1 | 0 | 1 | 0 |
100 | 2019 | 8/31/2019 | 12/16/2019 | 0 | 0 | 1 | 0 |
@luvscandy27 wrote:
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
The IF FIRST.<variable name> syntax returns the first row of a sorted set and not all rows with a specific value.
Try one of the options below for what you want.
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
proc print data=data1(where=(first_year=1));
run;
data want1;
set data1(where=(first_year=1));
run;
data want2;
set data1;
where first_year=1;
run;
data want3;
set data1;
if first_year=1;
run;
Generic form to create a subset of data based on an existing variable for a given value:
Data want;
set have;
where variable=1;
run;
You may not even need to create a new data set as you can use data set options such as
Proc print data=have (where=(variable=1));
run;
to select the records to use as needed.
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
@luvscandy27 wrote:
The dataset above was created using first. last. I was trying to remove the obs I don't need using first.last.
I tried the code below but it does not keep all the obs where first_year eq 1:
data want;
set datal;
by ssn first_year;
if first.first_year;
run;
The IF FIRST.<variable name> syntax returns the first row of a sorted set and not all rows with a specific value.
Try one of the options below for what you want.
data data1;
infile datalines delimiter = ',';
input customerid $ year regdate:mmddyy10. completedate:mmddyy10. first_ssn
last_ssn first_year last_year;
format regdate completedate mmddyy10.;
datalines;
100, 2019,04/30/2019, 01/24/2018, 1,0,1,0
100,2018,12/31/2018,12/16/2019, 0,0,0,0
100,2018,11/30/2018,12/16/2019, 0,0,0,1
100,2019, 08/31/2019,12/16/2019,0,0,1,0
;
run;
proc print data=data1(where=(first_year=1));
run;
data want1;
set data1(where=(first_year=1));
run;
data want2;
set data1;
where first_year=1;
run;
data want3;
set data1;
if first_year=1;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.