I have a data set that has IDs, dates, (other variables excluded below), and a "type" variable:
data A;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
If an ID ONLYhas a type value of "0", I want those id's observations to go to one dataset. If an id EVER has values of "1" I want those observations to go to another data set. See below:
data zerotype;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
4 2020-01-30 0
;
run;
data onetype;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
5 2018-10-10 1
5 2019-10-11 1
;
run;
How could I do this in a datastep? Thank you!
And here another option.
data zerotype onetype;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(type=1))');
h1.defineKey('id');
h1.defineDone();
end;
set have;
if h1.check()=0 then output onetype;
else output zerotype;
run;
There are probably more elegant ways but you could try this:
data Have;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
data onetype
zerotype;
keep id date type;
merge Have (in = H1)
Have (in = H2
where = (type2 = 1)
keep = id type
rename = (type = type2))
;
by id;
retain onetype;
if first.id then do;
if H2 then onetype = 1;
else onetype = 0;
end;
if onetype then output onetype;
else output zerotype;
run;
That works, but your data step is too complex.
You can just use the IN= flag (or the TYPE2 copy of TYPE) to drive the decision.
data onetype
zerotype
;
keep id date type;
merge Have (in = H1)
Have (in = H2
keep = id type
rename = (type = type2)
where = (type2 = 1)
)
;
by id;
if H2 then output onetype;
else output zerotype;
drop type2;
run;
@Tom ,
Will this work with the WHERE= that you have:
where = (type2 = 1)
or will it require referring to the original variable name?
where = (type = 1)
@Astounding wrote:
@Tom ,
Will this work with the WHERE= that you have:
where = (type2 = 1)
or will it require referring to the original variable name?
where = (type = 1)
Short answer is No.
The Drop=/Keep=,Rename=, and Where= dataset options are processed in alphabetical order. So if you have the RENAME= option then there is no longer a variable named TYPE to reference in the WHERE= option. You need to reference it with the new name.
And if you do not have the RENAME= option so that the original name of TYPE is still used then the values of TYPE in the resulting dataset will be jumbled. However you could fix that by reordering the datasets in the SET statement so that the copy that has all of the observations is listed last, so the real value of TYPE will overwrite the copy that only has 1's.
data onetype zerotype ;
merge have(in=type1 keep=id type where=(type=1)) have;
by id;
if type1 then output onetype;
else output zerotype;
run;
Sort by ID and DESCENDING type. Then if there's any 1's it will be at the top and you can decide where to output the record.
data A;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
proc sort data=a;
by id descending type;
run;
data zerotype onetype;
set a;
by id descending type;
retain output_type;
length output_type $8.;
if first.id and type=1 then output_type='one';
else if first.id and type=0 then output_type='zero';
if output_type = 'zero' then output zerotype;
if output_type = 'one' then output onetype;
drop output_type;
run;
proc sort data=zerotype; by id date; run;
proc sort data=onetype; by id date; run;
@sasgorilla wrote:
I have a data set that has IDs, dates, (other variables excluded below), and a "type" variable:
data A; input id date :yymmdd10. type; format date yymmdd10.; datalines; 1 2020-01-01 0 1 2018-07-06 0 2 2015-02-15 1 3 2020-02-01 0 3 2021-07-05 1 3 2021-08-09 0 4 2020-01-30 0 5 2018-10-10 1 5 2019-10-11 1 ; RUN;
If an ID ONLYhas a type value of "0", I want those id's observations to go to one dataset. If an id EVER has values of "1" I want those observations to go to another data set. See below:
data zerotype; input id date :yymmdd10. type; format date yymmdd10.; datalines; 1 2020-01-01 0 1 2018-07-06 0 4 2020-01-30 0 ; run; data onetype; input id date :yymmdd10. type; format date yymmdd10.; datalines; 2 2015-02-15 1 3 2020-02-01 0 3 2021-07-05 1 3 2021-08-09 0 5 2018-10-10 1 5 2019-10-11 1 ; run;
How could I do this in a datastep? Thank you!
And here another option.
data zerotype onetype;
if _n_=1 then
do;
dcl hash h1(dataset:'have(where=(type=1))');
h1.defineKey('id');
h1.defineDone();
end;
set have;
if h1.check()=0 then output onetype;
else output zerotype;
run;
This worked perfectly and was the briefest way to do. Admittedly, this is the most difficult code for me to understand as I don't recognize things like dcl, hash, h1 and the definekey/definedone. I see it is known as the hash object and I will look into it further.
Thanks again!
If you are not familiar with Hash Table, try PROC SQL:
data A;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
proc sql;
create table zero_type as
select * from A where id not in (select id from A where type=1);
create table one_type as
select * from A where id in (select id from A where type=1);
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.