Solved: How to output to differing data sets if a value is present across any ...

sasgorilla

I have a data set that has IDs, dates, (other variables excluded below), and a "type" variable:

data A;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;

If an ID ONLYhas a type value of "0", I want those id's observations to go to one dataset. If an id EVER has values of "1" I want those observations to go to another data set. See below:

data zerotype;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
4 2020-01-30 0
;
run;

data onetype;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
5 2018-10-10 1
5 2019-10-11 1
;
run;

How could I do this in a datastep? Thank you!

Patrick

And here another option.

data zerotype onetype;
	if _n_=1 then
		do;
			dcl hash h1(dataset:'have(where=(type=1))');
			h1.defineKey('id');
			h1.defineDone();
		end;
	set have;
	if h1.check()=0 then output onetype;
	else output zerotype;
run;

View solution in original post

SASKiwi

There are probably more elegant ways but you could try this:

data Have;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
data onetype
     zerotype;
  keep id date type;
  merge Have (in = H1)
        Have (in = H2 
              where = (type2 = 1)
              keep = id type
              rename = (type = type2))
        ;
  by id;
  retain onetype;
  if first.id then do;
     if H2 then onetype = 1;
     else onetype = 0;
  end;
  if onetype then output onetype;
  else output zerotype; 
run;

Tom

That works, but your data step is too complex.

You can just use the IN= flag (or the TYPE2 copy of TYPE) to drive the decision.

data onetype
     zerotype
;
  keep id date type;
  merge Have (in = H1)
        Have (in = H2 
              keep = id type
              rename = (type = type2)
              where = (type2 = 1)
             )
  ;
  by id;
  if H2 then output onetype;
  else output zerotype;
  drop type2;
run;

Astounding

@Tom ,

Will this work with the WHERE= that you have:

 where = (type2 = 1)

or will it require referring to the original variable name?

 where = (type = 1)

Tom

@Astounding wrote:

@Tom ,

Will this work with the WHERE= that you have:
 where = (type2 = 1)
or will it require referring to the original variable name?
 where = (type = 1)

Short answer is No.

The Drop=/Keep=,Rename=, and Where= dataset options are processed in alphabetical order. So if you have the RENAME= option then there is no longer a variable named TYPE to reference in the WHERE= option. You need to reference it with the new name.

And if you do not have the RENAME= option so that the original name of TYPE is still used then the values of TYPE in the resulting dataset will be jumbled. However you could fix that by reordering the datasets in the SET statement so that the copy that has all of the observations is listed last, so the real value of TYPE will overwrite the copy that only has 1's.

data onetype zerotype ;
  merge have(in=type1 keep=id type where=(type=1)) have;
  by id;
  if type1 then output onetype;
  else output zerotype;
run;

Reeza

Sort by ID and DESCENDING type. Then if there's any 1's it will be at the top and you can decide where to output the record.

data A;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;

proc sort data=a;
by id descending type;
run;

data zerotype onetype;
set a;
by id descending type;

retain output_type;
length output_type $8.;

if first.id and type=1 then output_type='one';
else if first.id and type=0 then output_type='zero';

if output_type = 'zero' then output zerotype;
if output_type = 'one' then output onetype;

drop output_type;

run;

proc sort data=zerotype; by id date; run;
proc sort data=onetype; by id date; run;

@sasgorilla wrote:

I have a data set that has IDs, dates, (other variables excluded below), and a "type" variable:
data A;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;
If an ID ONLYhas a type value of "0", I want those id's observations to go to one dataset. If an id EVER has values of "1" I want those observations to go to another data set. See below:
data zerotype;
input id date :yymmdd10. type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
4 2020-01-30 0
;
run;

data onetype;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
5 2018-10-10 1
5 2019-10-11 1
;
run;
How could I do this in a datastep? Thank you!

Patrick

And here another option.

data zerotype onetype;
	if _n_=1 then
		do;
			dcl hash h1(dataset:'have(where=(type=1))');
			h1.defineKey('id');
			h1.defineDone();
		end;
	set have;
	if h1.check()=0 then output onetype;
	else output zerotype;
run;

sasgorilla

This worked perfectly and was the briefest way to do. Admittedly, this is the most difficult code for me to understand as I don't recognize things like dcl, hash, h1 and the definekey/definedone. I see it is known as the hash object and I will look into it further.

Thanks again!

Ksharp

If you are not familiar with Hash Table, try PROC SQL:

data A;
input id date :yymmdd10.  type;
format date yymmdd10.;
datalines;
1 2020-01-01 0
1 2018-07-06 0
2 2015-02-15 1
3 2020-02-01 0
3 2021-07-05 1
3 2021-08-09 0
4 2020-01-30 0
5 2018-10-10 1
5 2019-10-11 1
;
RUN;

proc sql;
create table zero_type as
select * from A where id not in (select id from A where type=1);

create table one_type as
select * from A where id  in (select id from A where type=1);
quit;

How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

Re: How to output to differing data sets if a value is present across any observations by id

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away