Finding first occurence in a dataset

Sandy10 · Posted 05-04-2020 09:34 PM

Hi. I have a data set like below. I need to find the date where the first occurrence of 100 occurs for a ID. Keep the counter as 1. Transpose is not working. So couldn't use first. and last. Its a huge dataset. Thanks

Id	30-Apr-18	31-May-18	30-Jun-18
1	120	100	200
2	50	200	75
3	89	90	120

novinosrin · Posted 05-04-2020 09:44 PM

Please take a look at wHICHN/CHOOSEN group of Functions.

Sandy10 · Posted 05-05-2020 12:55 AM

Guess that might not give wat I wnat. It counts from backwards oif there is a negative value. I don want that. The table do have negative value.

ChrisNZ · Posted 05-06-2020 12:04 AM

> the first occurrence of 100

> Guess that might not give wat I wnat.

Guess you didn't look properly or explain your problem properly. @novinosrin 's solution solves your need.

Also please check your typos (I make many typos too, so I feel free to mention that!).

Reading further, you did indeed misrepresent your problem.

Please try to avoid that to save everyone's time. Especially your helpers' time.

High-Performance SAS Coding - Third Edition

mkeintz · Posted 05-05-2020 12:57 AM

Please present your data as a sas data step. It looks like you have variables named '30-Apr-18'n, '31-May-18'n, and '30-Jun-18'n. Possible but extremely unlikely. So show us what it actually looks like - help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 05-05-2020 02:36 AM

Transpose works for me, see:

options validvarname=any;

data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
;

proc transpose data=have out=trans;
by id;
var _numeric_;
run;

data long;
set trans;
date = input(_name_,date9.);
format date yymmddd10.;
drop _name_;
rename col1=value;
run;

Define what you see as "huge". Be precise in numbers: number of observations, number of variables, observation size.

Also post what you expect as a result. Do you only want to keep certain observations, do you want to set a variable, what if a value of 100 does not appear for a certain id; what do you mean by "keep counter as 1"?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Sandy10 · Posted 05-05-2020 03:41 AM

Hi Kurt. Thansk for yoiur reply. I require below output: basically retireve date valuw where first occurence occured.

ID firstoccurenceofmorethan100
1 30apr2018
2 31may2018
3 30jun2018

sustagens · Posted 05-05-2020 03:56 AM

Add this

data want;
set long;
by id date;
where value>100;
if not first.id then delete;
drop value;
run;

Kurt_Bremser · Posted 05-05-2020 03:59 AM

Another example what I mean by "being precise":

in your initial post, you asked for "the date where the first occurrence of 100 occurs", but now you name your new variable "firstoccurenceofmorethan100", so you are looking for 100 or greater, and not just exactly 100; this is substantial information!

So now we have to add a sort (just to be sure) and a data step that does the selection:

options validvarname=any;

data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
;

proc transpose data=have out=trans;
by id;
var _numeric_;
run;

data long;
set trans;
date = input(_name_,date9.);
format date yymmddd10.;
drop _name_;
rename col1=value;
run;

proc sort data=long;
by id date;
run;

data want;
set long (where=(value >= 100));
by id;
if first.id;
run;

proc print data=want noobs;
var id date value;
run;

Result:

Id	date	value
1	2018-04-30	120
2	2018-05-31	200
3	2018-06-30	120

Edit: complete code (original post had old contents of clipboard).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 05-05-2020 11:05 PM

I don't think I see the need for a proc transpose.

If

Each ID has only one observation (i.e. one row with some number of dates)
The variable names that represent dates are already in chronological order

then I think it should be a single data step:

options validvarname=any;
data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
4  1  2   3
;

data want (keep=id date value);
  set have ;
  array dts {*} _numeric_;
  do i=1 to dim(dts) until (value>=100);
    value=dts{i};
  end;
  if i>dim(dts) then value =.;
  else date=input(vname(dts{i}),date9.);
  format date date9.;
run;

This program also assumes that the only numeric variables are the ones whose names represent dates.

Also I put in a 4th row with no qualifying values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 05-06-2020 12:44 AM

Of course we can solve this with the original structure, but compare your array code with this:

data want;
set long (where=(value >= 100));
by id;
if first.id;
run;

Maxim 33: Intelligent Data Makes for Intelligent Programs, and I take every opportunity to show newbies how to properly design data.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

Re: Finding first occurence in a dataset

SAS Innovate 2025: Call for Content

Classroom Training Available!