BookmarkSubscribeRSS Feed
Sandy10
Calcite | Level 5

Hi. I have a data set like below. I need to find the date where the first occurrence of 100 occurs for a ID. Keep the counter as 1. Transpose is not working. So couldn't use  first. and last. Its a huge dataset. Thanks

 

Id30-Apr-1831-May-1830-Jun-18
1120100200
25020075
38990120
10 REPLIES 10
novinosrin
Tourmaline | Level 20

Please take a look at  wHICHN/CHOOSEN group of Functions. 

Sandy10
Calcite | Level 5

Guess that might not give wat I wnat. It counts from backwards oif there is a negative value. I don want that. The table do have negative value.

ChrisNZ
Tourmaline | Level 20

> the first occurrence of 100 

> Guess that might not give wat I wnat.

Guess you didn't look properly or explain your problem properly. @novinosrin 's solution solves your need.

Also please check your typos (I make many typos too, so I feel free to mention that!).

 

Reading further, you did indeed misrepresent your problem.

Please try to avoid that to save everyone's time. Especially your helpers' time.

mkeintz
PROC Star

Please present your data as a sas data step.  It looks like you have variables named '30-Apr-18'n, '31-May-18'n, and '30-Jun-18'n.   Possible but extremely unlikely.  So show us what it actually looks like - help us help you.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

Transpose works for me, see:

options validvarname=any;

data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
;

proc transpose data=have out=trans;
by id;
var _numeric_;
run;

data long;
set trans;
date = input(_name_,date9.);
format date yymmddd10.;
drop _name_;
rename col1=value;
run;

Define what you see as "huge". Be precise in numbers: number of observations, number of variables, observation size.

Also post what you expect as a result. Do you only want to keep certain observations, do you want to set a variable, what if a value of 100 does not appear for a certain id; what do you mean by "keep counter as 1"?

Sandy10
Calcite | Level 5
Hi Kurt. Thansk for yoiur reply. I require below output: basically retireve date valuw where first occurence occured.

ID firstoccurenceofmorethan100
1 30apr2018
2 31may2018
3 30jun2018
sustagens
Pyrite | Level 9

Add this

data want;
set long;
by id date;
where value>100;
if not first.id then delete;
drop value;
run;
Kurt_Bremser
Super User

Another example what I mean by "being precise":

in your initial post, you asked for "the date where the first occurrence of 100 occurs", but now you name your new variable "firstoccurenceofmorethan100", so you are looking for 100 or greater, and not just exactly 100; this is substantial information!

 

So now we have to add a sort (just to be sure) and a data step that does the selection:

options validvarname=any;

data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
;

proc transpose data=have out=trans;
by id;
var _numeric_;
run;

data long;
set trans;
date = input(_name_,date9.);
format date yymmddd10.;
drop _name_;
rename col1=value;
run;

proc sort data=long;
by id date;
run;

data want;
set long (where=(value >= 100));
by id;
if first.id;
run;

proc print data=want noobs;
var id date value;
run;

Result:

Id	date	value
1	2018-04-30	120
2	2018-05-31	200
3	2018-06-30	120

 

 

Edit: complete code (original post had old contents of clipboard).

 

mkeintz
PROC Star

I don't think I see the need for a proc transpose.

 

If

  1. Each ID has only one observation (i.e. one row with some number of dates)
  2. The variable names that represent dates are already in chronological order

then I think it should be a single data step:

 

options validvarname=any;
data have;
input Id $ '30-Apr-18'n '31-May-18'n '30-Jun-18'n;
datalines;
1 120 100 200
2 50 200 75
3 89 90 120
4  1  2   3
;

data want (keep=id date value);
  set have ;
  array dts {*} _numeric_;
  do i=1 to dim(dts) until (value>=100);
    value=dts{i};
  end;
  if i>dim(dts) then value =.;
  else date=input(vname(dts{i}),date9.);
  format date date9.;
run;

This program also assumes that the only numeric variables are the ones whose names represent dates.

 

Also I put in a 4th row with no qualifying values.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

Of course we can solve this with the original structure, but compare your array code with this:

data want;
set long (where=(value >= 100));
by id;
if first.id;
run;

Maxim 33: Intelligent Data Makes for Intelligent Programs, and I take every opportunity to show newbies how to properly design data.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3518 views
  • 1 like
  • 6 in conversation