BookmarkSubscribeRSS Feed
Mscarboncopy
Pyrite | Level 9

Hi

 

I have a variable "date" that can be filled out a max of 175 times for each ID, but I get anywhere from 10 to 50 maybe one or two Ids would have 175 dates. I need to be able to run frequencies to find out - for each id- what was the last date filled out. 

 

Is there a data step where I can create a variable "X" that would allow me to then run frequencies for the last date to know which date was the last filled out for each ID?

 

I was thinking of the if first then but use if last instead, would that work ?

 

if last.date then X= date;

 

I am thinking this would single out the last date for each ID in one general command. Am I thinking correctly ?

 

I also need to do a similar thing but in this next case I need to sum all of the other variables, but only those that were coded as 1.

 

I will give the example of variable "Y"

 

Variable Y could have been filled out 175 times but I can get anything from 1 to 175 for each of my Ids.

I already did an array to be able to single out the ones that were filled out as 1 but now I need to be able to sum them and they will vary per ID. 

So one Id will have 10 - and another would have 5 etc..It is a stacked file.

 

I need to have a way to sum only the variables that were filled out (=1) and leave the missing out of the sum. To do this for each ID would take too long, as it varies. I would need to know which ones to keep and which ones to leave out of the sum.

Is there a general command that would allow me to create a variable that would be the sum of that variable Y ? but only for the ones that were filled out =1? Leaving the missing out? 

 

 

TIA

 

 

15 REPLIES 15
Astounding
PROC Star

For each ID, do you have up to 175 observations?

 

Or each each ID, do you have 175 variables, some of which may contain a missing value?

 

Either way, the programming is straightforward, but the answer depends on what the data looks like.

Mscarboncopy
Pyrite | Level 9

I have 175 possible dates of which some might be missing and I need to single out the last date for each ID.

 

For my other variables - not the date variables but the variables I created an array for - I have about 200 variables. Variable Y was one example of these 200 variables. 

 

These 200 variables file is a stacked file, so I will have one ID with 5 entries for variable Y

I will have another ID with 100 entries for variable Y and another with 50 entries for variable Y and I wanted to be able to have the total without having to sum the up manually. 

 

I have about 230 Ids so it would be a lot to do the sum Id by Id and for 200 different variables ! 

 

Thanks.

 

 

 

 

 

Astounding
PROC Star

If you set up arrays in a DATA step, computations might be relatively simple.  For example:

 

data want;

set have;

array dates {175} date1-date175;

array yvals {175} y1-y175;

 

total_y = sum(of yvals{*});

lastdate = max(of dates{*});

run;

 

The logic for LASTDATE might depend on whether the dates are always in order, and whether any missing values might appear in the middle of the set of values.  This sample code captures the latest date, regardless of which variable contains that value.

novinosrin
Tourmaline | Level 20

Hi @Mscarboncopy  Can you plz post a sample of your data and your required output?

Mscarboncopy
Pyrite | Level 9

It looks something like this but I have 175 dates and 200 + ids.  I need to be able to have the last date filled out for each ID - thus I thought of creating a variable X that would be the last date for each, as you can see the last date can be anywhere.

IDdate1date2date3date4date5date6date7date8date9date10date11
11/1/2002/2/2006         
21/2/20032/2/20053/3/200810/10/2009       
32/6/20073/3/200810/10/200811/11/200812/19/201012/20/2011     
43/7/20084/4/20095/5/2010        
55/5/20106/6/201110/10/200811/11/200812/19/201012/20/20111/1/20123/3/20125/5/2013  
66/6/20111/1/20123/5/2013        
71/1/20052/2/20053/3/200810/10/20099/9/2010      
810/3/200011/13/20016/16/20057/7/200812/19/201012/20/20111/1/20123/3/20125/5/201310/3/201411/3/2014
911/11/200112/19/20116/7/20127/14/2012       
102/2/2006          
novinosrin
Tourmaline | Level 20

ok, lead us from here

 

data have;
infile cards truncover;
input ID	(date1	date2	date3	date4	date5	date6	date7	date8	date9	date10	date11) (:mmddyy10.);
format date: mmddyy10.;
cards;
1	1/1/2002	2/2/2006	 	 	 	 	 	 	 	 	 
2	1/2/2003	2/2/2005	3/3/2008	10/10/2009
;

data want;
set have;
array t(*) date:;
last_date=t(n(of t(*)));
format last_date mmddyy10.;
run;

is my understading correct?

Mscarboncopy
Pyrite | Level 9
data want;
set have;
array t(*) date:;
last_date=t(n(of t(*)));
format last_date mmddyy10.;
run;

I don't know. I don't understand the array, shouldn't the array be all of the date variables ?  

Someone else suggested this and I can understand it a little better this way:

 

array dates {175} date1-date175;

lastdate = max(of dates{*});

 

Would this be the same idea of what you were thinking? I have not been able to try it yet but will do asap.

Thanks

 

 

ballardw
Super User

@Mscarboncopy wrote:
data want;
set have;
array t(*) date:;
last_date=t(n(of t(*)));
format last_date mmddyy10.;
run;

I don't know. I don't understand the array, shouldn't the array be all of the date variables ?  

Someone else suggested this and I can understand it a little better this way:

 

array dates {175} date1-date175;

lastdate = max(of dates{*});

 

Would this be the same idea of what you were thinking? I have not been able to try it yet but will do asap.

Thanks

 

 


The array definition of: array t(*) date:;

uses another of the SAS shorthand lists the colon following the variable name in DATE: part says "use all variables whose names start with DATE.

ballardw
Super User

@Mscarboncopy wrote:

It looks something like this but I have 175 dates and 200 + ids.  I need to be able to have the last date filled out for each ID - thus I thought of creating a variable X that would be the last date for each, as you can see the last date can be anywhere.

ID date1 date2 date3 date4 date5 date6 date7 date8 date9 date10 date11
1 1/1/200 2/2/2006                  
2 1/2/2003 2/2/2005 3/3/2008 10/10/2009              
3 2/6/2007 3/3/2008 10/10/2008 11/11/2008 12/19/2010 12/20/2011          
4 3/7/2008 4/4/2009 5/5/2010                
5 5/5/2010 6/6/2011 10/10/2008 11/11/2008 12/19/2010 12/20/2011 1/1/2012 3/3/2012 5/5/2013    
6 6/6/2011 1/1/2012 3/5/2013                
7 1/1/2005 2/2/2005 3/3/2008 10/10/2009 9/9/2010            
8 10/3/2000 11/13/2001 6/16/2005 7/7/2008 12/19/2010 12/20/2011 1/1/2012 3/3/2012 5/5/2013 10/3/2014 11/3/2014
9 11/11/2001 12/19/2011 6/7/2012 7/14/2012              
10 2/2/2006                    

Do you ever have gaps in the variables such as Date5 is missing but Date6 is not?

Do you have cases where Date5 > Date6? Would you want Date5 as the result in such a case when Date6 was the last value recorded? Or would Date5>Date6 indicate an error condition (one if not both of Date5 and Date6 were entered incorrectly) and something else should be done?

Mscarboncopy
Pyrite | Level 9

No gaps and the dates should be in chronological order but yes there could be errors (I was not thinking about that) where one date might be out of chronological order. But those should not be the norm,

 

So a few cases could have an instance where the last date is NOT the actual last date because it could have been coded out of chronological order.

 

Thanks

 

ballardw
Super User

@Mscarboncopy wrote:

No gaps and the dates should be in chronological order but yes there could be errors (I was not thinking about that) where one date might be out of chronological order. But those should not be the norm,

 

So a few cases could have an instance where the last date is NOT the actual last date because it could have been coded out of chronological order.

 

Thanks

 


So what should be done when out of order values are encountered?

 

I asked the question because of the potential of the MAX value, or a not-tested "last variable", might be problem case. And requires different code evaluate and consider.

 

The n(of t(*)) approach suggested by @novinosrin gets the right most element when there are no gaps (why I asked) but does not consider actual values.

Something like

if max(of t(*)) ne t[ n(of t(*))] then do <whatever you want when the largest value is not the last value>

end;

may be wanted for the potential out of order for some data entry errors.

Mscarboncopy
Pyrite | Level 9

Good point, however, I want the max value not necessarily the last entry in the case of an error - so I thought that the max value was a good idea, since I need to know what was the last date (as in the most recent, i.e., max). The last entry should have been the most recent date (max) but might not have been (in case of an error). 

Thanks again,

 

 

Tom
Super User Tom
Super User

There is no need for ARRAY if you just want the MAX value. Just pass the variable list to the MAX() function instead of using it to define the array.

last_date = max(of date1-date175);
total_Y = sum(of y1-y175);

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1572 views
  • 0 likes
  • 5 in conversation