Hi guys,
Need some help with this please!
If I have the following data
User ID | Dates |
23576 | 08/10/1998 |
23576 | 08/11/1998 |
23576 | 08/12/2000 |
23576 | 09/10/1998 |
85028 | 07/09/1976 |
85028 | 07/10/1967 |
85028 | 10/07/1967 |
13857 | 19/03/2001 |
13857 | 18/03/2001 |
What I want is all possible pairs of the dates associated with that user ID.
So for ID 23576 I would want table like this:
User ID | Date A | Date B |
23576 | 08/10/1998 | 08/11/1998 |
23576 | 08/10/1998 | 08/12/2000 |
23576 | 08/10/1998 | 09/10/1998 |
23576 | 08/11/1998 | 08/12/2000 |
23576 | 08/11/1998 | 09/10/1998 |
23576 | 08/12/2000 | 09/10/1998 |
Hope that makes sense.
Thank you so much!
See if this helps:
data in;
userid=1; value=1; output;
userid=1; value=2; output;
userid=1; value=3; output;
userid=2; value=10; output;
userid=2; value=11; output;
run;
data outds;
set in;
by userid;
array vals(10) _temporary_;
retain cnt 0;
if first.userid then do;
cnt = 0;
end;
cnt + 1;
vals(cnt) = value;
if last.userid then do;
do i = 1 to cnt-1;
do j = i+1 to cnt;
value1 = vals(i);
value2 = vals(j);
output;
keep userid value1 value2;
end;
end;
end;
run;
You could also join the table with itself but this would produce more rows than you want (per your example)
data in;
userid=1; value=1; output;
userid=1; value=2; output;
userid=1; value=3; output;
userid=2; value=10; output;
userid=2; value=11; output;
run;
proc sql;
create table outsql as
select a.*, b.value as value2
from in a, in b
where a.userid = b.userid and a.value ne b.value
;
quit;
Hi,
Thank you for your response. How would I do this in data steps instead of proc sql please?
You are correct, the size of the array should be set to the maximum number of rows possible for a single userid. I provided just an example to illustrate the solution.
SQL is good at building cartesian joins, you just need the correct comparison:
data have;
infile datalines dlm="09"x;
input User_ID $ Dates:ddmmyy10.;
datalines;
23576 08/10/1998
23576 08/11/1998
23576 08/12/2000
23576 09/10/1998
85028 07/09/1976
85028 07/10/1967
85028 10/07/1967
13857 19/03/2001
13857 18/03/2001
;
proc sql;
create table want as
select
a.user_id,
a.dates as date_a format=yymmdd10.,
b.dates as date_b format=yymmdd10.
from have a inner join have b
on a.user_id = b.user_id and a.dates lt b.dates
;
quit;
Also note how data is presented in a data step with datalines, this makes it much easier to develop and test code. Please do so in the future.
For cartesian joins, SQL is the tool of choice. My code is tested and works. Where is your problem?
@Feefee wrote:
I'm getting a redefinition of local variable error.
Also, I am not using sas but a similar product and the sql isnt very liked. So I am trying to write it in a datastep instead but struggling. Any help would be appreciated please 🙂
Buying cheap means buying double, says my grandma 😉
One modern data step solution is the use of a hash object, which grows as needed (using the same "have" I used earlier):
data want2;
if 0 then set have;
format date_a date_b yymmdd10.;
if _n_ = 1
then do;
declare hash h (multidata:"yes");
declare hiter hi ("h");
h.definekey("user_id");
h.definedata("user_id","dates");
h.definedone();
end;
do until (last.user_id);
set have;
by user_id notsorted;
rc = h.add();
end;
do until (last.user_id);
set have (rename=(dates=date_a));
by user_id notsorted;
rc = hi.first();
do while (rc = 0);
date_b = dates;
if date_a < date_b then output;
rc = hi.next();
end;
end;
rc = h.clear();
keep user_id date_a date_b;
run;
data have; input element$ 1-2 qt; datalines; A 1 B 2 Cc3 D 4 Ee5 ; data want; set have end=last nobs=nobs; array x{999} $ 32 _temporary_; x{_n_}=element; if last then do; do i=1 to nobs-1; var1=x{i}; do j=i+1 to nobs; var2=x{j};output; end; end; end; keep var1 var2; run;
I found this, and it works but I want to apply this to each User_ID and not the entire list. I am trying method with first and last but I am getting null pointer exception.
This is what I have so far:
data want; set have; retain total_rows 0 a 1; by User_ID; total_rows = total_rows +1; a = a +1; if first.User_ID then do; a=1; array x{999} $ 32 _temporary_; x{_n_}=dates; end; if last.User_ID then do; do i = 1 to total_rows-1; dates=x{i}; do j=rownum+1 to total_rows; var2=x{j}; output; end; end; end; run;
It works find until the do = i bit..
I have formatted etc its just the last bit I am struggling to get 😞
Have you seen my sample code that I posted a while ago at https://communities.sas.com/t5/SAS-Programming/Find-pairs-of-a-single-variable-from-first-and-last/m... ?
Isn't it what you are looking for?
The only thing "wrong" with my code is that you need to increase the "vals" array size (in my example I set to "10") to some big number that will never be reached of events per userid
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.