BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi all,

 

I have some repeated measures data where each ID could potentially have 2 rows of data. A sample is shown below. 

ID

Time

Data

1234

T1

1

1234

T2

1

5555

T1

0

6777

T2

.

9876

T1

0

9876

T2

0

1000

T1

1

2000

T2

1

8888

T1

1

8888

T2

.

9000

T1

0

9000

T2

.

1010

T1

.

1010

T2

.

 

I'm trying to reduce each ID down to 1 observation each. The criteria is that I'm prioritizing their data on Time 2 (Time=T2). So if they have non-missing data on T2, I pick that row as that ID's observation. 

 

I used the code below to re-sort the data in descending "Time", so that the row where Time=T2 appears first. After that, it's easy for me to just take the first row (which should be the T2 row if they have 2 rows of data) as the observation for each ID. 

proc sort data=test;
by ID descending time;
run;
	
data want;
set test;
by ID;
if first.ID;
run;

My issue now are the data where they have missing data on T2, but not on T1. Looking at the sample data, if I proceeded with my code above ^, the ID "9000" (as well as ID 8888), would have their T2 observation picked where "Data" is missing. However, when you look at their T1 observation, ID 9000 has existing Data in T1 that's "0".

 

Non-missing data would mean like any value that's not ".". Therefore, Data=0 would count as non-missing data.

 

Basically, if the ID has 2 rows, prioritize picking their T2 row - if there's data. If there isn't, look at their T1 rowif there is data, pick that row instead. If all rows for Data are missing for that ID, then stick with picking their T2 row. 

 

I'd like to hopefully produce a table like the one below:

ID

Time

Data

1234

T2

1

5555

T1

0

6777

T2

.

9876

T2

0

1000

T1

1

2000

T2

1

8888

T1

1

9000

T1

0

1010

T2

.

 

Any help would be very much appreciated, thanks.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Thank you @asgee  Please post the best representative sample so that we don't have to go back and forth. Of course, it may take some time to prepare a mock data, however it's worth spending some time to think through carefully before posting the data, so that we are done in one shot 🙂

 

Btw, no worries it's no biggie-


proc sql;
 create table want(drop=_n_) as
 select *,  max(Time='T2' and data>.) as _n_
 from have
 group by id
 having _n_  and Time='T2' or  not _n_ and time='T1' 
 or count(distinct time)=1;
quit;

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

I didn't test this, but you can. I am 99.998% sure it will work 😉

 

proc sort data=test(where=(not missing(data))) out=test1;
by ID descending time;
run;
	
data want;
set test1;
by ID;
if first.ID;
run;

 

--
Paige Miller
asgee
Obsidian | Level 7
Hi @PaigeMiller, thanks for the response! I tested your code but I think the proc sort at the start removed all the missing data 😞 I think I'd still want to keep the missing data if the ID has all their rows missing. However, it did fix the issue of the T2 being missing but T1 as not. Will try to test out and see if I can adjust this if anything. Thanks again.
novinosrin
Tourmaline | Level 20

Hi @asgee  Fun SQL stuff-



data have;
input ID	Time $	Data;
cards;
1234	T1	.
1234	T2	31
5555	T1	10
6777	T2	.
9876	T1	64
9876	T2	12
1000	T1	43
2000	T2	9
8888	T1	40
8888	T2	.
9000	T1	90
9000	T2	.
;

proc sql;
 create table want(drop=_n_) as
 select *,  max(Time='T2' and data) as _n_
 from have
 group by id
 having _n_  and Time='T2' or not _n_ and time='T1' 
 or  count(distinct time)=1;
quit;
asgee
Obsidian | Level 7

Hi @novinosrin, thanks for the response! The code appears to work through all of the criteria, however I think it flips the last criteria where if the ID has two observations with existing data (T1 and T2 are non-missing).

My fault for not mentioning that the original "Data" variable has values in the format of (0, 1 or .). So for example:
- If you were to change the data on ID 9876 to have two rows of non-missing data on both T1 and T2, but those data are "0" (meaning T1=0, T2=0). They'd be considered "non-missing" as they're not "."
-When I ran your code, I think for ID's like these, it prioritized picking the T1 row instead of the T2 row. I guess since both rows were 0, it instead picked the first in line.

Otherwise, the code works perfectly when selecting the issue of T2 being missing, but T1 isn't (i.e. the ID 9000 issue). It also picks up the ID's with only single rows.

I'll keep testing out your code and see if I can adjust it. Maybe its just a resorting error on my end... Thanks again.

asgee
Obsidian | Level 7
@novinosrin I've edited my question above to reflect the "Data" variable. Perhaps the values being (0,1 or .) make a big difference...
novinosrin
Tourmaline | Level 20

Thank you @asgee  Please post the best representative sample so that we don't have to go back and forth. Of course, it may take some time to prepare a mock data, however it's worth spending some time to think through carefully before posting the data, so that we are done in one shot 🙂

 

Btw, no worries it's no biggie-


proc sql;
 create table want(drop=_n_) as
 select *,  max(Time='T2' and data>.) as _n_
 from have
 group by id
 having _n_  and Time='T2' or  not _n_ and time='T1' 
 or count(distinct time)=1;
quit;
asgee
Obsidian | Level 7
Thanks so much @novinosrin, yes I realized my mistake afterwards - my apologies for that 😞 . The code above works perfectly, really appreciate your help! Will be more wary the next time I post.

Thanks again!

novinosrin
Tourmaline | Level 20

And just to account for -"If all rows for Data are missing for that ID, then stick with picking their T2 row. " May be this is more prudent-


proc sql;
 create table want(drop=_n_) as
 select *,  max(Time='T2' and data>.) as _n_
 from have
 group by id
 having _n_  and Time='T2' or  not _n_ and time='T1' and n(data)
 or  not _n_ and time='T2' and n(data)=0
 or count(distinct time)=1;
quit;
asgee
Obsidian | Level 7
Thanks for this! The code works great and accounts for that as well, thanks again @novinosrin! Really appreciate the help.
Ksharp
Super User
data have;
input ID	Time $	Data;
cards;
1234
T1
1
1234
T2
1
5555
T1
0
6777
T2
.
9876
T1
0
9876
T2
0
1000
T1
1
2000
T2
1
8888
T1
1
8888
T2
.
9000
T1
0
9000
T2
.
1010
T1
.
1010
T2
.
;

data temp;
 set have;
 missing=missing(data);
run;
proc sort data=temp;by id descending missing time;run;
data want;
 set temp;
 by id;
 if last.id;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2940 views
  • 4 likes
  • 4 in conversation