Hello, the result of this code is giving me 0 observations. The data does have duplicate rows per subjid so I expect there to be resulting rows. The data is sorted correctly. When I do "if first.subjid" I get the right number of rows. It's just the "not" part that doesn't seem to be working. Can anyone see what I am doing wrong? I'm so frustrated haha
@cbig wrote:
Hello, the result of this code is giving me 0 observations. The data does have duplicate rows per subjid so I expect there to be resulting rows. The data is sorted correctly. When I do "if first.subjid" I get the right number of rows. It's just the "not" part that doesn't seem to be working. Can anyone see what I am doing wrong? I'm so frustrated haha
proc sort data = ae_06c; by studyid site subjid; run;data ae_06d; *has a 1;set ae_06c ;by studyid site subjid;if not first.subjid ;run;
Hello, @cbig ! For this post and really for all your future posts, please provide us with some sample data that illustrates the problem. That would be very helpful and would likely result in you getting faster and more correct answers. Here are examples and instructions of how to provide us with sample data.
Without an example data set, I cannot figure out why your code doesn't work. It seems like it should work, however the main reason I can think of why it doesn't work is that your data isn't really what you said it was. So providing an example data set that illustrates the problem would straighten all of this out.
Also, there are possibly important clues in the log. When your code is not behaving, please ALWAYS provide us with a copy of the log. Please copy the log for this DATA step (not just the possible ERRORs and WARNINGs but the entire log for this DATA step)and paste it into the Window that appears when you click on the </> icon.
What do you mean by not working? Does your new dataset have zero observations? Is that what you expected.
Your data step will only have observations when there are observations that have equal values for all three of those variables since the one you referenced SUBJID is the last of the three in the BY statement. And the first observation in those BY groups will not be output.
If you wanted a dataset that has only one observation per BY group you should use one of these:
if first.subjid;
if last.subjid;
If you wanted a dataset that has all of the duplicate observations for the BY groups that have duplicates then you should use:
if not (first.subjid and last.subjid);
Which will remove the BY groups that only have one observation.
Perhaps you are confused about what FIRST. means when there are multiple BY variables? FIRST.SUBJID is true when any of the three variables change value.
It is not clear what you expect to see in the resulting dataset. Here's an example using sashelp.class, sorted by sex and age to display what "not first.variable means:
proc sort data=sashelp.class out=work.class_sorted;
by Sex Age;
run;
data work.selection;
set work.class_sorted;
by Sex Age;
not_first = not first.Age;
run;
Just FYI ...
Two PROC SORT options that you might find useful:
BR,
Koen
If all your combinations of
studyid site subjid
are unique, then
first.subjid
will always be true, and the whole condition always false. Your empty result dataset proves that all combinations are unique and you have no duplicates.
Multiple entries for a given subjid are obviously contained in different studies/sites.
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.