Hello everyone,
I have the below sample data. Have more records in original dataset. Just providing few of them.
data lb;
input subjid testcd blfl;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
;
run;
WANT:
I would like to retrieve all the testcd records that have multiple blfl records per subject like below.
For example, in the selected highlighted records, all the records from the "ALB" testcd are inlcuded where there are multiple records for "BLFL".
If there is single records per subjid per parameter per blfl, it should NOT be considered.
Likewise, if there is single record per blfl, it should NOT be considered.
Please help me!
Thanks!
For each subjid/testcd
This is facilitated by having two objects of the SET command: the first filtering only for bifl="Y', the second with no filtering.
data lb;
input subjid testcd $3. blfl :$1.;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
run;
data want (drop=_:);
set lb (where=(blfl='Y') in=iny)
lb (in=inall);
by subjid testcd;
if first.testcd then _ny=0;
_ny+iny;
if inall and _ny>1;
run;
I'm sorry but I don't understand this:
I would like to retrieve all the testcd records that have multiple blfl records per subject like below.
Do you mean "I would like to retrieve all the testcd records that have multiple blfl records per subject like below." In a SAS data set, blfl exists on every record. Or do you mean "I would like to retrieve all the testcd records that have multiple blfl values per subject like below." Or do you mean something else?
So why is 4 ALB not included in the output?
@RAVI2000 wrote:
I want to out if there are multiple "Y" values per testcd.
Important information like this MUST always be part of your initial post.
Keep in mind that a missing value is still a value.
@RAVI2000 wrote:
Because it just have one value for "BLFL". I want to out if there are multiple "Y" values per testcd.
Although @Kurt_Bremser has already explained, please be aware that this was my source of confusion too ... that a missing value is still a different value than "Y" and so if a group has "Y" and missing that is still multiple values.
It does not look like you want the groups with multiple observations.
having count(*) > 1
that would keep ALL of the observations.
Nor those with multiple distinct values.
having count(distinct BLFL) > 1
that would not keep the second group you kept.
Looks like you want to keep the groups that have multiple non-missing values. So you could test SUM(not missing(blfl)).
Or perhaps
proc sql;
select *
from lb
group by subjid, testcd
having count(blfl) > 1
;
quit;
If unlike your posted data step the BLFL variable is character then you might use:
having SUM('Y'=blfl) > 1
especially if you somehow managed to get some of the observations set to '.' instead of ' ' (which is what SAS would consider missing for a character variable).
But if BLFL is a numeric variable, like you posted, that is either regular missing or special missing .Y then I would go with the SUM( BLFL = .Y ) as the test instead since both . and .Y as considered missing by the MISSING() function.
missing y ;
data lb;
input subjid testcd $ blfl;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
;
proc sql ;
select * from lb group by 1,2 having sum(blfl=.y) > 1 ;
quit;
@RAVI2000 wrote:
Hi Tom,
Thanks for the insight. Does this also include missing . values as well?
What I want here is, for instance if you consider the testcd ALB for subject 1, it has multiple "Y" values for that testcd. I want to output all the records which also include the missing values of the "BLFL" as well.
Where ever we have multiple records of "Y" per testcd, I want to output all the records for that subject.
The test is being done on the aggregate value for the group, so all observations in the group will have the same value, so yes it will select the values with missing values.
But as I noted '.' is NOT a missing value for a character variable. In a character variable SAS considers a value that only contains spaces as missing. You can use a single period to indicate a missing character value in a text file because the normal character informat, $ , will convert a single period into spaces.
The "BLFL" is a character variable. Sorry I have mistakenly give "." instead of blank spaces. With your code I was able to extract the records that have multiple BLFL = "Y". But the records that have missing/ blank values are not being extracted.
My output looks something like below after using your code.
subjid testcd blfl
1 ALB Y
1 ALB Y
2 ALP Y
2 ALP Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
For example, I also want to include the first three obs of the blank records of the BLFL along with the BLFL = "Y" for a particular testcd.
Again, I want to extract the data where there are multiple records per testcd. The 1st testcd "ALB" with subjid "1" is considered since there is 2(multiple) "Y" flags for "BLFL". The "ALB" for subjid "4" is not considered because only one "BLFL" is present.
Note: whenever there are multiple blfl, I want all the records for that testcd including the missing values of BLFL.
Works with the data you posted.
Let's add two variables to your date. One is a binary flag EXPECT to indicate which observations should be "kept". The other is a record counter so we can force SQL to output the observations in the same order as they came in.
data lb;
input subjid testcd $3. blfl :$1. expect ;
row+1;
cards;
1 ALB . 1
1 ALB . 1
1 ALB . 1
1 ALB Y 1
1 ALB Y 1
1 ALP . 0
1 ALP . 0
1 ALP Y 0
2 ALB . 0
2 ALB . 0
2 ALB . 0
2 ALB Y 0
2 ALP Y 1
2 ALP Y 1
3 ALB Y 0
3 ALP Y 1
3 ALP Y 1
3 ALP Y 1
4 ALB . 0
4 ALB . 0
4 ALB . 0
4 ALB Y 0
4 ALP . 1
4 ALP . 1
4 ALP . 1
4 ALP Y 1
4 ALP Y 1
4 ALP Y 1
4 ALT Y 1
4 ALT Y 1
;
proc sql;
create table test1 as select * from lb group by 1,2 having count(blfl) > 1 order by row;
create table test2 as select * from lb group by 1,2 having sum('Y'=blfl) > 1 order by row;
quit;
proc compare data=test1 compare=test2;
id row;
run;
proc compare data=test1 compare=lb(where=(expect));
id row ;
run;
For each subjid/testcd
This is facilitated by having two objects of the SET command: the first filtering only for bifl="Y', the second with no filtering.
data lb;
input subjid testcd $3. blfl :$1.;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
run;
data want (drop=_:);
set lb (where=(blfl='Y') in=iny)
lb (in=inall);
by subjid testcd;
if first.testcd then _ny=0;
_ny+iny;
if inall and _ny>1;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.