BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
RAVI2000
Lapis Lazuli | Level 10

Hello everyone,
I have the below sample data. Have more records in original dataset. Just providing few of them.

data lb;
input subjid testcd blfl;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
;
run;

WANT:

I would like to retrieve all the testcd records that have multiple blfl records per subject like below.

RAVI2000_0-1666133054378.png

For example, in the selected highlighted records, all the records from the "ALB" testcd are inlcuded where there are multiple records for "BLFL".

If there is single records per subjid per parameter per blfl, it should NOT be considered.

Likewise, if there is single record per blfl, it should NOT be considered.

Please help me!

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

For each subjid/testcd

  1. Read and count all the obs with bifl='Y'.
  2. Read all obs, keeping only those with count from #1 greater than 1.

 

This is facilitated by having two objects of the SET command: the first filtering only for bifl="Y', the second with no filtering.

data lb;
input subjid testcd $3. blfl :$1.;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
run;

data want (drop=_:);
  set lb (where=(blfl='Y') in=iny)
      lb (in=inall);
  by subjid testcd;

  if first.testcd then _ny=0;
  _ny+iny;

  if inall and _ny>1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

13 REPLIES 13
PaigeMiller
Diamond | Level 26

I'm sorry but I don't understand this:

 

I would like to retrieve all the testcd records that have multiple blfl records per subject like below.

 

Do you mean "I would like to retrieve all the testcd records that have multiple blfl records per subject like below." In a SAS data set, blfl exists on every record. Or do you mean "I would like to retrieve all the testcd records that have multiple blfl values per subject like below." Or do you mean something else?

--
Paige Miller
RAVI2000
Lapis Lazuli | Level 10
Sorry for the misinterpretation. Yes, the second one please.
"I would like to retrieve all the testcd records that have multiple blfl values per subject like below."
PaigeMiller
Diamond | Level 26

So why is 4 ALB not included in the output?

--
Paige Miller
RAVI2000
Lapis Lazuli | Level 10
Because it just have one value for "BLFL". I want to out if there are multiple "Y" values per testcd.
Kurt_Bremser
Super User

@RAVI2000 wrote:
I want to out if there are multiple "Y" values per testcd.

Important information like this MUST always be part of your initial post.

 

Keep in mind that a missing value is still a value.

RAVI2000
Lapis Lazuli | Level 10
Yes Sir, any insight please
PaigeMiller
Diamond | Level 26

@RAVI2000 wrote:
Because it just have one value for "BLFL". I want to out if there are multiple "Y" values per testcd.

Although @Kurt_Bremser has already explained, please be aware that this was my source of confusion too ... that a missing value is still a different value than "Y" and so if a group has "Y" and missing that is still multiple values.

--
Paige Miller
Tom
Super User Tom
Super User

It does not look like you want the groups with multiple observations.

having count(*) > 1

that would keep ALL of the observations.

 

Nor those with multiple distinct values.

having count(distinct BLFL) > 1

that would not keep the second group you kept.

 

Looks like you want to keep the groups that have multiple non-missing values.   So you could test SUM(not missing(blfl)).

Or perhaps

proc sql;
select * 
  from lb
  group by subjid, testcd 
  having count(blfl) > 1
;
quit;

If unlike your posted data step the BLFL variable is character then you might use:

having SUM('Y'=blfl) > 1

especially if you somehow managed to get some of the observations set to '.' instead of ' ' (which is what SAS would consider missing for a character variable).

 

 

But if BLFL is a numeric variable, like you posted, that is either regular missing or special missing .Y then I would go with the SUM( BLFL = .Y ) as the test instead since both . and .Y as considered missing by the MISSING() function.

missing y ;
data lb;
input subjid testcd $ blfl;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
;
proc sql ;
select * from lb group by 1,2 having sum(blfl=.y) > 1 ;
quit;

 

 

 

RAVI2000
Lapis Lazuli | Level 10
Hi Tom,

Thanks for the insight. Does this also include missing . values as well?

What I want here is, for instance if you consider the testcd ALB for subject 1, it has multiple "Y" values for that testcd. I want to output all the records which also include the missing values of the "BLFL" as well.
Where ever we have multiple records of "Y" per testcd, I want to output all the records for that subject.
Tom
Super User Tom
Super User

@RAVI2000 wrote:
Hi Tom,

Thanks for the insight. Does this also include missing . values as well?

What I want here is, for instance if you consider the testcd ALB for subject 1, it has multiple "Y" values for that testcd. I want to output all the records which also include the missing values of the "BLFL" as well.
Where ever we have multiple records of "Y" per testcd, I want to output all the records for that subject.

The test is being done on the aggregate value for the group, so all observations in the group will have the same value, so yes it will select the values with missing values.

 

But as I noted '.' is NOT a missing value for a character variable.  In a character variable SAS considers a value that only contains spaces as missing.  You can use a single period to indicate a missing character value in a text file because the normal character informat,  $ , will convert a single period into spaces.

RAVI2000
Lapis Lazuli | Level 10

The "BLFL" is a character variable. Sorry I have mistakenly give "." instead of blank spaces. With your code I was able to extract the records that have multiple BLFL = "Y". But the records that have missing/ blank values are not being extracted.

My output looks something like below after using your code.
subjid testcd blfl
1 ALB Y
1 ALB Y
2 ALP Y
2 ALP Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y

For example, I also want to include the first three obs of the blank records of the BLFL along with the BLFL = "Y" for a particular testcd.
Again, I want to extract the data where there are multiple records per testcd. The 1st testcd "ALB" with subjid "1" is considered since there is 2(multiple) "Y" flags for "BLFL". The "ALB" for subjid "4" is not considered because only one "BLFL" is present.

Note: whenever there are multiple blfl, I want all the records for that testcd including the missing values of BLFL.

Tom
Super User Tom
Super User

Works with the data you posted.

Let's add two variables to your date.  One is a binary flag EXPECT to indicate which observations should be "kept". The other is a record counter so we can force SQL to output the observations in the same order as they came in.

data lb;
  input subjid testcd $3. blfl :$1. expect ;
  row+1;
cards;
1 ALB . 1
1 ALB . 1
1 ALB . 1
1 ALB Y 1
1 ALB Y 1
1 ALP . 0
1 ALP . 0
1 ALP Y 0
2 ALB . 0
2 ALB . 0
2 ALB . 0
2 ALB Y 0
2 ALP Y 1
2 ALP Y 1
3 ALB Y 0
3 ALP Y 1
3 ALP Y 1
3 ALP Y 1
4 ALB . 0
4 ALB . 0
4 ALB . 0
4 ALB Y 0
4 ALP . 1
4 ALP . 1
4 ALP . 1
4 ALP Y 1
4 ALP Y 1
4 ALP Y 1
4 ALT Y 1
4 ALT Y 1
;

proc sql;
create table test1 as select * from lb group by 1,2 having count(blfl) > 1 order by row;
create table test2 as select * from lb group by 1,2 having sum('Y'=blfl) > 1 order by row;
quit;
proc compare data=test1 compare=test2;
 id row;
run;
proc compare data=test1 compare=lb(where=(expect));
 id row ;
run;

 

mkeintz
PROC Star

For each subjid/testcd

  1. Read and count all the obs with bifl='Y'.
  2. Read all obs, keeping only those with count from #1 greater than 1.

 

This is facilitated by having two objects of the SET command: the first filtering only for bifl="Y', the second with no filtering.

data lb;
input subjid testcd $3. blfl :$1.;
cards;
1 ALB .
1 ALB .
1 ALB .
1 ALB Y
1 ALB Y
1 ALP .
1 ALP .
1 ALP Y
2 ALB .
2 ALB .
2 ALB .
2 ALB Y
2 ALP Y
2 ALP Y
3 ALB Y
3 ALP Y
3 ALP Y
3 ALP Y
4 ALB .
4 ALB .
4 ALB .
4 ALB Y
4 ALP .
4 ALP .
4 ALP .
4 ALP Y
4 ALP Y
4 ALP Y
4 ALT Y
4 ALT Y
run;

data want (drop=_:);
  set lb (where=(blfl='Y') in=iny)
      lb (in=inall);
  by subjid testcd;

  if first.testcd then _ny=0;
  _ny+iny;

  if inall and _ny>1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 940 views
  • 4 likes
  • 5 in conversation