Select first records from the dataset

cho16 · Posted 08-09-2022 01:02 PM

Hi,

I need help with this.

I have a dataset with the following attributes: id, typ (XN, IN), and sysdat.
If an ID contains both XN and IN, only XN records should be retrieved.
If the ID contains XN records, only XN records must be retrieved.
If the ID contains IN records, only IN records must be retrieved.

Input:

ID   typ  sysdat
123  IN   2210128
123  XN   2410128
124  XN   2210128
125  IN   2210128

Output:

ID   typ  sysdat
123  IN   2210128
124  XN   2210128
125  IN   2210128

Thanks in advance.

Tom · Posted 08-09-2022 01:18 PM

Your rules don't make sense. The first IF is a subset of the last IF. Perhaps you meant to include ONLY in the last one like you did for the middle one?

But also your expected results do not match the rules.

So 123 has both so only the XN observation should be kept, but your output is showing that one of the two IN observations was kept instead.

And 125 has only IN observations, but you only selected one of them. How did you decide which one to keep?

cho16 · Posted 08-09-2022 01:23 PM

Sorry, that was a typo.

If an ID contains both XN and IN, only IN records should be retrieved.

cho16 · Posted 08-09-2022 01:24 PM

Sorry, that was a typo. If an ID contains both XN and IN, only XN records should be retrieved.

Kurt_Bremser · Posted 08-10-2022 01:47 AM

@cho16 wrote:
Sorry typo mistake...If an ID contains both XN and IN, only XN records should be retrieved.

Then your output for 123 is wrong.

Please review your requirements and expected output so they make sense. In particular, define a selection rule when multiple observations meet the condition.

PeterClemmensen · Posted 08-09-2022 03:01 PM

I'm as confused about the rules here as @Tom, so this is something of a shot in the dark.

data have;
input ID typ $ sysdat;
datalines;
123 IN 2210128 
123 IN 2210128 
123 XN 2410128 
124 XN 2210128 
125 IN 2210128 
125 IN 2210128 
;

proc sql;
   create table want as
   select distinct * from have 
   group by ID
   having typ = min(typ)
   ;
quit;

Result:

ID   typ  sysdat
123  IN   2210128  
124  XN   2210128 
125  IN   2210128
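
This works because 'IN' sorts before 'XN' as a character value, so min(typ) is 'IN' for any ID that has at least one IN record. If the corrected rule posted above (keep XN when an ID has both) is the intended one, a sketch of the same query with max(typ) might look like the following. The table name want_xn is only illustrative, and this still returns every distinct XN row for an ID rather than a single one.

```sas
proc sql;
   create table want_xn as
   select distinct * from have
   group by ID
   having typ = max(typ)   /* 'XN' > 'IN', so XN wins when both exist */
   ;
quit;
```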

cho16 · Posted 08-09-2022 03:01 PM

Hi,

I need help with this.

I have a dataset with the following attributes: id, typ (XN, IN), and sysdat.
If an ID contains both XN and IN, only IN records should be retrieved.
If the ID contains XN records, only XN records must be retrieved.
If the ID contains IN records, only IN records must be retrieved.

Input:

ID   typ  sysdat
123  IN   2210128
123  XN   2410128
124  XN   2210128
125  IN   2210128

Output:

ID   typ  sysdat
123  IN   2210128
124  XN   2210128
125  IN   2210128

Thanks in advance.

PeterClemmensen · Posted 08-09-2022 03:03 PM

I'm as confused about the rules here as @Tom in the other thread.

However, try this.

data have;
input ID typ $ sysdat;
datalines;
123 IN 2210128 
123 IN 2210128 
123 XN 2410128 
124 XN 2210128 
125 IN 2210128 
125 IN 2210128 
;

proc sql;
   create table want as
   select distinct * from have 
   group by ID
   having typ = min(typ)
   ;
quit;

Result:

ID   typ  sysdat
123  IN   2210128  
124  XN   2210128 
125  IN   2210128

mkeintz · Posted 08-10-2022 09:39 AM

For each ID, interleave all the observations such that any "IN" records will precede any "XN" records. Then just keep the first individual record for each ID:

data have;
input ID typ $ sysdat;
datalines;
123 IN 2210128 
123 IN 2210128 
123 XN 2410128 
124 XN 2210128 
125 IN 2210128 
125 IN 2210128 
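run;

data want;
  /* Interleave by ID: the IN subset is listed first, so within each ID
     any IN records precede any XN records */
  set have (where=(typ='IN'))
      have (where=(typ='XN'));
  by id;
  if first.id;  /* keep only the first record of each ID */
run;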

This assumes dataset HAVE is sorted by ID. But it doesn't matter what the order is within each ID.
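
If HAVE is not already in ID order, a minimal sketch of the preparatory step is a PROC SORT by ID before the interleave. The output name have_sorted is an assumption made here for illustration; sorting HAVE in place would work equally well.

```sas
/* Sort by ID only; the order within each ID does not matter */
proc sort data=have out=have_sorted;
  by id;
run;

data want;
  set have_sorted (where=(typ='IN'))
      have_sorted (where=(typ='XN'));
  by id;
  if first.id;
run;
```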

andreas_lds · Posted 08-10-2022 12:50 AM

I have merged your posts. Please don't double-post questions.

Ksharp · Posted 08-10-2022 08:26 AM

data have;
input ID typ $ sysdat;
datalines;
123 IN 2210128 
123 IN 2210128 
123 XN 2410128 
124 XN 2210128 
125 IN 2210128 
125 IN 2210128 
;

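proc sql;
   create table want as
   select distinct *,
          /* flag = 1 for IN rows when the ID has both types,
             and for every row when the ID has a single type */
          case when count(distinct typ)>1 and typ='IN' then 1
               when count(distinct typ)=1 then 1
               else 0
          end as flag
   from have
   group by id
   having calculated flag=1
   ;
quit;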
