BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JH74
Obsidian | Level 7

Hello.  I'm having trouble with writing code to get a subset of data.  

 

Here's what I have:

Data looks like this (I have thousands of records, but an example).

RECORDDate_Start
1234566/1/2023
1234566/1/2023
1234566/1/2023

I need to find unique values for the record and the date_start.  So, for this example, I would want only 1 unique record in a subset to have the record and the date_start, like this:

RECORDDate_Start
1234566/1/2023

 

I also have this:

RECORDDate_Start
12345676/1/2023
12345676/2/2023
12345676/3/2023

For this, I'd expect to have 3 unique records in the new subset, since all are unique with respects to the record and the date_start.  

 

I'm relatively new to SAS, so it's a bit of a challenge for me.  I've tried sorting the set by record and date_start.  Then creating a new variable that's retained, using first. etc, but I'm getting every record of the main data set exported to my subset data set.  

 

Any ideas are appreciate.  

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

SQL is easy enough.

 

data have;
    input RECORD $ Date_Start mmddyy10.;
    format date_start mmddyys10.;
    cards;
1234560 6/1/2023
1234561 6/1/2023
1234561 6/1/2023
1234567 6/1/2023
1234567 6/2/2023
1234568 6/1/2023
1234568 6/2/2023
1234568 6/3/2023
1234569 6/1/2023
1234569 6/2/2023
1234569 6/2/2023
1234569 6/3/2023
1234569 6/3/2023
1234569 6/3/2023
;

proc sql;
    create table tableA as 
    select * 
    from have 
    where record in (select record 
                    from have 
                    group by record 
                    having max(Date_Start)=min(Date_Start)
                    );
                    
                    
    create table tableA as 
    select * 
    from have 
    where record not in (select record 
                    from have 
                    group by record 
                    having max(Date_Start)=min(Date_Start)
                    );
quit;

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26
proc sort data=have;
    by record date_start;
run;

data want;
    set have;
    by record date_start;
    if first.date_start;
run;
--
Paige Miller
JH74
Obsidian | Level 7

Thanks for this.  I have done this and I think I have not made my initial question clear enough.  I have to get two separate subsets.  One where the subset consists of instances like this:

RECORDDate_Start
1234566/1/2023
1234566/1/2023
1234566/1/2023

So, this scenario would go to a new dataset called A.

 

Then instances where it is like this (so one record and multiple start dates for that record):

RECORDDate_Start
12345676/1/2023
12345676/2/2023
12345676/3/2023

This would go to a new dataset called B.  

 

So, I need to new sets, one that contains instances where one record and consistent dates.  Another that contains instances where one record and various dates.  I apologize for not being more detailed.  

 

Reeza
Super User
What would the input dataset look like?
JH74
Obsidian | Level 7

The input dataset has 3 total variables.  Only these 2 are relevant though, record and date_start.

PaigeMiller
Diamond | Level 26

@JH74 wrote:

I have done this and I think I have not made my initial question clear enough.  I have to get two separate subsets.  One where the subset consists of instances like this:

RECORD Date_Start
123456 6/1/2023
123456 6/1/2023
123456 6/1/2023

So, this scenario would go to a new dataset called A.

 

Then instances where it is like this (so one record and multiple start dates for that record):

RECORD Date_Start
1234567 6/1/2023
1234567 6/2/2023
1234567 6/3/2023

This would go to a new dataset called B.  

 

So, I need to new sets, one that contains instances where one record and consistent dates.  Another that contains instances where one record and various dates.  I apologize for not being more detailed.  

 


And its still not clear to me. You are phrasing the rule in terms of consistent dates, and another case where there are various dates. What happens if a RECORD has both, say three records for RECORD 45678 on 6/1/2023 and then one record for 45678 on 6/2/2023 and two records for 45678 on 6/3/2023.

 

Please give an explanation that covers all possibilities, or if a situation is impossible and never happens, state those cases as well.

--
Paige Miller
JH74
Obsidian | Level 7

Thanks for your question.  The data can have any number of combinations.  The record can be one line or 50 lines.  However, the record value is always consistent whether 1 line or 50.  Each record can have either the same date_start value, or multiple different date_start values.  

RECORDDate_Start
12345606/1/2023
12345616/1/2023
12345616/1/2023
12345676/1/2023
12345676/2/2023
12345686/1/2023
12345686/2/2023
12345686/3/2023
12345696/1/2023
12345696/2/2023
12345696/2/2023
12345696/3/2023
12345696/3/2023
12345696/3/2023

 

So, for records where the date_start is the same, those need to go to data set A.  For records where the date_start varies within the same record, those records need to go to data set B.  So I will have 2 data sets.  One contains records where the date_start never changes for any unique record.  The other contains records where the date_start varies for the unique record.  

 

Example of what is needed:

Data Set A 
RECORDDate_Start
12345606/1/2023
12345616/1/2023
12345616/1/2023

 

Data Set B 
RECORDDate_Start
12345676/1/2023
12345676/2/2023
12345686/1/2023
12345686/2/2023
12345686/3/2023
12345696/1/2023
12345696/2/2023
12345696/2/2023
12345696/3/2023
12345696/3/2023
12345696/3/2023
Reeza
Super User

SQL is easy enough.

 

data have;
    input RECORD $ Date_Start mmddyy10.;
    format date_start mmddyys10.;
    cards;
1234560 6/1/2023
1234561 6/1/2023
1234561 6/1/2023
1234567 6/1/2023
1234567 6/2/2023
1234568 6/1/2023
1234568 6/2/2023
1234568 6/3/2023
1234569 6/1/2023
1234569 6/2/2023
1234569 6/2/2023
1234569 6/3/2023
1234569 6/3/2023
1234569 6/3/2023
;

proc sql;
    create table tableA as 
    select * 
    from have 
    where record in (select record 
                    from have 
                    group by record 
                    having max(Date_Start)=min(Date_Start)
                    );
                    
                    
    create table tableA as 
    select * 
    from have 
    where record not in (select record 
                    from have 
                    group by record 
                    having max(Date_Start)=min(Date_Start)
                    );
quit;
JH74
Obsidian | Level 7

Thanks for your reply.  I have a question about the cards part.  I'm not familiar with that.  But, I will not know all of the values in the record or date_start field.  There are thousands of records in the dataset.  

 

Does that impact the cards code where you listed out the values?  

 

Reeza
Super User
That part is just to make fake data to test. Point it to your actual data set instead.
Reeza
Super User
Do you have other variables? If not SQL is an option.

If you do, stick with a data step as demonstrated by PaigeMiller.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 670 views
  • 1 like
  • 3 in conversation