BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emma2021
Quartz | Level 8

Not necessary earliest because the "flag" variable is important as well. 

ballardw
Super User

@Emma2021 wrote:

Not necessary earliest because the "flag" variable is important as well. 


I have to interpret your statement " If not missing then get the row that associated with smallest date's row " as referring to the Flag variable. You do not provide any example/description or use for what to do with a missing flag variable value.

Since none of the Subjid=1 with missing Flag values were included why is the Subjid=2 with a missing flag value included in the output. 

 

You really have not very clearly described the exact roll the value of the Flag variable plays. Providing example output without clear rules is an incomplete description. I can provide programming that would create the exact example that you show for the given input but it very likely would not work in a general sense because the the actual process or rules have not been provided that indicate why the record with

2 6jul2021 .

is wanted in the output but the records like

1 1jul2021 .

are not wanted.

 

 

Emma2021
Quartz | Level 8
Combination of date and flag should be used:
1. Check 5 days lag
2. If lag is within 5 days then take the first row:
But here can be flag missing or not missing,
If all missing, then just take the first date
If some not missing, then take the not missing first flag
If all not missing flag, then take the first date



When as below data:



data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;





Wanted should be as below:

1 4jul2021 1
1 10jul2021 2
2 1jul2021 1
2 6jul2021 .







But when as below data:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;





Then it should be as below:

1 1jul2021 .
1 4jul2022 1
2 1jul2021 1
2 6jul2021 .
Kurt_Bremser
Super User
  1. Provide data in a working data step, as already shown
  2. Use the tool that sits between your ears; try to adapt the already provided solutions to your scenario, and show us that code. Otherwise we might get a feeling that you are only lazy and looking for people to do your work for free.
Emma2021
Quartz | Level 8

Thank you, but it is wrong because the 5 is date difference (LAG) should be used not _n_. 

yabwon
Onyx | Level 15

Why is it wrong?

B.

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Emma2021
Quartz | Level 8

Combination of date and flag should be used:
1. Check 5 days lag
2. If lag is within 5 days then take the first row:
But here can be flag missing or not missing,
If all missing, then just take the first date
If some not missing, then take the not missing first flag
If all not missing flag, then take the first date

 

When as below data:

 

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;

 

 

Wanted should be as below:

1 4jul2021 1

1 10jul2021 2

2 1jul2021 1
2 6jul2021 .

 

 

 

But when as below data:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;

 

 

Then it should be as below:

1 1jul2021 .

1 4jul2022 1

2 1jul2021 1
2 6jul2021 .

ballardw
Super User

Still not explaining the rules clearly.

 


@Emma2021 wrote:

When as below data:

 

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 1
2 6jul2021 .
;
run;

 

 

Wanted should be as below:

1 4jul2021 1

1 10jul2021 2

2 1jul2021 1
2 6jul2021 .

 

 

 

But when as below data:

data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
1 18jul2022 1
2 1jul2021 1
2 6jul2021 .
;
run;

 

 

Then it should be as below:

1 1jul2021 .

1 4jul2022 1

2 1jul2021 1
2 6jul2021 .


 

Emma2021
Quartz | Level 8

Combination of date and flag should be used:
1. Check 5 days lag
2. If lag is within 5 days then take the first row:
But here can be flag missing or not missing,
If all missing, then just take the first date
If some not missing, then take the not missing first flag
If all not missing flag, then take the first date…

Emma2021
Quartz | Level 8
Your code is working great, only it does not work when I have all flag is missing such as below:
2 1jul2021 .
2 6jul2021 .

Then it should keep just using the 5 days lag and keep both of those rows, but your code is removing both of them, otherwise it works great. Can you fix your code, please? Thank you
Patrick
Opal | Level 21

@Emma2021 

I must be missing something because the logic you describe and the have/want samples you give us don't match - at least not for how I read things. 

Below result using the logic as I understand it. Is this what you're after? If not can you please explain in detail what would need to be different - and please in other words and not just re-posting what you've provided already.

Btw: A "flag" has normally only two values - True/False, Y/N, 1/0.... Your variable "flag" is clearly not a flag.

data have;
  input id date date9. flag;
  format date date9.;
  cards;
1 1jul2021 .
1 1jul2021 .
1 4jul2021 1
1 10jul2021 4
1 16jul2021 2
1 18jul2021 1
2 1jul2021 .
2 6jul2021 .
;

/* data inter:
   - id rows with non missing flag OR
   - all rows for id's where flag is always missing
*/
data inter;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(not missing(flag)))');
      h1.defineKey('id');
      h1.defineDone();
    end;
  set have;
  if not missing(flag) or h1.check() ne 0;
run;

data want;
  set inter;
  by id date;
  dt_dif=dif(date);
  if first.id or dt_dif>=5 then output;
  drop dt_dif;
run;

proc print data=want;;
run;

Patrick_0-1641349301212.png

 

Emma2021
Quartz | Level 8
Thank you so much.
The flag variable is an positive integer (1, 2 etc. that smallest can be 1).
Emma2021
Quartz | Level 8
Your code is working but in below example it does not working:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .
;


Then the wanted data should be below:
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .
Patrick
Opal | Level 21

@Emma2021 wrote:
Your code is working but in below example it does not working:
data have;
input id date date9. flag;
format date date9.;
cards;
1 1jul2021 .
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .
;


Then the wanted data should be below:
1 1jul2021 .
1 18jul2021 1
1 4jul2022 1
1 10jul2022 4
1 16jul2022 2
2 1jul2021 .
2 6jul2021 .

 


?? - Then please re-formulate the logic very clearly.

a: You want the date calculation based on the original sort order of your data and not sorted by date within an ID? or something else.

b: Why do you select 1Jul2021 given your logic states "If some not missing, then take the not missing first flag". Does this logic only apply for the same date within an id? That needs more clarification.

 

Please also try and spend the time to create sample data that covers all the possible cases. It's not very helpful to get these "your code is working but it's not working" and then you post some modified sample data.

 

Patrick_0-1641355325287.png

 

Emma2021
Quartz | Level 8
It was not working only for below example:
Have Data:
1 1jul2021 .
1 1jul2021 .
1 18jul2021 1


Wanted Data:
1 1jul2021 .
1 18jul2021 1

In other words, id=1 has many different data, (1) within 5 days all flag is missing, then it should take the first date; (2) within 5 days some may missing and some not, then take the first date without missing; (3) within 5 days, all have flag values, then just take the first date.

Your code is working for (2) and (3) but 1–see above example it does not working. Thank you

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 33 replies
  • 1786 views
  • 8 likes
  • 8 in conversation