flag values

Smitha9 · Posted 04-17-2023 10:11 AM

Hi,

I have a dataset of 3000 ID's. I need to flag specific 200ID's as "Not sure". Since it is huge number of ID's, I am not sure how to flag the specific 200 ID's. this should be done everyweek.

ballardw · Posted 04-17-2023 10:28 AM

What are the rules involved with setting the flags?

Where do the "specific 200" id values come from?

Are the ID values duplicated in the data?

What would the flag values look like?

Smitha9 · Posted 04-17-2023 10:33 AM

Data set A

ID

1

2

3

4

5

dataset B

ID

2

3

4

I want to flag the dataset B ID's in Dataset A

Output dataset A:

ID Flag

1

2 1

3 1

4 1

5

I have the dataset which has around 2000ID's and to be flagged ID's are around 200.

ballardw · Posted 04-17-2023 01:19 PM

One way:

Data setA;
  input ID;
datalines;
1
2
3
4
5
;
 
data setB;
  input ID;
datalines;
2
3
4
;

proc sort seta;
  by id;
run;
proc sort setb;
  by id;
run;

data want;
  merge seta setb (in=inb);
  by id;
  if inb then flag=1;
run;

First please note the DATA step as a way to provide example data.

The sets are sorted to allow use of the BY Id in the Want data step.

The data set option In= creates a temporary variable (not added to the set) that indicates whether a particular data set contributes to the current observation and is valued 1/0 for yes/no or true/false if your prefer. Since the concern is that Setb contribute the match then that is on the set that needs in. We can test the value of the value of the In variable to do things conditionally such as set this flag.

If you had wanted a 1/0 coding flag value on each observation you would change the code to

data want;
  merge seta setb (in=inb);
  by id;
  flag=inb;
run;

Note: IF the ID value is duplicated in both sets this will not work as desired and there will be a warning in the Log.

Tom · Posted 04-17-2023 01:44 PM

This is a simple MERGE operation.

So if you have two datasets like this:

data A;
  input ID ;
cards;
1
2
3
4
5
;

data B;
  input ID;
cards;
2
3
4
;

You can easily run a data step to make a NEW dataset that includes your flag variable.

data want;
  merge a(in=in1) b(in=in2);
  by id;
  if in1;
  flag=in2;
run;

Results:

Obs    ID    flag

 1      1      0
 2      2      1
 3      3      1
 4      4      1
 5      5      0

Quentin · Posted 04-17-2023 10:29 AM

Can you describe your question a bit more, and add some example data? You can make it a small example, say a datasets of 5 ID's and you want to flag 3 of them.

Do you have a dataset of the records you want to flag? If so, could you merge your "flagme" dataset onto your main dataset, to identify the records that need to be flagged?

flag values

Re: flag values

Re: flag values

Re: flag values

Re: flag values

Re: flag values

flag values

Re: flag values

Re: flag values

Re: flag values

Re: flag values

Re: flag values

Registration is open