Hi. I have a data set that contains some variables that I need to identify if it's the first and only occurrence, or first of multiple, and create new fields in a new data set that can be reviewed further. The data sets have thousands of lines.
Here's an example of what I am starting with:
id | line | exclude_ind | reimb_amt |
12345 | 01 | 00 | 20 |
12345 | 02 | 00 | 0 |
54321 | 01 | 01 | 10 |
54321 | 02 | 01 | 10 |
54321 | 03 | 00 | 5 |
34567 | 01 | 05 | 0 |
34567 | 02 | 05 | 15 |
34567 | 03 | 09 | 20 |
34567 | 04 | 03 | 10 |
From the above data set, I want a field named id_count, which identifies unique id's. Each unique id gets a value = 01.
I want to create a field called exclude_ind_count, which is made of a value = 01 for each unique value in the exclude_ind field, within the unique id field.
So, the resulting new data set looks like this:
id | id_count | line | exclude_ind | exclude_ind_count | reimb_amt |
12345 | 01 | 01 | 00 | 01 | 20 |
12345 | 00 | 02 | 00 | 00 | 0 |
54321 | 01 | 01 | 01 | 01 | 10 |
54321 | 00 | 02 | 01 | 00 | 10 |
54321 | 00 | 03 | 00 | 01 | 5 |
34567 | 01 | 01 | 05 | 01 | 0 |
34567 | 00 | 02 | 05 | 00 | 15 |
34567 | 00 | 03 | 09 | 01 | 20 |
34567 | 00 | 04 | 03 | 01 | 10 |
So, each id_count field should, when summed, = 1 for each unique id. Basically I want a count of how many unique id's are in my data set.
And, as shown, within each unique id, each unique instance of exclude_ind has a resulting exclude_ind_count = 01. So, when summed, I can get an idea of how many id's have either 1 exclude_ind, or multiple exclude_ind.
I think this is what you want, or close. You may need to sort by id exclude_ind.
data have;
infile cards dlm='09'x dsd;
input id line exclude_ind reimb_amt;
cards;
12345 01 00 20
12345 02 00 0
54321 01 01 10
54321 02 01 10
54321 03 00 5
34567 01 05 0
34567 02 05 15
34567 03 09 20
34567 04 03 10
;;;;
run;
data want;
set have;
by id exclude_ind notsorted;
id_count = first.id;
exclude_ind_count = first.exclude_ind;
run;
Is that id_count supposed to be numeric or character? Not at all clear as showing two characters makes it a bit odd.
Is your data sorted by ID?
If your data is at least grouped by ID (all the ID values in adjacent rows) AND the Exclude_ind are also grouped together then creating NUMERIC values for this is easy. If not grouped then you may need to provide an example of that data and what you expect, which might require sorting the data.
data have; input id line exclude_ind reimb_amt; datalines; 12345 01 00 20 12345 02 00 0 54321 01 01 10 54321 02 01 10 54321 03 00 5 34567 01 05 0 34567 02 05 15 34567 03 09 20 34567 04 03 10 ; data want; set have; by id exclude_ind notsorted; id_count=first.id; exclude_ind_count=first.exclude_ind; run;
If you must see two digits for those variables then assign a Z2. format. Of you want a character version then use
put(first.var,z2.)
When you use BY group processing in a data step SAS provides automatic numeric 1/0 valued (1=true 0=false) variables FIRST. and LAST. for each variable on the By statement indicating whether the current observation is the first or last of that group.
I think this is what you want, or close. You may need to sort by id exclude_ind.
data have;
infile cards dlm='09'x dsd;
input id line exclude_ind reimb_amt;
cards;
12345 01 00 20
12345 02 00 0
54321 01 01 10
54321 02 01 10
54321 03 00 5
34567 01 05 0
34567 02 05 15
34567 03 09 20
34567 04 03 10
;;;;
run;
data want;
set have;
by id exclude_ind notsorted;
id_count = first.id;
exclude_ind_count = first.exclude_ind;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.