BookmarkSubscribeRSS Feed
hellorc
Obsidian | Level 7

Hello SAS community, I have a dataset which looks like:

data have;
input ID state $ city $ salary @@;
datalines;
1 A A 100
1 A B 100
1 A C 101
1 B D 102
2 B E 99
2 B F 99
2 B G 99
3 A C 88
4 C H 120
4 D J 110
;
run;

For each id, I would like to compute all unique values of a variable, how many times a variable change from all rows; if possible, I would also need to compute how much is the change if the variable is numeric.

 

Let me elaborate using id=1 as an example, from the 4 observations for id=1, the unique salary values are 100, 101, and 102, so salary changed 2 times from all rows for id=1, and the changes are 1 and 2.

 

Using id=4 and city as another example, the output for would be H and J, and city changed 1 time.

 

I am still new to SAS and I really am having difficulty thinking of a logic to work out those outputs for each id. I tried using first.id and last.id to check but that wouldn't include the 'middle' observations. Can this be done via data step, or is SQL required? Might someone be willing to provide some assistance?

 

Thank you

 

 

 

 

 

 

4 REPLIES 4
ballardw
Super User

You should include an example of how you expect the final output to appear.

 

There may be some other considerations on how that output would be generated/reported.

For example your discussion of City did not include State at all. If the State changes would you expect there to not be a change in City? Should that actually be considered a change?

What if the value changes back such as the following. Does the city actually change 5 times? That would depend on the ORDER of the data and it might be that for some reason the order has gotten changed from a different one where all the city H values were in sequence.

4 C H 120
4 D J 110
4 C H 99
4 D J 100
4 C H 88
4 D J 90
hellorc
Obsidian | Level 7
Hello Ballardw, thank you so much for your response! Apparently I did not think deep enough, I apologize. After confirming with my teammate, the dataset is in the format that when there are multiple rows for the same subject, at least the city would be different. The state can repeat or change back and forth, but city is ALWAYS unique (So for the same id, there will not be rows with both same state and city). So if there are 4 rows for a subject, the 4 rows would have 4 different cities.

Using the example in my question and consider "state" as the variable of interest, I am hoping in general the output would be something like:
id: 1
unique state: A, B; changed 1 time
unique cities: A, B, C, D; changed 3 times
unique salary: 100, 101, 102; changed 2 times
...
similarly for other id's. The example is not perfect as I realized. Consider the state in your example with id=4 and 6 rows, the output would be:
id:4
unique state: C,D; changed 5 times.
salary: 120, 110, 99, 100, 88, 90; changed 5 times. (hopefully the difference for each change can also be computed).

There is also a date variable which is how the data is sorted by originally, we will stick with this order.

I am really stuck on how to generalize the logic since the dataset contains about 200 subjects. Any advice is welcomed, thank you in advance!
Quentin
Super User

If  you're familiar with SQL, I would think of using COUNT of SELECT DISTINCT as a starting point.  Your concept 'changed X times' is just the number of distinct values - 1.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
ballardw
Super User

@Quentin wrote:

If  you're familiar with SQL, I would think of using COUNT of SELECT DISTINCT as a starting point.  Your concept 'changed X times' is just the number of distinct values - 1.


The difference values of the Salary variable will still require a data step to process the data in order.

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 717 views
  • 0 likes
  • 3 in conversation