BookmarkSubscribeRSS Feed
deleted_user
Not applicable
This issue is tricky so here goes:

I am working with complaint data. Each complaint is called a "case." Each case has a unique case number that never changes. Associated with the case number is a case type id that DOES change when the level of the complaint changes. Complaint levels are assigned based on the severity of the complaint and can go up or down. In other words if someone calls up with what seems like a major complaint but it is discovered while looking into the complaint that the complaint is really based on something very minor, the level of complaint will be ramped down. If, however, someone calls up with what seems like a minor complaint but it is discovered that there is really a major issue involved, the complaint type will be ramped up. Make sense? So, when cases change type, they are given a new "type ID." This type ID is usually a four-digit number. To really muddy the water, within each case, there can be multiple complaints. While all of these live under the same case number, they will each be assigned a different code identifying the general nature of the complaint. I am trying to calculate the total number of COMPLAINTS, not cases. The problem is that in my dataset, all of the type IDs come in. So I have type IDs that include those first assigned to a case as well as those last assigned to a case. I need to select just the highest number type IDs sitting under each case number. If this were SQL, I would use a MAX function (just return the MAX type ID for each case). Not sure how to do this in SAS. Please remember that the type ID is a one to many relationship so there could be, for example, four of the same type IDs in one case each associated with a different complaint. I sure hope this makes sense. Thanks in advance! Message was edited by: coa125
19 REPLIES 19
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
You have SAS PROC SQL to consider using, if that approach is more comfortable, however it has limitations when comparing to SAS DATA step programming. The SAS support http://support.sas.com/ website has SAS-hosted documentation and supplemental technical and conference reference material which will be useful.

Scott Barry
SBBWorks, Inc.

SAS SQL Procedure DOC: Introduction to the SQL Procedure
http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a002536894.htm

SAS SQL Procedure DOC: Comparing PROC SQL with the SAS DATA Step
http://support.sas.com/documentation/cdl/en/sqlproc/62086/HTML/default/a001409810.htm

SAS Language Reference: Concepts - Data Step Processing
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a001281588.htm

Step-by-Step Programming with Base SAS Software - Understanding DATA Step Processing
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001304324.htm
deleted_user
Not applicable
Thanks for the links but I've been there and have not found an answer to my problem. I just need a way to select the maximum value of my type_id AND have the script bring in all associated complaint observations. IF and WHERE don't get it because the ID is randomly generated and so I can't use GT to get there.

Thanks anyway.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Consider using a DATA step approach -- after sorting your file as needed, then use the DATA step with a SET and a BY statement. Use either IF FIRST. and/or LAST. to identify what observation(s) you want to output.

Scott Barry
SBBWorks, Inc.
deleted_user
Not applicable
Thank you. I did try this but am still getting every type_ID ever associated with the case, not just the type_ID that came up first (or last) in the sort.

I really tried looking for a way to use a GT statement but the type_IDs are all over the place number-wise.

Thank you, anyway.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
You really need to share your code - preferably the SAS-generated log output and include this SAS DATA step statement:

PUTLOG _ALL_;


The data handling situation is not complicated, as explained -- so the issue here is getting your SAS programming code to handle it.

Scott Barry
SBBWorks, Inc.
Patrick
Opal | Level 21
Not sure if I understand your data - and narratives are sometimes just too hard to read.
Some example data would be helpful (i.e. you provide the first data step creating the data) and then show an example how the result should look like.


Hope the following helps:


data have;
input case_id type_id complaint_code:$1.;
datalines;
1 1 a
1 2 a
1 1 b
1 2 b
1 3 b
1 1 c
1 1 d
2 1 a
2 2 a
2 1 b
2 2 b
2 1 c
;

/* data step version */
proc dataset data=have;
by case_id complaint_code;
run;

data want(keep=Ncomplaints);
set have end=last;
by case_id complaint_code;
if last.complaint_code then
do;
Ncomplaints+1;
end;
if last then output;
run;

proc print data=want;
run;

/* SQL version */
proc sql;
create view haveV as select distinct case_id,complaint_code from have;
select count(*) as Ncomplaints from haveV;
quit;

Regards, Patrick
deleted_user
Not applicable
Thank you, Scott and Patrick.

This is the data I get. It's a subset, obviously. Note that I am getting ALL Type_IDs. I need just the highest Type_IDs associated with each case. Remember that this is a one to many relationship. My log is in a separate reply.

Case_No Type_ID Com_Type Init_Desc
8-0092 11615 75 79
8-0092 10380 74 79
8-0092 7182 75 79
8-0092 7182 75 79
8-0092 7182 75 79
8-0092 7182 75 79
8-0092 7182 75 79
8-0093 7152 74 79
8-0094 7153 75 80
8-0094 7153 75 80
8-0095 7154 74 79
8-0096 7155 74 79
8-0097 7156 74 79
8-0098 7157 74 79
8-0100 7201 75 79
8-0101 7183 75 79
8-0101 7183 75 79
8-0101 7183 75 79
8-0101 7183 75 79
8-0102 9636 76 79
8-0102 7160 75 79
8-0103 7202 74 79
8-0105 11637 76 80
8-0105 9686 75 80
8-0105 7225 76 80
8-0106 9634 74 79
8-0106 7161 76 79
8-0107 9635 74 79
8-0107 7162 76 79
8-0108 7226 76 79
8-0108 7226 76 79
deleted_user
Not applicable
8-0101 7183 75 79
8-0101 7183 75 79
8-0101 7183 75 79
8-0101 7183 75 79

Does this represent one complaint that has been serviced 4 times or 4 complaints of the same kind of thing within a single case?
deleted_user
Not applicable
This represents one case that has four complaints - four complaints of the same type.
deleted_user
Not applicable
Ok, now either things are clearing or getting really confusing leading to impossible.


8-0105 11637 76 80
8-0105 9686 75 80
8-0105 7225 76 80

Is this 1 complaint in one case where the level of the complaint was changed, or 3 separate complaints, or 2 complaints for a case where one has changed its level?

Are these the only columns to work with?

Does each record, each observation in SAS speak, represent a unique complaint within a given case?
deleted_user
Not applicable
Thank you. That's where I am, too.

In the meantime, this is one case where onecomplaint changed levels 3 times. In this particular case, I would only want the observation with type_id 11637 returned in my dataset.

In cases where there are multiple type_IDs, it is only the highest value type_ID that I am after.

You nailed the problem on the head. There is no identifier in this database that is absolutely unique, i.e., that is assigned to one observation (row) only. My workaround is to pull the data into Excel and simply delete the rows I don't want.
deleted_user
Not applicable
Your comment about a unique identifier for a single row of data is confusing or could be misleading. If each row/record/observation represents a unique thing, then the rowid is the unique identifier for that row. If on the other hand, which I think is the case, a case can consist of mulitiple complaints, an a complaint can have multiple rows, and there is no column or set of columns that uniquely identifies a complaint, then it is impossible for a computer to know how to count count complaints. How can it know when a complaint consists of one, two or three rows. Of the data that I have pulled from your set and asked questions about, you have 2 cases, with apparently 5 complaints.

So, the question is, how do you, the human, know that in one case you had 4 separate complaints, and in the other only 1 complaint that existed in 3 states?
deleted_user
Not applicable
The case number tells me it is all one case. When the type_ID is the same for several observations, this tells me how many complaints were assigned to the case. When a type_ID becomes a larger number within the same case number, it tells me that the complaint level changed.

So, for example, in case 08-0092, you see that the type_ID was 7182 for 5 rows. This means this case started with 5 complaints. Then to one and then one again. In this case, the complaint went from 5 complaints to 1 complaint while the complaint level also changed.
deleted_user
Not applicable
Is it always true that if there is a change in the type_ID that there is always a consolidation of the complaints to a single complaint?

Is it possible that if there are more than 1 original complaint, and that there is more than 1 new type_ID, that perhaps 2 or more of the original complaints changed, but that the original number of complaints stays the same?

Is it possible that one or more new complaints were added to the case, thus receiving new type_ID's?

From what you described, would you count 5 complaints for 08-0092 or only 1 complaint?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 923 views
  • 0 likes
  • 3 in conversation