Help using Base SAS procedures

Count duplicates in date order

Reply
N/A
Posts: 0

Count duplicates in date order

Hi

I have a dataset with admissions to hospitals in a 5 year period. I have the admissions numbered with an admission id number and an additional people id number to identify the different inidividuals in the dataset. Some individuals have more than one admission to the hospital and these cases will have different admission id numbers but the same people id number. I want to count how many admissions each person in the dataset has and order these according to dates.

Can anyone please help me.
Super Contributor
Posts: 474

Re: Count duplicates in date order

Posted in reply to deleted_user
For the counting, a simple sql query will do the job.

proc sql;
select ID_PERSON, count(*) as COUNT from ADMISSIONS group by ID_PERSON;
quit;

Not sure about what you wish to do with the date ordering.

Do you want to order the counts by the lastes admission date?
Or do you wish to simply order the ADMISSIONS dataset?

Cheers from Portugal.

Daniel Santos @ www.cgd.pt Message was edited by: Daniel Santos
Super Contributor
Super Contributor
Posts: 3,174

Re: Count duplicates in date order

Posted in reply to deleted_user
It's unclear if you already have a SAS dataset or an external file and in what format -- you can use a DATA step to read an external file, while creating a SAS numeric DATE type variable for your admin date. With that file read up, then SAS PROC SUMMARY can complete this task. Or you may want to explore using SAS PROC SQL to group the observations based on the admin ID.

There is SAS-host documentation and technical/conference papers on this type of topic, available at the SAS support http://support.sas.com/ website, either using the website SEARCH facility or a Google advanced search adding on the site:sas.com parameter to limit the search.

Scott Barry
SBBWorks, Inc.


SAS DATA Step Processing
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a001281588.htm

SAS 9.2 Language Concepts - About SAS Date, Time, and Datetime Values
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a002200738.htm
N/A
Posts: 0

Re: Count duplicates in date order

Posted in reply to deleted_user
Thank you for your responses!

I can see, that I havent been that clear about what I wanted, so I will try and explain it in another way - sorry!

In our dataset the cases are admissions. We would like to turn the dataset around to turn the individuals into the cases. Thereby we make the admissions into variables. If an individual has more than one admission, we would like to number these admissions according to admission date - and thereby make new variables, each indicating which number admission it is.

Hope that you can help me
Super Contributor
Posts: 474

Re: Count duplicates in date order

Posted in reply to deleted_user
Much more clearer now.

OK, forget my previous suggestion.

proc sort data = ADMISSIONS;
by ID_PERSON DATE;
run; /* sort */

data ADMISSIONS_C;
set ADMISSIONS;
by ID_PERSON;
if first.ID_PERSON the COUNT=0; /* reset counter */
COUNT+1; /* count admission, implicitly retained */
run;

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
Super Contributor
Super Contributor
Posts: 3,174

Re: Count duplicates in date order

Posted in reply to deleted_user
Honestly, your use of "variables" reference is still unclear, at least to me, in comparison to data rows (in SAS called observations). Best to paste in your reply some INPUT side data rows/columns and OUTPUT side (desired) rows/columns for review/comment and feedback.

For myself, I will hope to attempt to lead you with some ideas for SAS programming techniques -- others may decide to do the programming for you, which, in my opinion, defeats the opportunity (with each post) for learning the SAS language by trying.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 474

Re: Count duplicates in date order

Got the message Scott.

Not being english my natural language, it is sometimes difficult for me to explain some matters with letters, it's way more simple to just code it Smiley Happy

But I agree with you, and I'll make an effort to explain my suggestion rather than give them away.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
PROC Star
Posts: 1,759

Re: Count duplicates in date order

Fair point Scott.
On the other hand, I would argue that giving working code still allows the reader to learn from it (if the reader makes the effort obviously).
I would even argue that one can learn more from well-written code (even though definitions for this will vary) than from fiddling with badly understood language features.
Lastly, giving working code also proves the proposed solution actually works, rather than giving ideas that may or may not answer the question.
To each their own, as usual... Smiley Happy
N/A
Posts: 0

Re: Count duplicates in date order

Posted in reply to deleted_user
Thank you Daniel!

That was very helpful - and I think that I actually did learn from it :-)

However, I have entered a new problem.
After having sorted the admissions and created the new "count" variable, I have created a new variable for each admission, that is, I have said:

If count=0 then admission1="X";
If count=1 then admission2="X";
and so on..

The problem is now, that individuals, who have had more than one admission will appear in the new datasat more than one time. The viewtable looks like this:


Count Admission 1 Admission 2 Admission 3
Person A 1 x
Person A 2 x
Person A 3 x


What I would like, is that the individuals (here: person A) only appear one time in the view table - but that all information about each of the admissions is available. That is, a viewtable like this:

Admission 1 Admission 2 Admission 3
Person A x x x

Can anyone help me with this?

Thank you!
N/A
Posts: 0

Re: Count duplicates in date order

Posted in reply to deleted_user
Sorry - the tables i had put in, have changed. This was what I meant:

I want to change from this view table:
...............Count...........Admission1...........Admission2..........Admission3
Person A.....1.....................x
Person A.....2.................................................x
Person A.....3............................................................................x


To this:
...................Admission1............Admission2...............Admission3
Person A.............x............................x.............................x
Super Contributor
Posts: 474

Re: Count duplicates in date order

Posted in reply to deleted_user
OK.

I'll try not to give you that one free. Smiley Very Happy

They are several approaches for your question.

I'll point only one.

My understanding, is that you want to summarize your data by PERSON, but retaining the ADMISSIONs whenever there is an 'X' within the group.

So you simply need to work within the group, retaining the value of each admission.

the RETAIN statement will do precisely that. Hold the last variable value within each datastep iteration.

See the online documentation:
http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000214163.htm

To summarize your data, you just need to work with the FIRST / LAST statement (which will indicate if the iteration is at the top, middle, or bottom of the group) and perform an explicit output when desired (here, tipically at the end).

See the online docmentation here:
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001349233.htm

Combine these two features in one datastep, and you'll have your problem solved.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
Super Contributor
Posts: 474

Re: Count duplicates in date order

Posted in reply to deleted_user
Okay, you've tried. Fair enough. ;D

data OUT (drop = _Smiley Happy; /* drop old vars */
set IN (rename = (ADMISSION1=_A1 ADMISSION2=_A2 ADMISSION3=_A3));
by PERSON; /* input dataset MUST BE sorted by PERSON */
retain ADMISSION1 '' ADMISSION2 '' ADMISSION3 ''; /* new vars */

/* reset _A* vars for each PERSON group start */
if first.PERSON then do;
ADMISSION1=''; ADMISSION2=''; ADMISSION3='';
end;

/* coalescec returns the first non empty value of a list */
ADMISSION1=coalescec(_A1,ADMISSION1);
ADMISSION2=coalescec(_A2,ADMISSION2);
ADMISSION3=coalescec(_A3,ADMISSION3);

/* output last record only */
if last.PERSON;

run;

Now, in the above solution, new ADMISSION vars are created (retained) based on the old ones, which will be dropped at the end of the datastep.
At each group start, new vars are reset to '', and for each group row, the result of the coalescec function is assigned (coalescec function returns the first non empty value of a list). Finally, and since you need only row per group, output to the dataset is perform only at the last row of the group.

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
Super Contributor
Super Contributor
Posts: 3,174

Re: Count duplicates in date order

Posted in reply to DanielSantos
Without a LENGTH statement in the above code, you will have a 1-char variable for ADMISSION1-ADMISSION3.

Scott Barry
SBBWorks, Inc.
Super Contributor
Posts: 474

Re: Count duplicates in date order

Good point.

I assumed by christina's example ('X') that a 1 char length was enough.
If it isn't, you should explicitly alocate the new vars with the LENGTH statement.

http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000218807.htm

Cheers from Portugal.

Daniel Santos @ www.cgd.pt.
N/A
Posts: 0

Re: Count duplicates in date order

Posted in reply to DanielSantos
Hi Daniel

Thank you for the coding! One character is enough.

I have som trouble getting the coding to work. I have started with the retain statement but I cannot make it work- even with a simple coding. This is what I am doing:

data out;
set in;
by person_id;
retain ad1;
ad1=admission1;
run;

After running this statement my ad1 variable is exactly the same as my admission1 variable- the ad1 has not retained any values. Do you know what I am doing wrong?

Thanks again for your help

Smiley Happy Cristina
Ask a Question
Discussion stats
  • 17 replies
  • 224 views
  • 0 likes
  • 4 in conversation