Re: Add record ID prior to proc transpose

LoisHNewMexico · Posted 01-18-2011 12:18 PM

I have a dataset with multiple lines per record. Each record is a healthcare encounter, identified using a combination of health record ID and date of encounter, which appear on the first line, only. I would like to add an Encounter ID variable to all lines of the encounter, and then use proc transpose BY EncounterID to get my diagnosis and procedure data onto the firstline. Here is basically what I've got (hopefully this will format to something legible):

LineNum HlthRecordID City ServiceDate Diagnosis Procedure

1 mc21898 Cty, NM 4/28/2007 4.74 368.47
2 . . . 86.35 326.87
3 mc59002 Cty, UT 8/8/2007 88.61 966.07
4 . . . 49.22 625.9
5 . . . 333.18
6 . . . 922.09
7 . . . 847.55
8 mc21898 Cty, NM 6/4/2007 41.17 869.38
9 . . . 43.26 193.39
10 . . . 86.35 799.5
11 . . . 4.06 641.92
12 . . . 536.27
13 . . . 971.87
14 mc14473 Cty, AZ 8/16/2007 64.11 60.2
15 . . . 91.56 663.38
16 . . . 91.58 800.32
17 . . . 684.24
18 . . . 529.19
19 . . . 747.11
20 . . . 849.6

format data in message

Message was edited by: LoisH New Mexico

Message was edited by: LoisH New Mexico oops, I meant proc transpose!

Message was edited by: LoisH New Mexico

sbb · Posted 01-18-2011 12:45 PM

Suggest looking at a SAS DATA step with "BY GROUP PROCESSING" where you will have a "counter variable" incremented with each new "condition" which you would define, based on your SAS variables. In your code, you will want to consider using a code piece like:

IF FIRST. THEN DO;
* your assignment logic here ;
END;

possibly using a RETAIN statement to increment your Encounter_ID variable, finally resetting it back to some initial-value with each new "condition".

You will find the SAS support http://support.sas.com/ website, with SAS-hosted DOC and supplemental technical / conference, can provide guidance and code examples. As well, there are forum archives on this topic of creating an "ID variable" for tracking purposes.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search arguments, this topic / post:

data step programming by group processing site:sas.com

proc transpose id variable site:sas.com

data_null__ · Posted 01-18-2011 04:11 PM

This is similar to a program I have seen sometimes with data read from EXCEL spread sheets.

See if this example works for you.

[pre]
data test;
infile cards missover;
input LineNum HlthRecordID:$8. City &$16. ServiceDate:mmddyy. Diagnosis Procedure;
format servicedate mmddyy.;
cards;
1 mc21898 Cty, NM 4/28/2007 4.74 368.47
2 . . . 86.35 326.87
3 mc59002 Cty, UT 8/8/2007 88.61 966.07
4 . . . 49.22 625.9
5 . . . 333.18
6 . . . 922.09
7 . . . 847.55
8 mc21898 Cty, NM 6/4/2007 41.17 869.38
9 . . . 43.26 193.39
10 . . . 86.35 799.5
11 . . . 4.06 641.92
12 . . . 536.27
13 . . . 971.87
14 mc14473 Cty, AZ 8/16/2007 64.11 60.2
15 . . . 91.56 663.38
16 . . . 91.58 800.32
17 . . . 684.24
18 . . . 529.19
19 . . . 747.11
20 . . . 849.6
;;;;
run;
proc print;
run;
data filled;
if 0 then set test;
set test(rename=(HlthRecordID=_1 City=_2 ServiceDate=_3));
HlthRecordID = coalesceC(_1,HlthRecordID);
City = coalesceC(_2,City);
ServiceDate = coalesce(_3,ServiceDate);
drop _:;
run;
proc print;
run;
[/pre]

Ksharp · Posted 01-19-2011 12:40 AM

Hi.
Be honest,I can not understant what your mean totally.

>to get my diagnosis and procedure data onto the firstline

You said first line.But there are two variables (Diagnosis and Procedure) ,namely it will appear two lines not one line after proc transpose.

[pre]
data test;
infile cards missover dsd delimiter=' ';
input LineNum HlthRecordID : $8. City : $16. ServiceDate : mmddyy10. Diagnosis Procedure;
format servicedate mmddyy10.;
cards;
1 mc21898 Cty,NM 4/28/2007 4.74 368.47
2 86.35 326.87
3 mc59002 Cty,UT 8/8/2007 88.61 966.07
4 49.22 625.9
5 333.18
6 922.09
7 847.55
8 mc21898 Cty,NM 6/4/2007 41.17 869.38
9 43.26 193.39
10 86.35 799.5
11 4.06 641.92
12 536.27
13 971.87
14 mc14473 Cty,AZ 8/16/2007 64.11 60.2
15 91.56 663.38
16 91.58 800.32
17 684.24
18 529.19
19 747.11
20 849.6
;;;;
run;
data temp(keep=EncounterID Diagnosis Procedure);
set test;
retain EncounterId ;
if not missing(HlthRecordID) then EncounterId=catx('_',HlthRecordID,put(ServiceDate,mmddyy10.));
run;
proc sort data=temp;
by EncounterID;
proc transpose data=temp out=result;
by EncounterID;
run;
[/pre]

Ksharp

LoisHNewMexico · Posted 01-19-2011 11:46 AM

Each health system encocunter has several lines of data, with the identifying information, the first diagnosis code and the first procedure code on the first line, while the remaining lines have only the additional diagnosis and procedure codes. I'd like the diagnosis code on the first line to populate a variable called Diag1, the diagnosis code on the second line to populate a variable called Diag2, etc. And likewise with the procedure codes. The final result will be to have the identifying information AND all the diag and proc variables (up to Diag18 and Proc18) on one line per health system encounter.

I haven't had time to try any of the suggestions, yet. Thank you to everyone who has responded!

Ksharp · Posted 01-19-2011 09:59 PM

Hi.
OK. That would not be too difficult.
See whether it is what you want.

[pre]
data test;
infile cards missover dsd delimiter=' ';
input LineNum HlthRecordID : $8. City : $16. ServiceDate : mmddyy10. Diagnosis Procedure;
format servicedate mmddyy10.;
cards;
1 mc21898 Cty,NM 4/28/2007 4.74 368.47
2 86.35 326.87
3 mc59002 Cty,UT 8/8/2007 88.61 966.07
4 49.22 625.9
5 333.18
6 922.09
7 847.55
8 mc21898 Cty,NM 6/4/2007 41.17 869.38
9 43.26 193.39
10 86.35 799.5
11 4.06 641.92
12 536.27
13 971.87
14 mc14473 Cty,AZ 8/16/2007 64.11 60.2
15 91.56 663.38
16 91.58 800.32
17 684.24
18 529.19
19 747.11
20 849.6
;;;;
run;
data temp(keep=EncounterID Diagnosis Procedure);
set test;
retain EncounterId ;
if not missing(HlthRecordID) then EncounterId=catx('_',HlthRecordID,put(ServiceDate,mmddyy10.));
run;
data op(keep=encounterid varname value);
set temp;
array a{2} diagnosis procedure;
if encounterid ne lag(encounterid) then count=0;
count+1;
do i=1 to 2;
varname=cats(vname(a{i}),count);
value= a{i};
output;
end;
run;
proc sort data=op;
by encounterid;
run;
proc transpose data=op out=result(drop = _name_);
by encounterid;
id varname;
var value;
run;
proc print;run;
[/pre]

Ksharp

deleted_user · Posted 01-19-2011 01:39 PM

Hello,

my solution is based on reading data twice, first in order to create the maximum number of observation per id, and second to generate the transposed data:

data inp;
infile datalines missover;
input @;
if anyalpha(substr(_infile_,3,4)) then do;
input @3 id:$8. City $ 11-18 date:mmddyy. Diagnosis Procedure;
end;
else do;
input @30 Diagnosis Procedure;
end;
format date mmddyy.;
datalines;
1 mc21898 City, NM 4/28/2007 4.74 368.47
2 . . . 86.35 326.87
3 mc59002 City, UT 8/8/2007 88.61 966.07
4 . . . 49.22 625.9
5 . . . 333.18
6 . . . 922.09
7 . . . 847.55
8 mc21898 City, NM 6/4/2007 41.17 869.38
9 . . . 43.26 193.39
10 . . . 86.35 799.5
11 . . . 4.06 641.92
12 . . . 536.27
13 . . . 971.87
14 mc14473 Cty , AZ 8/16/2007 64.11 60.2
15 . . . 91.56 663.38
16 . . . 91.58 800.32
17 . . . 684.24
18 . . . 529.19
19 . . . 747.11
20 . . . 849.6
;
run;

data out ;
retain rec_id city_from date_service ;

do until (last);
set inp (keep=id) end=last;
by id notsorted;
if first.id and ^missing(id) then count=1;
else count+1;
end;

call symput('new',put(count,7.));

array diag{&new};
array proc{&new};

do until (n_last);
set inp (rename=(id= old_id City =old_City date=old_date)) end=n_last;
by old_id notsorted;

if first.old_id and ^missing(old_id) and ^missing(lag(Diagnosis)) then output;

if first.old_id and ^missing(old_id) then do;
n_count=1;
rec_id=old_id;
city_from=old_City;
date_service=old_date;
end;
else n_count+1;
Diag{n_count}=Diagnosis;
Proc{n_count}=Procedure;
if n_last then output;
end;

format date_service mmddyy10.;

drop old_id old_City old_date id count n_count Diagnosis Procedure;

run;

Marius

Add record ID prior to proc transpose