BookmarkSubscribeRSS Feed
GAL1986
Fluorite | Level 6

Hi Everyone, 

Thanks in advance for the help! I have a dataset with 827,304 observations and over 170 yes(1)/no(0) variables and I am trying to get one line per baby, per visit type, for each of the 170 different diagnoses. 

This is a sample of what I have

Baby_IDVisit_IDType of visitDx_dateDX_JaundiceDX_ZosterDX_SepsisDX_HepatitisDX_Well_baby
1231Delivery4/20/201910000
1231Delivery4/20/201900100
1233Ever5/8/201900010
1232Postnatal4/25/201901000

And this is what I want the data to look like

Baby_IDVisit_IDType of visitDx_dateDX_JaundiceDX_ZosterDX_SepsisDX_HepatitisDX_Well_baby
1231Delivery4/20/201910100
1233Ever5/8/201900010
1232Postnatal4/25/201901000


I would usually transpose the variables of interest to get one line per ID, but with over 170 variables, that doesn't seem feasible. And 

 

I think I am going down the wrong track here, but I have tried to use RETAIN
data Test_retain;

set test;
by baby_id;
retain highest;
if first.baby_id then highest=.;
highest=max(highest,DX_Jaundice);
if last.baby_ID then output;
run;

 

and some SQL code

 

Proc sql;
Create table Max as
Select baby_ID, Type_of_visit, DX_date, visit_ID, max(DX_Baby_Hear_Screen_Fail) as DX_Baby_Hear_Screen_Fail,
max(DX_Baby_Single_Live) as DX_Baby_Single_Live,
max(DX_baby_Hep_Vaccination) as DX_baby_Hep_Vaccination
From test2
Group by visit_ID ;
Quit;

 

If someone could point me in the right direction it would be greatly appreciated! 

3 REPLIES 3
Reeza
Super User
Use PROC MEANS instead. You're looking for the max of the diag variables and group by dates.
You can use short cut lists to refer to the diagnosis list.

https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
ballardw
Super User

A bit more code oriented approach to @Reeza's suggestion:

proc summary data=test nway;
   class baby_id visit_id typeofvisit ;
   var dx_jaundice -- dx_lastdxvariable;
   output out=want (drop=_type_ _freq_) max=;
run;

The above assumes that all of the dx variables you want are adjacent (sequential variable number order as reported by Proc contents) in the data set as implied by your post. The double dash -- above is used to indicate that property.

If your dx_date had started with different letters in the name could have used the dx_: list type to get all variables whose names start with dx_.

 

Assuming no visit_id crosses a date boundary that might not be an issue. If you need the date for later processing you could add the variable to the class statement BUT if there are two or more dates associated with a single visit_id this won't work properly as each date would be summarized separately.

GAL1986
Fluorite | Level 6

So sorry it took me so long to reply- THANK YOU!

This worked perfectly!!!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 5102 views
  • 0 likes
  • 3 in conversation