BookmarkSubscribeRSS Feed
rookie21
Obsidian | Level 7

Hello,

I have a dataset with repeated variables.
As my dataset was long, I used the datastep merge to make it wide, so one line with all data for the corresponding subjectID.
I have given the name of the variables with a number to differentiate between these.
It consists of four main datasets with different repeatings.
The four datasets, Im merging are:
- ana
- *implants (11 implant sets)
- *gen (10 gen sets)
- *lead (15 lead sets)

Below I show one of each of these - to illustrate the repeating events.

 



data secondimplant; set output_data;
where redcap_repeat_instrument = "implantation" and redcap_repeat_instance = 2;
idpatient = idpatient_2;
keep idpatient Impl_id Impl_id2 Impl_Date Impl_Date2 impl_h Impl_h2 Implw Impl_w2 inst inst2
implhwtype implhwtype2 implprtype implprtype2;
rename
Impl_id = Impl_id2
Impl_Date =Impl_Date2
Impl_h =Impl_h2
Implw =Impl_w2
inst = inst2
implhwtype = implhwtype2
implprtype =implprtype2;
run;

 

data secondgen; set output_data;
where redcap_repeat_instrument = "generator" and redcap_repeat_instance = 2;
idpatient = idpatient_3;
keep idpatient Idimpl Idimpl2 gentype gentype2 gensite gensite2 genprevsite genprevsite2 genstatus genstatus2
genexpdate genexpdate2 genreason genreason2 genprocedure genprocedure2 gen_date_2;
rename
Idimpl = Idimpl2
gentype = gentype2
gensite = gensite2
genprevsite = genprevsite2
genstatus = genstatus2
genexpdate = genexpdate2
genreason = genreason2
genprocedure = genprocedure2;
gen_date_2 = "";
run;

 

I have sorted all my set before merging.


My merging step looks like:

 


data ana2; merge ana firstimplant secondimplant thirdimplant fourthimplant fifthimplant
sixthimplant seventhimplant eigthimplant ninthimplant tenthimplant eleventhimplant firstgen
secondgen thirdgen fourthgen fifthgen sixthgen seventhgen eigthgen ninthgen tenthgen
firstlead secondlead thirdlead fourthlead fifthlead sixthlead seventhlead eightlead
ninthlead tenthlead eleventhlead twelfthlead thirtheenthlead fourteenthlead
fifteenthlead;
by idpatient;
run;

 

 

 

This works.

Now I would like to
1. copy a date from a variable(a date) in between the merged dataset to two other columns in same dataset if it meets my condition,
2. and afterwards line them chronologically with eldest date first and newest date latest for each subject
3. Hereafter I would like to rename the rows of variable called lead_1 - lead_15 with the corresponding data.

To 1. part:
The variable Impl_id1, Impl_id2, Impl_id3, .... Impl_id11 corresponds to Idimpl_1, Idimpl_2, Idimpl_3...Idimpl_10. Im a aware that impl_id goes to 11 while Idimpl goes to 10.
I think thats why, this code is not working properly.
If the number in Impl_id is equal to Idimpl, then I would like to copy the date in impl_date to gen_date.
If the number in Impl_id is equal to Idimpl_2, then I would like to copy the date in impl_date to lead_date.I tried this code:

 

 

proc sql;
create table C as
select a.*
from class1 as a
full join class2 as b
on a.Impl_id1 = b.Idimpl1;
quit;

 

 

However this does not copy any dates. I think I need a way to specify the numbering? I tried with prefix= but found out that only goes with proc transpose...


I also tried with the proq sql, where I separated the merged dataset again in 2 to see whether I could copy one date at a time.

 

proc sql;
create table C as
select a.*
from class1 as a
full join class2 as b
on a.Impl_id1 = b.Idimpl1;
quit;

So Im still stuck in first part 

 

 

This did not work as well.

 

Anyone who can help me get a bit further from here?

 

4 REPLIES 4
maguiremq
SAS Super FREQ

Okay, a couple things to tackle here:

 

1. We need a reproducible example. I'm not sure what the confidentiality is regarding your data, but this is a great macro to get it in a workable form for us: https://blogs.sas.com/content/sastraining/2016/03/11/jedi-sas-tricks-data-to-data-step-macro/ 

2. To me, it may be best to keep your data in long form. The amount of merges you're doing seems like unnecessary work.

 

I may not understand your issue, so (2) may be out of line, but (2) depends on (1).

PaigeMiller
Diamond | Level 26

Agreeing with @maguiremq 

 

It seems like you have already decided that a wide data set is best; let's not go there just yet. Start by describing the problem not in terms of SAS, but in terms of what data you have, and what the end result(s) should be — so no SAS code in the description, no PROC this or PROC that, just English words describing the problem.

 

We will need to see a portion of the starting data (in the long format), as SAS data step code, as you provided in one of your other threads. We will need to see some representation of the desired output.

--
Paige Miller
ballardw
Super User

A coding style comment that can affect complexity of SAS coding:

This step uses 1) ordinal words; first second third etc as 2) prefixes for what I suppose from your statements are related.


data ana2; merge ana firstimplant secondimplant thirdimplant fourthimplant fifthimplant
sixthimplant seventhimplant eigthimplant ninthimplant tenthimplant eleventhimplant firstgen
secondgen thirdgen fourthgen fifthgen sixthgen seventhgen eigthgen ninthgen tenthgen
firstlead secondlead thirdlead fourthlead fifthlead sixthlead seventhlead eightlead
ninthlead tenthlead eleventhlead twelfthlead thirtheenthlead fourteenthlead
fifteenthlead;
by idpatient;
run;

 SAS code can be much simplified by using the concept as the prefix: Implant, Gen, Lead or what have you and a suffix as a numeric: Gen1, Gen2, Gen3 (or if you prefer Gen01 Gen02 Gen03). Why you may ask at this point. SAS allows use of a couple of forms of shorthand variable name lists. One is prefix only which could be Gen: to reference all variable names that start with the letters Gen. That may be to inclusive for some uses so SAS allows an interval such as Gen1 - Gen10 to shorthand list all of Gen1, Gen2, Gen3 ... to Gen10. The same also applies to DATASET NAMES.

So instead of the above you could use (not that I am sure it is a good idea)

 

data ana2;
   merge ana
         implant1 - implant11
         gen1 - gen10
         lead1 - lead15
   ;
   by  idpatient;
run;

Which is 1) shorter to type, 2) easier to follow what is going on, especially when indenting is used in code 3) would be much easier to modify if you only needed implant1 through 7: change the 11 to 7 instead of deleting or adding a bunch of names.

If the data sets are in the same library (hint) you just use the libname with each: Thislib.implant1 - Thislib.implant11, which saves even more typing instead of Thislib.firstimplant.

If you are making data sets in any sort of iterative fashion it is likely much easier to name them with a suffix than an ordinal prefix as well.

 

I would say that instead of that renaming and merging your "secondimplant" set might be better off just adding a variable ImplantNumber=2 and similar but without a full description of the process how to use this stuff is problematic.

 

rookie21
Obsidian | Level 7

Thanks for all your comments 

 

I checked the link with macro-step to create data in presentable, but honestly I cound even not get that working. 

The data Im using is confidential, but I've taken 5 patients information and changed all which is referable. Now its anonoumys and can be reproduced by you. 

The data is now presented in long format as a CSV file. 

 

I also tried to change my variable names using number as suffix to shorten the code, and that helped . Thanks 😄

 

With ordinary English words what Im trying. 

- collect data for each patient compromised (Im not sure wide dataset is best, but I tried that with merge step because I was advised to), so I can make analyses. 

- Data includes 5 patient (record_id 1-5). 

- Each patient can have several implantations (impl_id). For each implantation there is corresponding data from the same operation, which is in redcap_repeat_instrument called generator and lead. A patient might have only one implantation with corresponding generator and multiple lead informations.

Only id  is common for those three instruments (implantatation + generator + lead) . The id names:  

 

                                         implantatation + generator +   lead 

the name of variable:         impl_id                idimpl         idimpl_2

I would like to make sure the date of implantation is copied to generator and lead data without loosing the corresponding data for that lead. (with long data each lead has its own row with all necessary information) 

 

As you saw from the code yesterday I was creating a new column called gen_date_X and the same for lead information. 

data seventhgen; set output_data;
where redcap_repeat_instrument = "generator" and redcap_repeat_instance = 7;
idpatient = idpatient_3;
keep idpatient Idimpl Idimpl7 gentype gentype7 gensite gensite7 genprevsite genprevsite7 genstatus genstatus7
genexpdate genexpdate7 genreason genreason7 genprocedure genprocedure7 gen_date_7;
rename
Idimpl = Idimpl7
gentype = gentype7
gensite = gensite7
genprevsite = genprevsite7
genstatus = genstatus7
genexpdate = genexpdate7
genreason = genreason7
genprocedure = genprocedure7;
Spoiler
gen_date_7 = "";

run;


I thougt I could use if and then statement. 
But as I have 11 implantations, 10 generators and 15 leads, I cant be sure they find each other correctly. 

one  impl_id (from implantation information) can be equal to 2 lead information. 

I would like: 

If impl_idX = idimplX then gen_dateX = impl_dateX

 

 

I then need them to be in chronologic order for lead, generator and implantion data, as I need to divide my cohort in patients with "first" lead beeing transvenous, endocardial, subcutaenous or unknown. 

To this purpose I made som small coding to classify each lead information in above mentioned groups according to variables: manufacture_2, modelname_2 and modelnumber_2. 

When I doubblechecked my code for the classifying of groups, I found out I need all the lead information with date to order them chronologically. 

I used this: 

 

proc print data =  ana2; 
where implprtype1 = "First";
run; 

 

rookie21_0-1638979952613.png

As seen here, 23+4 patients does not have this lead beeing the first. 

I cant assume all information on lead with suffix 1 neccesarily really was their shirt implantation. To solve this I wanted to create this date column to generator and lead information. 

 

The reason why I thought changing the set to wide is I could not see how to get SAS to read in different lines to search for information. 

Tried with proc sql to gather the information, but as you can understand, Im really new to this. 

Im not even sure the ordering of task is correct 

 

The main task is to compare outcomes of 4 different groups. I have to divide the patients based on their first leads to characterize the groups before comparing endpoints .. 

 

Hope it got a bit more clear ? 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 669 views
  • 4 likes
  • 4 in conversation