BookmarkSubscribeRSS Feed
CraigSmith
Obsidian | Level 7

Hi SAS community,

 

I'm working on a clinical risk tool using SAS. I have another dataset for the same set of developmental participants / study (HPV Serology).

 

Does anyone know if it's feasible to incorporate a second dataset into a programme and if so, how / what code would I use please? (attached pics of current programme code and file name of what I wish to add in - it's in excel) Screenshot (95).png

 

Directory:   

 

\\campus.gla.ac.uk\SSD_Home_Data_D\2298108S\My Documents\ARCAGE DATA\ARCAGE_Craig

 

Many thanks  

 

Screenshot (94).png

8 REPLIES 8
PaigeMiller
Diamond | Level 26

Yes, there are many ways to join data sets in SAS. Just about any type of join is possible.

 

We'd need more specific details about what you are trying to do ... the word "incorporate" is pretty vague. We'd also need to actually see (a portion of) the data sets involved, provided as SAS data step code (instructions) and not in any other format.

--
Paige Miller
CraigSmith
Obsidian | Level 7

Hi Paige,

 

Thanks for your reply. 

 

I see - I'm essentially trying to merge the datatsets so I have all the information in one place for study participants and analysis - i.e. combining the demographic, behavioural and clinical data I already have with the HPV serology in the excel file (both are from same study). 

 

Not sure what step code is sorry (sorry am a total newbie at this). Nothing crazy coding wise mostly descriptive analysis and logistic regression modeling. 

 

Many thanks,

 

Craig 

PaigeMiller
Diamond | Level 26

Again, in order for us to provide code that might do what you want ... we need more details than the simple word "merge". We need to know what type of "merge". There are probably 18.532 different types of merges. Explain. Give examples.

--
Paige Miller
CraigSmith
Obsidian | Level 7

Hi Paige,

 

Thanks for your reply. Sorry if this seems counterintuitive but as someone of a non-coding background when I say "merge" - what I mean is to have all my data in one place, ready for analysis. Not split between 2 files. Let's take hypothetical person X - I have their sex, dietary, smoking, *insert here* info. In this separate file I also have the HPV serostatus(es) for person X. I don't wish to manipulate or analyse the data in either just yet - just to have all my potential variables in the one place for person X. 

 

I don't wish to touch the variables or "merge" these. I'd simply like these variables to all be together in the one programme so I can mix and match for regression models e.g. below:

 

proc logistic data= ARCAGE;
FORMAT SMOK SMOK. DRINK DRINK. SEX SEX.;
CLASS SMOK ( REF = 'Never smoker')
DRINK (REF = '0: Never')
SEX ( REF = '2: Female' )
BMIC ( REF = '2' );

model CASE = AGE10 SEX prolev
SMOK SMOKP
DRINK alcoIC
VEGETC FRUITC
DENTIS BMIC /rocci RL SELECTION = F;
run;

 

Thanks for your understanding. 

 

Craig 

PaigeMiller
Diamond | Level 26
data arcage;
    merge dataset1 dataset2;
    by person;
run;

Since I keep asking you to show us (a portion of) the two data sets, and you haven't done so, the above is the best I can do right now. Unlikely you can use it as is, most likely it won't work for your actual data.

--
Paige Miller
Tom
Super User Tom
Super User

In general if you have two datasets they need to have a common ID variable (or set of variables) that uniquely identify the observations in each dataset.  Then you can simply use the data step MERGE statement to combine them.

 

So if you have a dataset let's call DEMOG that has ID and SEX and SMOK and a second dataset named HPV that has ID and CASE (which I assume is your HPV status) then to combine then you would do:

 

data analysis;
  merge demog hpv;
  by id;
run;

Note that the dataset need to be sorted by ID before this can work. 

 

You will need to provide more information to get more detailed answer.   I suspect that you might need to do extra work to extract the analysis variables from your serology data, such as determining the value of CASE from the raw values in the HPV dataset.

 

If the source data is in XSLX files then the first step will be to convert the sheet(s) into the XLSX file into dataset(s).  You could use PROC IMPORT or just try using the XLSX engine on a LIBNAME statement.

CraigSmith
Obsidian | Level 7
Brilliant Tom - this is what I meant. Sorry for the delayed reply - all the best.

Craig
ballardw
Super User

One potentially critical item is to ensure that the values of the variables that may be in common in the two sets, such as the identification variables that would be needed to match values for combining the data sets after they are in SAS are the same type.

Showing pictures of XLSX files are not particularly helpful as SAS will want to work with SAS data sets. Depending on the contents of your XLSX files you may have to control how the data is brought into SAS as there are many XLSX things that make the automated tools like Import tasks or code result in different data types. Which means more work on your part before the combining can be performed and the details of things like the variable types and names are needed to provide more than generic descriptions of possible solutions.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 532 views
  • 2 likes
  • 4 in conversation