BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CJ71
Calcite | Level 5

I need help appending the participant ID to the sas7bdat file name.

 

I am working with an ongoing study expected to include a final dataset of n=3000.  We need to convert per-participant hierarchical data from a long format to wide format (SAS code for a single participant complete), and then eventually concatenate all n=3000 per-participant wide-format files into a single large data file.  All variables across participants will be identical.

 

In saving these n=3000 per-participant, wide-format .sas7bdat files, I would like to be able to affix the participant ID to the end of the file so as to easily support concatenation in the future.  My goal is to save all per-participant, wide -format .sas7bdat files in the same folder, and append the participant ID to the file name to discriminate across the n=3000 files.  Example below for Participant_ID numbers 0001, 0002, 0003 . . . 3000:

 

I would greatly appreciate any recommendations for how to best solve the issue.

 

Goal:

C:\Users\folder\exam data\wideformat_0001.sas7bdat

C:\Users\folder\exam data\wideformat_0002.sas7bdat

C:\Users\folder\exam data\wideformat_0003.sas7bdat

C:\Users\folder\exam data\wideformat_3000.sas7bdat

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
%macro add_ID(lib=, dsn=, id_var=);

data _null_;
set &lib..&dsn. (obs=1);
call symputx("id_name", &id_var);
run;

proc datasets lib=&lib nodetails nolist;
change &dsn. =&dsn._&id_name;
run;
quit;

%mend;

data class1;
set sashelp.class;
where name = 'Alfred';
run;

options mprint symbolgen;
%add_id(lib=work, dsn=class1, id_var = name);

Use CALL EXECUTE to call it for every data set in your library.

View solution in original post

13 REPLIES 13
Reeza
Super User

I am working with an ongoing study expected to include a final dataset of n=3000. We need to convert per-participant hierarchical data from a long format to wide format (SAS code for a single participant complete), and then eventually concatenate all n=3000 per-participant wide-format files into a single large data file. All variables across participants will be identical.

Can you show that code? This seems like a very cumbersome process and modifying that code to work for everyone is likely easier.

CJ71
Calcite | Level 5

Thanks for the prompt response.  

 

My code below is modified based on a solution I found online to append today's date to the end of a file name (link listed below; others solutions related to dates also posted, but not to general variable names). 

https://stackoverflow.com/questions/49520782/creating-a-filename-with-todays-date-in-sas

 

%let filePath = "X:\Users\folder\exam data\wideformat(%sysfunc((PATIENT_ID)).SAS7bdat";
%put &=filepath;


data _null_;
call symputx('PATIENT_ID', PATIENT_ID);
run;

%let filePath = "X:\Users\folder\exam data\wideformat(&PATIENT_ID.).SAS7bdat";

data _null_;
put &filePath;
run;

 

Any suggestions are much appreciated!

Reeza
Super User
Let's back up a bit. Show us what you're starting with - ie 3000 individual SAS data sets with the name etc and IDs in a separate file, or variable in the data set? Or is it a single file you want to split into 3000 files while processing?

Then what do you want as your final output?

In SAS, splitting your data like this is unusual and rarely needed. By group processing can handle many different situations so your method could be simplified.
CJ71
Calcite | Level 5

I understand your concerns.  I realize database structure is incredibly important; however, my knowledge in this area is not as deep as I would like.

 

Data is clinical data from imaging equipment.  Per-participant data files are downloaded directly from imaging equipment as individual files where the filename is the participant ID (XML files; XML Mapper used to convert to SAS), and the “Patient ID” variable is included within the file.  Each per-participant file includes multiple different “scan types”, with repeated measures per scan (e.g., 25-30).  Some of the different “scan types” within a per-participant file have identical variable names.  XML Mapper/SAS produces around 25-30 separate files per participant (varies based on specific type of image acquired).  Code including basic XML conversion to SAS, and long to wide-format, is a couple hundred lines per image type acquired (overall, study image acquisition includes multiple image types, each image type with these same issue).  Each final per-participant file (wide format) includes approximately 150 variables, and no longer includes the participant ID in the file name (file name instead represents the general image type acquired).  I was hoping to append the “Patient ID” to each of these final file names, and then concatenate into a single database.

 

The ultimate goal for final output is a dataset with n=3000 and the roughly 150 variables per observation (this would be repeated for each image type acquired within the study). 

 

XML Mapper processes individual files, but I believe based on what I have seen online that I may be able to process and append multiple per-participant XML files together using XML Mapper and SAS macros; generally, I have some code completed to do this (some issues with folder structure exist, so presently incomplete).    

 

I am still trying to determine the best time to bring all individual per-participant files together (pre- versus post-XML and SAS long-to-wide processing), so really welcome your comments.

 

Again, thanks for your help.

Tom
Super User Tom
Super User

Sounds like you want to start with a list of XML filenames.  Pull the ID from the name and run the actual file through your current process to convert the text in the semi-structured XML file into an actual rectangular dataset.  Then combine the those datasets.

So start with the list of files. something like:

data filelist;
   infile 'ls /myfolder/*.xml" pipe truncover ;
   input filename $256.;
   id = scan(filename,-2,'._');
run;

Create a macro from your current code that takes as input the filename, the id and the name of the dataset to finally generate.  Make sure it creates the exact same structure for the dataset no matter what content is in the XML file.  Make sure it include the ID value in one of the variables.

%macro readonexml(filename=,id=,out=);
.....
%mend;

Then use the list of files to generate the code to read the files and append the results to final dataset.

filename code temp;
data _null_;
  set filelist ;
  file code;
  put '%readonexml(' filename= :$quote. ',' id= ',out=next)' ;
  put 'proc append data=next base=all force; run;' ;
  put 'proc delete data=next; run;';
run;

Then run the generated code

proc delete data=all; run;
%include code / source2;

 Now the ALL dataset should have the data from all of  the files.

CJ71
Calcite | Level 5

I appreciate the response, Tom.

 

While I've worked with SAS for more than 15 yrs and usually fairly resourceful with online solutions, I have done little with macros -- so please forgive me.  This is stretching my knowledge! 🙂 

 

My original intent was to rename a .sas7bdat file to include the participant ID (i.e., "wideformat.sas7bdat"  to "wideformat_0001.sas7bdat"), and then concatenate together.  I anticipated I would rename each .sas7bdat file when completing the transition from XML file to rectangular file, so that all files could be stored in a single folder, and when any necessary data cleaning is complete for each file (e.g., one site sent an incomplete file with poor images/incomplete data, but has another file locally with complete data that can be resent), they could be concatenated into a single locked file for analysis.

 

But if I modify your code that uses the original XML files, I don't believe I am obtaining the file names with the participant ID inserted, but rather just retaining the ID (see output below; original XML file names (filenames) were C1000006 and C1000555).  I'm still trying to determine how I can perhaps use your coding suggestions below for my needs described above, so appreciate any further insight you have.

 

Obs       filename            id 
1         C1000006.xml     C1000006 
2         C1000555.xml     C1000555 

Here is my code (minimal changes -- only to include my X:\ file directory):
data filelist;

infile 'dir "X:\XML Data\SAS Trial XML files\*.xml" /b' pipe truncover ;
input filename $256.;
id = scan(filename,-2,'._');
run;
proc print; run;


%macro readonexml(filename=,id=,out=);
.....
%mend;


filename code temp;
data _null_;
set filelist ;
file code;
put '%readonexml(' filename= :$quote. ',' id= ',out=next)' ;
put 'proc append data=next base=all force; run;' ;
put 'proc delete data=next; run;';
run;

proc print; run;

Thanks again.

 
Kurt_Bremser
Super User

So what you want:

The ID derived from the XML file name should be part of the observation(s) read from that XML file, so in the final dataset you know which obs came from which patient.

 

Is that true?

 

If yes, import the XML to a temp dataset and add a final data step in your macro:

data &out.;
set temp;
length id $8;
id = "&id.";
run;

 

CJ71
Calcite | Level 5

Actually, I was hoping to include the ID variable in the final name of the .sas7bdat file eventually generated by the conversion from XML Mapper to rectangular SAS dataset.  That way I can look to each of the multiple .sas7bdat files and know which one belongs to each participant (and, they can all be stored in one common folder, as opposed to having to store each in a separate pt-specific folder). 

 

I already have the ID variable in the converted SAS rectangular datasets, just also wanting in the .sas7bdat file name, too.


Thank you, again.

Tom
Super User Tom
Super User

What part exactly are you having trouble with?

 

If you know the ID in advance then you can use it to both generate the name of the XML file to read and the SAS dataset to generate.  So your code might looks something like this.

%let id=60001;
....
"/mypath/myprefix&id..xml"
...
data mylib.myprefix_&id ;
   ...
run;
CJ71
Calcite | Level 5

Thank you!  I will try later this afternoon/evening when I have a chance and report back!

CJ71
Calcite | Level 5

Thank you -- this is very helpful.  My problem is not completely solved, but this overcomes some of stumbling blocks.

Reeza
Super User
%macro add_ID(lib=, dsn=, id_var=);

data _null_;
set &lib..&dsn. (obs=1);
call symputx("id_name", &id_var);
run;

proc datasets lib=&lib nodetails nolist;
change &dsn. =&dsn._&id_name;
run;
quit;

%mend;

data class1;
set sashelp.class;
where name = 'Alfred';
run;

options mprint symbolgen;
%add_id(lib=work, dsn=class1, id_var = name);

Use CALL EXECUTE to call it for every data set in your library.

CJ71
Calcite | Level 5

This macro and use of Proc Datasets with change statement help substantially.  I very much appreciate the clues towards Proc Datasets and the mprint, symblogen options. 

Thank you!

C

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 970 views
  • 2 likes
  • 4 in conversation