BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sabataged
Obsidian | Level 7
Hi Tom, I just noticed the code you posted earlier! I was trying to make the process faster by using the "LET" statement, but that might be what is causing the issue. Yes, I do have a "center" variable, and the first thing I do is write a data step that keeps only the observations from those centers. Anyhow, I will try and incorporate the code that you posted now. Thank you very much!
ballardw
Super User

How do you bring the data set into SAS?

 

If the file structure does not change then using a data step with appropriate IF statement to keep the records for the centers you are interested in and a KEEP or DROP statement to only have the variables you want might be the best approach in the long run. It sounds like you already know which variables you want to keep so should not be too hard.

 

I have a project that shares mostly common variables with some other organizations which have a few fields my data does not need. DROP statement takes care of that just fine.

 

 

sabataged
Obsidian | Level 7
I use proc import to bring in an Excel file. Unfortunately, the data structure changes each month! They keep adding/deleting variables, so I have no idea of knowing what to look for each time. There are a few core variables that are supposed to always stay the same, but they change the name of these variables sometimes. For example, in January, they named the gender column as "Gender" and in February they decided to change it to "Sex" and in March it became "GenderOfPatient". We have asked the centers to stay consistent, but not all of them listen. 😞
ballardw
Super User

@sabataged wrote:
I use proc import to bring in an Excel file. Unfortunately, the data structure changes each month! They keep adding/deleting variables, so I have no idea of knowing what to look for each time. There are a few core variables that are supposed to always stay the same, but they change the name of these variables sometimes. For example, in January, they named the gender column as "Gender" and in February they decided to change it to "Sex" and in March it became "GenderOfPatient". We have asked the centers to stay consistent, but not all of them listen. 😞

This sounds like it might be a case of getting management involved.

Any time data is to be shared between organizations, even parts of a single organization, it helps everyone if an actual formal agreement is made as to content, order and values of variables is done. That way you head off these headaches and can develop a process that is 1) reliable and 2) likely to save time and money.

 

I used to work for a company that did some contract work involving another companies data. When I got involved with that project I had to respond to the clients question about why there was a programming charge almost every month. When I explained that we had been receiving the data in different column order and that we had to modify the data to work for the project the client realized his budget was negatively affected because someone in his organization had not developed a standard approach. They fixed that and cut the cost of the project by about 5%. (And my programmer thanked me for not having to deal with the stupidity involved).

 

Document how much time it takes to fix this one time. Get a cost estimate from that time. Go to management with that information and frequency and you may see movement quickly.

 

Relying on the combination of Excel and Proc Import in production work is a recipe for multiple headaches.

Since proc import has to guess as to the field size and type after examining a very few rows of data you can have variables change lengths and type from month to month. Which means when those periodic data sets get combined for any reason you can have issues with 1) not combining at all due to errors if a field changes from character to numeric and 2) loss of data for character values due to varying lengths.

 

Save to CSV, use a data step to read data is much more predictable.

And in your case, the selection of data for specific centers and variables would stay the same.

 

I deal with anywhere from 10 to 30 Excel files per month. I save them to CSV format and use data steps to import them so the variable names, types, lengths and formats stay the same. That way when I combine the data across sites/ organizations I don't have issues related to different lengths and the process goes pretty smooth.

Plus if a field is supposed to be numeric and some idiot has entered text in the Excel file I get invalid data notices in the log I can use to back track to a source if needed.

sabataged
Obsidian | Level 7
I use proc import to bring in an Excel file. Unfortunately, the data structure changes each month! They keep adding/deleting variables, so I have no idea of knowing what to look for each time. There are a few core variables that are supposed to always stay the same, but they change the name of these variables sometimes. For example, in January, they named the gender column as "Gender" and in February they decided to change it to "Sex" and in March it became "GenderOfPatient". We have asked the centers to stay consistent, but not all of them listen.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 6398 views
  • 4 likes
  • 5 in conversation