Hello All,
I have a task to create a working population dataset. My working dataset A has row level information by age, race, sex, and county. I need to count records by these variables and then add required records from dataset B to have full population. Please offer any guidance or suggestion on how to do this. Thank you!
I am attaching the example in the spreadsheet here.
Sources | Total number | Category |
Census | 100 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Working dataset | 67 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Empty records need | 33 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Census | 95 | Gender: Male; Age: 57; Race: B; County: Adams |
Working dataset | 50 | Gender: Male; Age: 57; Race: B; County: Adams |
Empty records need | 45 | Gender: Male; Age: 57; Race: B; County: Adams |
Most of us refuse to use data in Excel, because downloading Microsoft Office documents is a security risk.
Please provide a portion of your data as working SAS data step code (instructions), and not in other formats including Excel or screen captures.
Thank you for the response. Below is the structure I need to have from working
Sources | Total number | Category |
Census | 100 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Working dataset | 67 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Empty records need | 33 | Gender: Female; Age: 20; Race: W; County: Lancaster |
Census | 95 | Gender: Male; Age: 57; Race: B; County: Adams |
Working dataset | 50 | Gender: Male; Age: 57; Race: B; County: Adams |
Empty records need | 45 | Gender: Male; Age: 57; Race: B; County: Adams |
I have a task to create a working population dataset. My working dataset A has row level information by age, race, sex, and county. I need to count records by these variables and then add required records from dataset B to have full population.
What do you mean by "add required records from data set B"? You don't present anything that looks like data set B. Add to what? Where do "empty records" enter into this, that isn't explained either.
Also, repeating my request with emphasis
Please provide a portion of your data as working SAS data step code (instructions), and not in other formats including Excel or screen captures.
My apologies for not being clear, Paige!
data HaveA
datalines ID age gender race ethnicity county
1 12 M AmericanIndian Non-Hispanic Lancaster
2 26 M UNK Hispanic Douglas
3 34 M UNK UNK Sarpy
4 41 F Asian Non-Hispanic Douglas
data Have B is census dataset and has complete age, race , se, county, ethnicity information.
So if the count of combination of Age = 12, Gender = M, Race = AmericanIndian, Ethnicity = Non-hispanic, county = Lancaster is 60 and for the same combination in census file (HaveB) we have 100, I need to add 40 records with this combination to balance.
Lets say I have 1.4 million of these records. I need to count these by each combination type and then fill or add empty records or records from census population with complete population information. So, I would be completing 1.4 million from HaveA to 1.9 million from HaveB by using counts from these combinations. I hope I make some sense here. Thank you again.
Hello @SP01
It appears you are trying to join two datasets using common variable.
Consider using data step merge using common by variables.
Alternatively consider using Proc SQL join.
What does data set B look like? You only provided half, it not less, of the problem description.
I actually don't understand what your final result is supposed to look like and/or any rules that would be used to make it.
Note: in the long run you will find any data that places more than one value into a variable is extremely awkward to work with. So those Category values should likely be split into 4 (or possibly more, hard to tell from partial examples) variable, one each for Gender, Age, Race and County.
I don't understand what you mean. Why would need to create observations?
If you have a count of 65 and another count of 100 to figure out that the difference is 35 you just need to subtract.
proc sql;
create table want as
select category
, a.count as CENSUS
, b.count as PRESENT
, a.count - b.count as NOT_PRESENT
from census a
inner join present b
on a.category = b.category
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.