Hi,
I have a dataset (n=100,000+) with personID, gender, smoking status (Y/N/null), and the beginyear and endyear of the period during which the smokers smoked. I am trying to structure the file so that I can run a frequency distribution on smoking status for every year between 1940 and 2009, and I'm thinking the easiest way to do that is to create a column for each year (1940, 1941...2009).
For each personID, I want to indicate the years they smoked using these rules: if smoking status is null, then the value for each year column would be null. If smoking status is 'N', then the value for each year column would be 'N'. If smoking status is 'Y', then for the years between beginyear and endyear (inclusive) the value for each year column would be 'Y', all the years before beginyear would be 'N', and all the years after endyear would be null. I'd like to run the frequency for all records, and also separately by gender. Can anyone help with code for this, or suggest a better approach to essentially produce a graph of smoking prevalence for each year between 1940 and 2009? Thanks in advance.
Do NOT create a column for each year, especially if you want to graph the distribution. It would be better to create one record for each year of smoking history. Then Year is available as an axis variable and you can display frequencies, bar charts or box plots for each year, which really is a pain to do if you have to reference 70 different variables.
Please do not post "data" in Excel files. Many users here don't want to download Excel files because of virus potential, and others have such things blocked by security software. Also, if you give us Excel we have to create a SAS data set, and because Excel places no constraints on data cells, the result we end up with may not have variables of the same type (numeric or character), or even the same values, as yours.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
And what kind of graph are you envisioning?
I'm sorry, but I have not been able to get the macro to work, and I am unable to attach a sample SAS data set. I am hoping someone will be able to respond based on the table below. It's a fairly simple dataset.
For data presentation, it can be either a table with the % of smokers or a line graph showing the same. Thank you.
personID | gender | smoke | begin | end
8005 | M | | |
6305 | F | | |
7767 | M | | |
1435 | M | N | |
8516 | M | Y | 1948 | 1966
9481 | M | | |
1659 | F | | |
2812 | M | | |
9282 | F | | |
9910 | M | | |
6619 | M | Y | 1963 | 1988
8120 | F | | |
5549 | M | | |
5567 | F | | |
7987 | F | | |
5012 | F | | |
7110 | M | | |
So how do you expect to include the PersonIDs that have neither Y nor N for the smoke variable?
What years do you expect the N for the smoke variable to represent? If we treat all of them as years 1940 to 2009 we may well be including years in which that individual was not alive. Or including smoking in the cradle...
If your example data is representative of your full 100,000+ records, you appear to have years of information for about 11% of the records.
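Taken literally, the rules in the original question can still be applied in a one-record-per-year layout rather than 70 columns. The following is only a minimal sketch under several assumptions: the data set name have, the variable names, and the output variable status are made up for illustration; the four rows are copied from the posted table; and smokers with a missing begin or end year get a missing status, which the question does not actually specify.

```sas
/* Hypothetical input: four rows copied from the table posted above. */
data have;
   infile datalines dsd dlm='|' truncover;
   input personID $ gender $ smoke $ begin end;
   datalines;
8005|M|||
1435|M|N||
8516|M|Y|1948|1966
6619|M|Y|1963|1988
;
run;

/* One output record per person per year, 1940 through 2009. */
data long;
   set have;
   length status $1;
   do year = 1940 to 2009;
      if smoke = 'N' then status = 'N';          /* never smoked: N in every year */
      else if smoke = 'Y' then do;
         /* Assumption: unknown begin or end year leaves status missing */
         if missing(begin) or missing(end) then status = ' ';
         else if year < begin then status = 'N'; /* before starting */
         else if year <= end then status = 'Y';  /* begin through end, inclusive */
         else status = ' ';                      /* after endyear: null */
      end;
      else status = ' ';                         /* smoke is null: null in every year */
      output;
   end;
   keep personID gender year status;
run;
```

With 100,000+ people this produces roughly 7 million short records, which PROC FREQ handles without trouble. It also bakes in the concerns raised above: an 'N' is counted for all 70 years regardless of when the person was alive.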
If you had a more complete data set I might do something like this to get one record per year:
/* Example data: one record per person per smoking-status period */
data example;
   input personid $ sex $ smoke $ startyear endyear;
datalines;
1 F Y 1960 1969
1 F N 1970 1995
2 M Y 1945 1986
3 M Y 1954 1973
3 M N 1974 1998
4 F Y 1958 1988
5 M Y 1961 1989
;
run;

/* Expand each period into one record per year */
data want;
   set example;
   do year=startyear to endyear;
      output;
   end;
   drop startyear endyear;
run;

/* Cross-tabulate smoking status by year */
proc freq data=want;
   tables year*smoke;
run;
That would allow a number of different types of analysis. You could also get some graphs with data summarized on year for counts or percentages of interest.
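For the table of percentages or the line graph that was asked for, one possible way to summarize the want data set from the example is sketched below. The output data set names pct_by_year and pct_by_sex_year are made up for illustration. OUTPCT adds the row percentage (PCT_ROW) to the OUT= data set, which for a year*smoke table is the percentage within each year. SPARSE is included so that years with no smokers still get a row with a zero count.

```sas
/* Percent of records with each smoke value within each year, all records */
proc freq data=want noprint;
   tables year*smoke / outpct sparse out=pct_by_year;
run;

/* Same summary, separately for each sex (sex becomes the stratum variable) */
proc freq data=want noprint;
   tables sex*year*smoke / outpct sparse out=pct_by_sex_year;
run;

/* Line graph of the percent smoking by year, all records */
proc sgplot data=pct_by_year;
   where smoke = 'Y';
   series x=year y=pct_row;
   yaxis label='Percent smoking';
run;

/* One line per sex */
proc sgplot data=pct_by_sex_year;
   where smoke = 'Y';
   series x=year y=pct_row / group=sex;
   yaxis label='Percent smoking';
run;
```

Note that PROC FREQ leaves records with a missing smoke value out of the percentages unless the MISSING option is added to the TABLES statement, so the denominator in each year is the number of records with a known status in that year. Whether that is the right denominator depends on the questions raised earlier about the records with no smoke value.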