Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
If you have a numeric variable that want as the suffix you are most of the way there.
Try:
proc transpose data=work.all_doses out=work.all_doses_trans (drop=_name_) prefix=D ; by id; id dose_yr; var dose_value; run;
That will start the value with the letter D and append the dose_yr.
I seldom worry about the actual order of values in a data set and use a reporting procedure to write the values to something like Excel as needed. But in this case, assuming all of your years are present in the data you can use something like
data work.all_doses_revised; length d1937-d2022 8; set work.all_doses_trans; run;
The list will create all of the years even if some years are missing in the data. Which would mean one or more variables with all missing values. If that is not desirable you could use sublists on the length statement such as if 1957 for some reason had no values
length d1937-d1956 d1958-d2022 8;
Hi,
Thanks to your help i've managed to get all the issues with the ICD codes in my dataset sorted and i've now got a basic file to test the Poisson regression risk modelling with - which is really really good.
I'm now moving on to dealing with some issues regarding the data quality and i would appreciate your suggestions as to how to tackle them as what i've learned so far doesn;t seem to give me any ideas.
Each of my 179Kish people have at least one annual dose measurement - sometimes many years of doses (these are radiation doses) with employers who participate in the study.
However, an individual may also have received doses with other employers not in the study prior to being employed by a participating employer.
For those people I don't know their annual doses prior to joing the study only the total dose received up until the year they joined the study - I call this the transfer dose.
below is an example of a typical person with a transfer dose - i have placed the transfer dose in the year prior to the first 'proper' dose and made it negative to identify it as such a dose.
e.g.
ID year Dose
8 1975 -11.0
8 1976 0.24
8 1977 0.57
8 1978 0.24
8 1979 0.13
8 1980 0.24
8 1981 0.13
8 1982 0.9
8 1983 0.79
8 1984 0.02
The problem is that if i use this persons data as it is then their initial dose rate is way to high and this will mess up the modelling.
What i want to do is create some years of simulated doses to better approximate what i expect this persons doses prior to 1976 to have been.
e.g.
year dose
1964 1.0
1965 1.0
1966 1.0
---
1975 1.0 - this being the year the trasfer dose was recorded
so that the simulated doses sum to the transfer dose value.
I have several criteria to use to decide how to spread the transfer dose over an appropriate number of years with appropriate values.
The main one is to calculate the average of the 'real' doses and then divide the transfer dose by that value to get a number of years (n). I would then record the average dose in the n years prior to the first real dose.
However, i do need to check if this means assigning simulated doses to years when they were less than 18 years of age - which would not be plausible and i would exclude the person.
i have created a table which i hope can be the basis of the calculations i need called work.correct_doses - see attached
Here is what think i want/need to do:
1) work through the table by ID until I find someone with a transfer dose - i.e. a negative dose value
2) read all their dose information - which is a variable number of lines.
3) calculate the average of the 'real' dose information
4) calculate how many year (n) i need to accumulate the transfer dose at the average real dose rate.
5) check if that number of years would take start of exposure before age 18
ie. is 'age at 1st real dose - n' < 18
6) if yes - mark the ID for exclusion
7) if no - create the required number of extra dose years and add them to the 'real' years of doses in the table.
😎 return to point 1 until end of file.
There are just over 7K poeple with transfer doses and the full dose table is 1.7million lines - so its a bit of a big job. i attach a smaller sample of the data!
So any suggestions are very welcomed - if only i could do this in fortran it would be done by tea time! SAS is so much more complicated.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.