I am working on a secure server (which means I can not copy paste anything from it) with several large datasets that I have merged into one. for the current study, I need to use an ID number called pnr, a data and price from four different databases and a population number from 1-6. The aim is to create a total price for each ID and do statistical analyses on that. BUT I have encountered a problem. I have merged my datasets by the ID number and now each Id number has more that 20 lines each. (in a dataset with more than 18 000 ID numbers it makes it impossible to overview). I have tried many different ways of making it broad instead of long so that I can make a total variable but I have not succeeded. Hereunder I will try to go through the steps i have tried and done: data all; merge LMDB DRG SSSY DTSDCCG UDDA; by pnr; run; that creates a very long dataset with several million observations. Then I tried to do a proc sort step like this PROC SORT DATA=all DUPOUT=allnodup NODUPRECS ; BY pnr ; RUN ; proc print allnodup; run; which ended up running for more than a day using a lot of capacity unsuccessfully. then I tried this proc sql ; create table allnodup as select DISTINCT (pnr), price1, date1, price2, date2, price 3, date3 from all order by pnr ; quit ; which did eliminate some so then I did continue with with a proc transpose data=allnodup out=allw prefix=price1; by pnr; id date1; run; proc print data=allw; run; this gave we a lot errors in the log saying the "Id value...... occures twice" and finaly "to many bad ID values" I even tried to make to categories manually in a data step, defining payments/month but it seems as if there are seceral that have more price values every month. Hereunder I tried to create an example of how it looks: data all; infile datalines dsd truncover; input PNR:$13. Price1:8. date1:mmddyys10. price2:8. date2:mmddyys10. price3:8. date3:mmddyys10. population:8. ; datalines4; 887766,1500, 01012014,45000,01012014,55000,01052018,1 887766, 1703,03042015, 45000, 01012014, 55000,01052018,1 887766, 1703,03042015, 45000, 02012014, 55000,01052018,1 887766, 1703,03042015, 45000, 01012014, 55000,01052018,1 887766, 1703,03042015, 45000, 01012014, 5701,01052018,1 887755,3200, 04022018,245761,03042017,7001,02032016,2 887755,8700, 04022018,245761,03042017,7001,02032016,2 887755,0, 04022018,245761,03042017,7001,02032016,2 887755,3300, 04022018,245761,03042017,7001,02032016,2 887755,3200, 04022018,241,03052017,7001,02032016,2 887755,3200, 04022018,245761,03042017,0,02032017,2 887744,9763,01012014,99838,01032018,5056,10252018,5 887744,9700,01012015,99838,01032018,5056,10252018,5 887744,9763,01012014,99858,01032018,5056,10252018,5 887744,9763,01012014,55,01072016,5056,10252018,5 887733,3387,07162015,45000,02032016,30050,12012014,3 887733,3377,07162015,45000,02032016,30050,12012014,3 887733,3377,07162015,789,02032017,30050,12012014,3 887733,3377,07162015,45000,02032016,30050,12012014,3 887733,3377,07162015,1700,12052016,30050,12012014,3 887733,3377,07162015,45000,02032016,308,12011014,3 887733,1500,07162015,45000,02032016,30050,12012014,3 887733,1500,07192015,45000,02032016,30050,12012014,3 887722,4352,07152017,0,03042018,47531,11222014,6 887722,4372,07162017,0,03042018,47531,11222014,6 887722,5678,07152017,0,03042018,47531,11222014,6 887722,4352,07152017,750,03052018,47531,11222014,6 887722,4352,07152017,0,03042018,4700,11252014,6 ;;;; title "All"; proc print; run; Obs PNR Price1 date1 price2 date2 price3 date3 population 1 887766 1500 1012014 45000 1012014 55000 1052018 1 2 887766 1703 3042015 45000 1012014 55000 1052018 1 3 887766 1703 3042015 45000 2012014 55000 1052018 1 4 887766 1703 3042015 45000 1012014 55000 1052018 1 5 887766 1703 3042015 45000 1012014 5701 1052018 1 6 887755 3200 4022018 245761 3042017 7001 2032016 2 7 887755 8700 4022018 245761 3042017 7001 2032016 2 8 887755 0 4022018 245761 3042017 7001 2032016 2 9 887755 3300 4022018 245761 3042017 7001 2032016 2 10 887755 3200 4022018 241 3052017 7001 2032016 2 11 887755 3200 4022018 245761 3042017 0 2032017 2 12 887744 9763 1012014 99838 1032018 5056 10252018 5 13 887744 9700 1012015 99838 1032018 5056 10252018 5 14 887744 9763 1012014 99858 1032018 5056 10252018 5 15 887744 9763 1012014 55 1072016 5056 10252018 5 16 887733 3387 7162015 45000 2032016 30050 12012014 3 17 887733 3377 7162015 45000 2032016 30050 12012014 3 18 887733 3377 7162015 789 2032017 30050 12012014 3 19 887733 3377 7162015 45000 2032016 30050 12012014 3 20 887733 3377 7162015 1700 12052016 30050 12012014 3 21 887733 3377 7162015 45000 2032016 308 12011014 3 22 887733 1500 7162015 45000 2032016 30050 12012014 3 23 887733 1500 7192015 45000 2032016 30050 12012014 3 24 887722 4352 7152017 0 3042018 47531 11222014 6 25 887722 4372 7162017 0 3042018 47531 11222014 6 26 887722 5678 7152017 0 3042018 47531 11222014 6 27 887722 4352 7152017 750 3052018 47531 11222014 6 28 887722 4352 7152017 0 3042018 4700 11252014 6 pnr: represents a social security number for a person but made with a secret key costs are all in DKK dates are all in date format except the one I did not enter hereover that is month and year format, the population is a group defined elsewhere there are 6 different groups. So my aim is to create a total cost per person first per year and then overall and then compare between the populations. So I am now stranded in a situation where I am not sure what I can do? Maybe someone here can help me.
... View more