About EC27556

EC27556 · ‎05-10-2022

In other words this: 🙂

EC27556 · ‎05-10-2022

Sorry was meant to read: "I.e. the blank cells under scGroupA1 would be filled with scGroupA1 and the blank cell under scGroupA2 would read scGroupA2." So in the leftmost column (SC1) instead of scGroupA1 appearing once, it would appear in the blank cells underneath it too. And then under scGroupA2 the blank cell would also read as scGroupA2. I want the same thing done in SC2, SC3 and SC4 columns too with the scGroup values there. Is this possible?

EC27556 · ‎05-10-2022

Hi,

Using this code:

    proc report data=Merged split='~';
        column SC1 SC2 SC3 SC4 ap_month,(Probability);
        define ap_month / across;
        define SC1 / group;
        define SC2 / group;
        define SC3 / group;
        define SC4 / group;
        define Probability / sum;
    run;

I have been able to create the following table:

As you can see, there are missing cells under the SC columns. I.e. under scGroupA1 there are blank cells. Is there any way of filling those blank cells? I.e. the blank cells under scGroupA2 would be filled with scGroupA1 and the blank space under scGroupA2 would read scGroupA2 etc.

Thanks 🙂
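One commonly used pattern for this kind of request is to hide the real group column with NOPRINT and display a computed copy that carries the last non-blank group value forward. The sketch below is only an illustration under assumptions: the dataset work.merged_demo, the computed column SC1_fill and the temporary variable hold_sc1 are hypothetical names, and only SC1 is filled (the same idea would be repeated for the other SC columns).

```sas
/* Hypothetical sample data standing in for Merged */
data work.merged_demo;
    length SC1 SC2 $12;
    input SC1 $ SC2 $ ap_month Probability;
    datalines;
scGroupA1 scGroupB1 1 0.10
scGroupA1 scGroupB1 2 0.20
scGroupA1 scGroupB2 1 0.30
scGroupA1 scGroupB2 2 0.15
scGroupA2 scGroupB1 1 0.25
scGroupA2 scGroupB1 2 0.05
;
run;

proc report data=work.merged_demo split='~';
    /* SC1 is still used for grouping but is not printed;           */
    /* SC1_fill (hypothetical name) is the column that is displayed */
    column SC1 SC1_fill SC2 ap_month,(Probability);
    define SC1         / group noprint;
    define SC1_fill    / computed 'SC1';
    define SC2         / group;
    define ap_month    / across;
    define Probability / sum;

    compute SC1_fill / character length=12;
        /* hold_sc1 is a temporary variable, so it keeps its value from row to row */
        length hold_sc1 $12;
        if SC1 ne ' ' then hold_sc1 = SC1;
        SC1_fill = hold_sc1;
    endcomp;
run;
```

The idea is that PROC REPORT only supplies the group value on the first row of each group, so the compute block saves it whenever it is non-blank and writes the saved value on every row.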

EC27556 · ‎05-10-2022

Wow, legend, thank you!

EC27556 · ‎05-10-2022

I have date variables that are formatted dd/mm/yyyy. Is it possible to create character variables whose values are equal to dd/mm/yyyy? (Context: I am trying to run a proc report and the date values are not showing up correctly: the "/"s are missing and each part is on a separate line. I'm hoping that using character variables might resolve this issue.)
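A minimal sketch of one way to do this, assuming the dates are true SAS date values: the PUT function with the DDMMYY10. format returns the dd/mm/yyyy text as a character value. The dataset and variable names (work.dates, date_var, date_char) are hypothetical.

```sas
/* Hypothetical input: one numeric SAS date displayed as dd/mm/yyyy */
data work.dates;
    date_var = '10MAY2022'd;
    format date_var ddmmyy10.;
run;

data work.dates_char;
    set work.dates;
    length date_char $10;
    /* PUT writes the date through the format and returns character text */
    date_char = put(date_var, ddmmyy10.);
run;
```

For the sample row above, date_char should hold the text 10/05/2022, while date_var remains a numeric date.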

EC27556 · ‎05-09-2022

Never mind, I've worked it out! Using the output from the above proc tabulates you need to run the following code:

    data Merged;
        merge TableA(drop = _TYPE_ _PAGE_ _TABLE_)
              TableB(drop = _TYPE_ _PAGE_ _TABLE_);
        by GroupA GroupB GroupC GroupD Month;
        Probability = Count_Sum / One_Sum;
    run;

    proc report data=Merged;
        column GroupA GroupB GroupC GroupD month,(Probability);
        define Month / across;
        define GroupA / group;
        define GroupB / group;
        define GroupC / group;
        define GroupD / group;
        define Probability / sum;
    run;

EC27556 · ‎05-09-2022

Thanks for this. Are you able to provide any example code? I have little experience with Proc Freq and Report!

EC27556 · ‎05-09-2022

Hi,

I currently have 2 tables that I produce via proc tabulates using the below code:

    title 'TableA';
    proc tabulate data = Example out = Numerator;
        class GroupA GroupB GroupC GroupD month;
        var count one;
        table GroupA=''*GroupB=''*GroupC=''*GroupD=''*sum=''*(count*f=comma14.0), month / nocellmerge;
    run;

    title 'TableB';
    proc tabulate data = Example out = Denominator;
        class GroupA GroupB GroupC GroupD month;
        var count one;
        table GroupA=''*GroupB=''*GroupC=''*GroupD=''*sum=''*(one*f=comma14.0), month / nocellmerge;
    run;

The tables look like this:

By dividing one table through the other you get probabilities and create a probability matrix. So what the table is basically saying is: if GroupA=1 and GroupB, C, D=0, then the probabilities in each month are the top line of the first table divided by the top line of the bottom table.

I would like to create some code that does this dividing to create the same table with probabilities, without having to manually divide one table through another in Excel. Does anyone know if this is possible?

NB. The "count" variable represents whether someone has had a certain event (this is what the probabilities relate to: the probability of having this event) and the variable "one" is a constant that =1 for all observations in the data. If people are in different groups then the probability of having the event is different.

Thanks
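As a side note on the arithmetic: because "one" equals 1 on every row, sum(count)/sum(one) is just the mean of count within each cell, provided count is never missing. Under that assumption, a single proc tabulate with the MEAN statistic should give the same probabilities directly. The sketch below uses a hypothetical dataset work.example_demo in place of Example.

```sas
/* Hypothetical sample data: count is a 0/1 event flag with no missing values */
data work.example_demo;
    input GroupA GroupB GroupC GroupD month count;
    datalines;
1 0 0 0 1 1
1 0 0 0 1 0
1 0 0 0 2 0
0 1 0 0 1 1
0 1 0 0 2 1
0 1 0 0 2 0
;
run;

proc tabulate data=work.example_demo;
    class GroupA GroupB GroupC GroupD month;
    var count;
    /* mean(count) = sum(count) / number of rows = sum(count) / sum(one) */
    table GroupA=''*GroupB=''*GroupC=''*GroupD=''*count=''*mean=''*f=8.4, month;
run;
```

If count can be missing on rows where one=1, the mean and the ratio of the two sums would differ, so the two-table approach would still be needed in that case.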

EC27556 · ‎01-26-2022

Yes. Ultimately, I have 100 datasets and would like the sample to always have 10% hits and keep all target incidences where possible, so I want something I can loop over all datasets. Unfortunately, sometimes I won't be able to use all of my target 'hit' observations because they already represent more than 10% of the aggregate dataset. In this case I would undersample the 'hits' to ensure I have 10% in the sample. For the most part, though, fewer than 10% of observations in the datasets at an aggregate level have a hit for the target variable. I would like some code to oversample the target variable so I can create a sample in which 10% of observations have a hit.

EC27556 · ‎01-26-2022

I have a dataset of 5m observations. 95k have the target variable=1, the rest =0. I want to take a biased sample where I include all 95k cases and a selection of the =0 cases so the split will be 10% true and 90% false. Could anyone share some code to do this please? Thanks
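A minimal sketch of one possible approach: keep every hit, then use proc surveyselect to draw nine times as many non-hits by simple random sampling. The dataset names (work.full, hits, nonhits, nonhits_sample, biased_sample), the variable name target and the seed are all hypothetical.

```sas
/* Hypothetical demo data: roughly 2% of rows have target = 1 */
data work.full;
    call streaminit(2022);
    do id = 1 to 10000;
        target = (rand('uniform') < 0.02);
        output;
    end;
run;

/* Count the hits and work out how many non-hits give a 10/90 split */
proc sql noprint;
    select sum(target = 1) into :n_hits trimmed from work.full;
quit;
%let n_nonhits = %eval(&n_hits * 9);

/* Separate the two classes */
data hits nonhits;
    set work.full;
    if target = 1 then output hits;
    else output nonhits;
run;

/* Simple random sample of the non-hits */
proc surveyselect data=nonhits out=nonhits_sample
    method=srs sampsize=&n_nonhits seed=12345;
run;

/* All hits plus the sampled non-hits */
data biased_sample;
    set hits nonhits_sample;
run;
```

This sketch assumes there are at least nine non-hits for every hit. Where hits already exceed 10% of a dataset, the requested sample size would be larger than the number of non-hits available, so the hits would need to be sampled down instead.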

EC27556 · ‎01-25-2022

Ok, thanks. So in order of nodes it would be: data source, sample, target profiler, tree? And how would the resulting tree look then? Say I had 1m observations in total and 10k had the event true (1 in 100). If I sampled so I had 90k where the event wasn't true (instead of 990k) and 10k where the event was true, how would the tree look? Would the first node of the tree show 1% or 10% for target=1? Obviously I would like it to show 1%, as that is the event proportion for the whole population.

EC27556 · ‎01-20-2022

I have datasets with 1 million observations and a mixture of variable types (i.e. categorical, interval etc.). Some datasets work great with decision trees, namely those where a larger proportion of the data has the target variable "true". For example, my target variable is binary: 1 for true and 0 for false. In some cases, as few as 0.2% of cases have the target as true. When running DTs for these datasets, E Miner will not attempt to prune. How do I get around this issue? I want to be able to find the things that split the whole dataset. If I sample 10,000, where 10% have the true target variable and 90% don't, I will find a split, but it will be biased toward my biased 10,000 sample. I.e. I want to be able to say that 100% of people in my 1m have the target variable true if they are blonde and have size 3 feet etc. Is it simply not possible to use decision trees when such a small proportion of the data has the target variable true?

EC27556 · ‎01-18-2022

I have datasets of more than 1m observations, where the proportion of observations with the target variable "true" ranges from 20% to 0.1%. When E Miner constructs a decision tree, does it consider all 1m observations, or does it take a sample of the data when pruning? I'm slightly concerned that if E Miner samples the data before pruning, there is a significant chance that any splits will be biased if, say, very few of the 0.1% target cases are selected. In many cases where the percentage is very small (often <1%) E Miner cannot produce a tree. Is that possibly because it is not randomly selecting any of the 0.1%, for example?

Linked to the above: does anyone know what the optimal ratio of target 'hits' to 'non-hits' is for decision tree analysis? I.e. is it OK for around 10% of your data to have a hit for the target variable? I am considering sampling my data before I conduct decision tree analysis so that about 10% of it has the target variable true and 90% does not.

EC27556 · ‎01-07-2022

Thanks. I actually created a range of 'test' macro variables using the old symput from the TEST.TEST data table:

    data _NULL_;
        set Test.Test nobs=TEST_TOT;
        call symput('Test'||left(put(_n_,3.)), test);
        call symput('TEST_TOT', TEST_TOT);
    run;

If I were to use symputx, would the test variables resolve correctly then? I have also worked out that I can put in a %trim function to fix it too!
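A sketch of the same step using CALL SYMPUTX, which strips leading and trailing blanks from the value before storing it. It assumes the test column is numeric, in which case CALL SYMPUT would convert it with a right-aligned numeric format and leave leading blanks in the macro variable. The dataset work.test_values is a hypothetical stand-in for Test.Test.

```sas
/* Hypothetical stand-in for Test.Test: one numeric column named test */
data work.test_values;
    do test = 1 to 3;
        output;
    end;
run;

data _null_;
    set work.test_values nobs=test_tot;
    /* CATS builds the name Test1, Test2, ... without embedded blanks; */
    /* SYMPUTX removes leading and trailing blanks from the value      */
    call symputx(cats('Test', _n_), test);
    if _n_ = 1 then call symputx('TEST_TOT', test_tot);
run;

data check;
    length test $20;
    test = "alpha&test1";
run;
```

With the sample data above, the test variable in the check dataset should read alpha1 with no blanks in between.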

EC27556 · ‎01-07-2022

See the below. I have a macro variable, test1, that equals "1". Therefore, when I run the below I expect to see test="alpha1" in the created dataset. Instead I get "alpha 1". Why is this, and is there any way to get rid of the blank spaces between alpha and 1?

    data test;
        test = "alpha&test1";
    run;

Date Last Visited	‎05-17-2022 01:27 PM

Re: How to create a new column in Proc Report?

How to create a new column in Proc Report?

How to create a macro using SYMPUTX if the name of the macro you want ...

Re: How to get rid of empty box when running Proc Report?

How to get rid of empty box when running Proc Report?

Re: How to get proc report to format missing values and correctly orde...

How to get proc report to format missing values and correctly order da...

Re: How to get proc tabulate to output a missing value?

Re: How to merge appended values?

Re: How to suppress blank cells when using Proc Report?

How to suppress blank cells when using Proc Report?

Re: How to convert a formatted SAS Date variable to a character variab...

How to convert a formatted SAS Date variable to a character variable?

Re: how to divide two proc tabulate tables through each other without ...

how to divide two proc tabulate tables through each other without usin...

Re: How to sample so sample includes 10% target variable = true and 90...

How to sample so sample includes 10% target variable = true and 90% ta...

Re: EMINER Decision Tree Analysis when only a SMALL proportion of data...

EMINER Decision Tree Analysis when only a SMALL proportion of dataset ...

Does E Miner take a sample of data when constructing decision trees wi...

Re: why are my macros resolving with blank spaces preceding them?

why are my macros resolving with blank spaces preceding them?