About jaredp

jaredp · ‎11-25-2013

data _null_;
  package_no = 1234;
  /* Z7. writes the number zero-padded to 7 characters: 0001234 */
  package_char = put(package_no, z7.);
  put package_char;
run;

jaredp · ‎11-04-2013

Could it also be that SAS is trying to convert the non-blank values in SIC?

jaredp · ‎11-04-2013

I don't have SAS at home, so taking on this challenge would mean going in to work on the weekend.

jaredp · ‎10-30-2013

I'm barely familiar with what you are doing above. I haven't got into the rule building stuff yet. But I did notice, in regard to your first question, that perhaps "call" is different from "Call" in the Rule column? Another guess I have is that one is a verb and the other is not. I can't answer your second question, but I am interested in the answer. My gut feeling is that order shouldn't matter. If it did, then I'd think that the rules are not fine-tuned enough. But perhaps the order of rules is a rule in and of itself... Good questions, thanks for asking them.

jaredp · ‎10-30-2013

Just thought I'd add some thoughts to what a few posters have already touched on. One issue I see is that your example data is not a properly formatted CSV file. When I create CSV files, I enclose my free text fields in double quotation marks "like this". This tells most software that everything between the quotes is part of one column, regardless of what it contains. The only issue is when your free text field itself contains double quotes. This will cause software to cut off your data prematurely.

One option is to change how the source data is generated. Instead of using a comma, use a set of characters as your delimiter which is less likely to appear in your free text field, such as #*#*#. You then split your data on that delimiter as opposed to splitting on a comma.

You could also try an approach similar to pradeepalankar's. I personally like this approach. Get your first two and last two fields. Everything else is your third field.

It sounds like you have flexibility in what your final data set will look like. With the code already suggested, you can easily aim for this data set:

Field1 | Field2 | Field3 | Field4 | Field5
001 | 12A | cards | UK | 001,12A, Tues to Friday John rotates with Team, Perm 10800-1645 Sat Perm 8am starts Tue to Sat as of 22/02, cards, UK
002 | 12B | HL | UK | 002,12B, Mon to Wed Marry rotates with Team, Perm 0800-1645 Sat Perm 8am starts Tue to Sat as of 22/02 Works in shift, HL, UK
003 | 12c | HL | UK | 003,12c, Sat&Sun Paul rotates with Team, Perm 19000-1645 Sat,HL, UK
004 | 12D | CC | UK | 004,12D, All day Joe rotates with PL Team, Perm 10800-1645 Sat 8am starts Tue to Sat as of 24/02 Works in shift, CC, UK
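The "first two fields, last two fields, everything else in between" idea can be sketched in a DATA step roughly as follows. This is only an illustration, not the code suggested earlier in the thread: the variable names (first1, first2, middle, last2, last1), the lengths, and the shortened sample record are all assumptions made for the example.

```sas
/* Sketch: split a comma-delimited record whose free-text part may contain commas. */
/* All variable names and lengths here are hypothetical.                          */
data want;
  infile datalines truncover;
  length line middle $ 300 first1 first2 last2 last1 $ 20;
  input line $char300.;

  n = countw(line, ',');                 /* number of comma-separated pieces */
  if n >= 5 then do;
    first1 = strip(scan(line, 1, ','));
    first2 = strip(scan(line, 2, ','));
    last2  = strip(scan(line, n - 1, ','));
    last1  = strip(scan(line, n, ','));

    /* CALL SCAN returns the start position of a piece, so the free text   */
    /* runs from the start of piece 3 up to the comma before piece n-1.    */
    call scan(line, 3, p3, l3, ',');
    call scan(line, n - 1, p4, l4, ',');
    middle = strip(substr(line, p3, p4 - p3 - 1));
  end;

  drop n p3 l3 p4 l4;
datalines;
001,12A, Tues to Friday John rotates with Team, Perm 0800-1645 Sat, cards, UK
;
run;
```

For the sample record, middle should come out as "Tues to Friday John rotates with Team, Perm 0800-1645 Sat", commas included. Records with fewer than five pieces are left unsplit here; how to treat those is a choice that depends on the real data.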

jaredp · ‎10-30-2013

Because you already know data mining principles, it shouldn't be too hard to pick up a new tool like SAS Enterprise Miner. I think Michele answers your question #1. As for question #2, organizations which have a lot of data, customers, or clients are looking for skills like the ones you have. They want insights from their customer data for a variety of reasons. One job title I see advertised which asks for data mining skills is called "business analyst". "Data scientist" is another job title that is appearing lately. I'd search job sites using those terms, plus ones like statistics, modelling, algorithm, predictive, analytics, data, machine learning, analysis, business intelligence, competitive intelligence... If you know the type of industry you want to work in, then obviously limit your search to those companies. To name a few organizations, I'm thinking any company with a chain of stores, banks, companies which move a lot of product on their ecommerce site, casinos, automotive companies, insurance companies. There are even some sophisticated non-profit companies or environmental groups you could work for. There are also many consulting firms which you could work for. One thing you might want to do is attend the SAS Global Forum and network, network, network. Get to know people who go there. Those connections you make will help you in your job search. Joining LinkedIn may also help you.

jaredp · ‎10-30-2013

I came across a decent blog post about oversampling. Perhaps it contains some answers you seek. http://www.data-mining-blog.com/tips-and-tutorials/overrepresentation-oversampling/ Also check out any links within the post.

jaredp · ‎08-20-2013

Hmmm... using the first 5 terms... One question that comes to mind: what if there is a shift in the use of one term for another, but they are synonyms? The approach might work with a growing synonym list, but then it is no longer unsupervised.

You can run the Text Topic and Text Cluster nodes in tandem. This will give you your topics as well as generated SVD values. I'm not an expert in Singular Value Decomposition (SVD), but I have a strong sense that if you want to measure changes in your corpus over time, a solution might be to use the SVD values (i.e., TextCluster_SVD1, TextCluster_SVD2...TextCluster_SVDn).

This paper might have some similarities to what you want to do: http://www.scsug.org/SCSUGProceedings/2009/Liang_Xie1.pdf
One can brush up on SVD here: http://ftp.sas.com/techsup/download/EMiner/TamingTextwiththeSVD.pdf
and there is some nice insight here too: http://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf

I'd love it if you kept us informed about any solutions you apply.

jaredp · ‎08-19-2013

I just solved my own question. Will it count to mark this as the right answer?

I went into the directory where SAS SA Workbench is installed. There is a "test_documents" folder with an example corpus. It looks like the corpus needs to be a zipped folder of XML files. Each document has the following format:

<doc>
<docid><![CDATA[filename .xml without extension]]></docid>
<title><![CDATA[subject title here]]></title>
<createtime><![CDATA[10/6/2008 10:00:00 AM]]></createtime>
<body><![CDATA[blah blah blah yadda yadda yadda text text text]]></body>
</doc>

What sucks is that the SAS sentiment tools don't appear to build my corpus for me (unless I am missing something?). Instead, I have the joys of converting all of my text files into XML files with this format. I manually changed 5 of my .txt files to .xml with the above XML structure, and I was able to upload them successfully.
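If the documents already sit in a SAS data set, the conversion could in principle be scripted rather than done by hand. The following is a minimal sketch under several assumptions: a data set named have with character variables id, group and paragraph (one document per observation), a hypothetical output folder C:\temp\corpus that already exists, and placeholder values for the title and createtime elements. It has not been checked against Workbench, and the resulting files would still need to be zipped before upload.

```sas
/* Sketch: write one XML file per observation in the layout shown above. */
/* The folder, data set, and variable names are assumptions.             */
%let outdir = C:\temp\corpus;

data _null_;
  set have;
  length fname $ 250 docid $ 40;

  docid = cats(group, id);                      /* file name without extension */
  fname = cats("&outdir.\", docid, ".xml");     /* FILEVAR= needs the full path */

  /* "dummy" is only a placeholder; FILEVAR= supplies the real file. */
  file dummy filevar=fname lrecl=32767;

  put '<doc>';
  put '<docid><![CDATA[' docid +(-1) ']]></docid>';
  put '<title><![CDATA[' docid +(-1) ']]></title>';                  /* placeholder title */
  put '<createtime><![CDATA[10/6/2008 10:00:00 AM]]></createtime>';  /* placeholder date  */
  put '<body><![CDATA[' paragraph +(-1) ']]></body>';
  put '</doc>';
run;
```

The +(-1) pointer control backs up over the blank that list-style PUT writes after each variable. Text that itself contains the sequence ]]> would end the CDATA section early and would need separate handling.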

jaredp · ‎08-19-2013

I've installed the various Sentiment Analysis tools (Studio, Server and Workbench). I've already created my training corpus and created a Statistical Model in Studio. I've uploaded the model to the server. I am now creating a new project in Workbench. There is a tab where I specify my corpus and upload it. The upload fails every time with the error "Unable to upload file". The file I am uploading is a zipped folder of text files.

Here are my guesses as to what may be happening:
1) The file is being uploaded to a folder which I (i.e. the web server or Workbench user) may not have permissions to access. But what folder would that be?
2) Perhaps the folder is not uploaded, but the contents are read and placed into the MySQL database?
3) The file format is incorrect. I also tried zipping only the text documents. That did not work. Perhaps the formats of the files themselves are not acceptable.

I have no clue how to proceed. Any suggestions are appreciated.

jaredp · ‎08-13-2013

Thank you so much. That did the trick! I was originally trying to work with a DO loop and a macro, but then came across the FILEVAR= option. For some reason, I couldn't quite work out whether FILEVAR= required a full path or just the file name. Thanks for clarifying it! Jared

jaredp · ‎08-13-2013

Sorry, my put statement should read "put paragraph"....

jaredp · ‎08-13-2013

I'm hoping someone can help me out. I need to export a single variable from each observation into its own unique text file whose filename is made up of 2 variables. My data is a bunch of character variables like the ones below. I've simplified it a lot. In my real dataset, the paragraph variable is actually a bunch of text.

data have;
  input id $ paragraph $ group $;
  datalines;
1 cA1 A
2 cA2 A
3 cA3 A
1 cB1 B
2 cB2 B
3 cB3 B
1 cC1 C
2 cC2 C
3 cC3 C
;
run;

The combination of the variables ID and GROUP is unique. I'd like this to be the resulting text filename. The content of each file is the paragraph variable. In the above dataset, I should end up with 9 files (A1.txt with content cA1, A2.txt with content cA2, etc...). (Whether or not the file ends in .txt does not matter.)

I'm on WinXP and was playing with something like the following:

filename cc 'J:\SAS_PROGRAMS\STATISTICS FRAMEWORK\projects\Camper Satisfaction Sentiment Analysis\_test\name';

data _null_;
  set have;
  length fname $250;
  fname = "J:\SAS_PROGRAMS\STATISTICS FRAMEWORK\projects\Camper Satisfaction Sentiment Analysis\_test\" || TRIM(fname);
  fname = id || group;
  file cc filevar=fname;
  put comment;
run;

Removing the filevar option will make this write all obs to the "name" text file. The filevar= option seems to do what I want BUT my computer is locked down and I can't write to the default path. I get this error:

"Insufficient authorization to access C:\Program Files\SASHome\SASFoundation\9.3\1"

Is there a way to change the path where the file is stored when using the filevar option? (An X command using chdir doesn't seem to work.) Alternatively, is there a different approach? A DO loop and a text-file-writing macro? My data is about 6,000 observations, so I am sort of concerned about performance (which is likely more an issue of writing that many files to a single Windows directory...).

(I also just noticed that fname seems to have some blank spaces...)
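In the attempt above, the second assignment to fname overwrites the path with just id || group, so FILEVAR= receives a bare file name and SAS resolves it against its default directory. Because id is padded to its full length before the concatenation, the name also picks up embedded blanks. A minimal sketch of the same step with the complete path built into fname follows; the output folder C:\temp\paragraphs is a hypothetical location that must already exist and be writable, and the PUT statement uses paragraph, as corrected in the follow-up post.

```sas
/* Sketch: one text file per observation, with FILEVAR= given a full path. */
/* The output folder is an assumption; replace it with a writable folder.  */
%let outdir = C:\temp\paragraphs;

data _null_;
  set have;
  length fname $ 250;

  /* CATS strips leading and trailing blanks from each piece, so the */
  /* file name becomes e.g. A1.txt rather than "1       A".          */
  fname = cats("&outdir.\", group, id, ".txt");

  /* "dummy" is only a placeholder; FILEVAR= supplies the real file. */
  file dummy filevar=fname;
  put paragraph;
run;
```

With the sample data this should produce nine files, A1.txt through C3.txt, each holding the matching paragraph value. A single DATA step pass handles all observations, so no macro loop is involved.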

jaredp · ‎08-02-2013

For the time being we are focusing on topics. Originally, I was unsure if the cluster analysis would be beneficial. At that time my data was wide (35 vars, 25 obs). But when I transposed the dataset to treat each document as a variable, I began thinking that clustering may reveal some common themes across the sections - this is one of the objectives of my analysis. Truthfully, to answer your question, I'd have to say "I don't know".

jaredp · ‎08-02-2013

I appreciate the follow-up. What I ended up doing was breaking things down by subsection. I get much better results this way. You hit the nail on the head with "Depends on your objective". Once I stood back to look at the main objectives, it became much clearer how the data could be reshaped for analysis.

Online Status	Offline
Date Last Visited	‎04-07-2020 04:26 AM

Re: EM Decision Trees - Stratification or Not - Validation or Test?

Re: Tip: Use the Control Point Node for Simpler, Reusable Process Flow...

Re: How to convert 1 as one and 2 as two without Proc format

Re: Suggest other ways of creating a frequency variable

Re: Suggest other ways of creating a frequency variable

Re: Suggest other ways of creating a frequency variable

Suggest other ways of creating a frequency variable

Re: Churn Analysis with SAS Miner- Help

Re: SAS X Command won't produce log file on linux server

Re: What's the difference between the rule builder and content categor...

EM Decision Trees - Stratification or Not - Validation or Test?

Re: How to convert 1 as one and 2 as two without Proc format

Tip: Use the Control Point Node for Simpler, Reusable Process Flows

Re: Suggest other ways of creating a frequency variable

Re: Suggest other ways of creating a frequency variable

Re: Churn Analysis with SAS Miner- Help

Re: Data Preparation

Re: Text Categorization Rule

Re: Removing linefeeds

Re: Weekend Challenge

Re: Tip: Use the Control Point Node for Simpler, Reusable Process Flow...

Re: Database dropped leading zeros...

Re: Missing value for char variable-how to delete

Re: Weekend Challenge

Re: Text Categorization Rule

Re: SAS Import CSV file - Free text column - carriage return

Re: I'm intersted in Data Mining

Re: Oversampling in Enterprise Miner, Please Help...Thanks

Re: Text Mining small obs but large text

Re: Sentiment Analysis Workbench Corpus format

Sentiment Analysis Workbench Corpus format

Re: Per observation text file export

Re: Per observation text file export

Per observation text file export

Re: Text Mining small obs but large text

Re: Text Mining small obs but large text