8/31/2014

My First Microsoft ML Project in Azure

Today I entered the world of Microsoft Azure Machine Learning.  I posted a blog entry about it here:

http://www.bloomconsultingbi.com/2014/08/my-view-of-microsoft-azure-machine.html

So later today, I signed up for Microsoft Azure using my employer's account.  It took about 5 to 10 minutes to create the account, set up blob storage and then click on Machine Learning.

I attempted to do a complete project, based off the blog post from Sebastian Brandes, located here:

http://blogs.msdn.com/b/sbrand/archive/2014/07/22/tutorial-how-to-train-a-neural-network-with-azure-machine-learning.aspx

I took some screenshots along the way to document the steps:



Creating a Microsoft Azure account based off my company's MSDN subscription:



Created:


It gives a $100 / 30 Day free credit for signing up:


Here's the main site:


And Machine Learning:


Here I'm creating an ML site:


It's provisioning here:

And there it is, the first Machine Learning site on Microsoft Azure:


The main Machine Learning page:


And the main menu, with a variety of links:


And some sample projects to whet the appetite:


The Dashboard shows your usage:

And here are some settings for my workspace:

Shows that I have no Web Services created yet:

Here's where I clicked on a sample project from Machine Learning Samples:


Example #2:


Example #3 from Samples:


And another sample:



Here I'm uploading my own Data Set, called BreastCancer.csv, which I downloaded off the web:


Showing the progress of the upload.  I really like the color scheme of the web site as well:


Here's the actual project I'm going to work on, starting from scratch:


I've dropped my uploaded csv dataset onto the project canvas:


Some property info:

Adding the Project Columns component, which is linked to the dataset by dragging the circle from the dataset to the Project Columns component.  It's very similar to Microsoft SQL Server Integration Services (SSIS, formerly DTS in SQL Server 2000), if you've worked with that:


Here I'm excluding the column Code Number, as it's a unique identifier and provides no added value to the project:
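Outside the designer, the same idea could be sketched in a few lines of Python.  This is only an illustration, not what the Project Columns component does internally; the helper name drop_column and the sample header (other than Code Number and Class) are made up for the example.

```python
def drop_column(header, rows, column_name):
    """Return a copy of header and rows without the named column."""
    index = header.index(column_name)  # raises ValueError if the column is absent
    new_header = header[:index] + header[index + 1:]
    new_rows = [row[:index] + row[index + 1:] for row in rows]
    return new_header, new_rows


# Hypothetical sample data; only "Code Number" and "Class" come from the post.
header = ["Code Number", "Clump Thickness", "Class"]
rows = [["1000025", "5", "2"], ["1017122", "8", "4"]]

header_without_id, rows_without_id = drop_column(header, rows, "Code Number")
print(header_without_id)
print(rows_without_id)
```

The identifier is unique per row, so a model could only memorize it, which is why it is dropped before training.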

And here you can see the link from Data Set to Project Columns:



And some more properties.  In the video I'm trying to duplicate, the Class column, which indicates whether the breast cancer was benign or malignant, holds 2 for benign and 4 for malignant, so we're dividing by 2 here:


While replacing the field Class (overwrite basically):


And here we're subtracting 1, so (4/2)-1 = 1 or (2/2)-1=0, to give a 0 or 1 as final answer, which translates to most languages as False / True.  Just a different way of doing it I suppose:
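The two math steps above boil down to one small function.  Here is a minimal sketch in Python; the function name recode_class is my own, and the check for unexpected values is an extra I added, not something the designer components do.

```python
def recode_class(raw_class: int) -> int:
    """Map the raw Class code (2 = benign, 4 = malignant) to 0 or 1."""
    if raw_class not in (2, 4):
        # Extra safety check added for this sketch.
        raise ValueError(f"unexpected Class value: {raw_class}")
    return raw_class // 2 - 1  # (2/2)-1 = 0, (4/2)-1 = 1


labels = [recode_class(c) for c in [2, 4, 4, 2]]
print(labels)
```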


Here we're cleaning the data, handling any missing values by removing the row entirely:
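As a rough sketch of the same cleaning step in plain Python: read the CSV and keep only rows with no missing cells.  I'm assuming here that a missing value shows up as an empty cell or a "?" marker; the sample rows are made up, and the helper name is mine.

```python
import csv
import io

# Assumption: missing values appear as empty cells or as "?".
MISSING_MARKERS = {"", "?"}


def drop_incomplete_rows(rows):
    """Keep only the rows in which every cell has a value."""
    return [
        row for row in rows
        if not any(cell.strip() in MISSING_MARKERS for cell in row)
    ]


# Made-up sample standing in for the uploaded CSV file.
sample_csv = "5,1,1,2\n8,?,3,4\n3,1,,2\n"
rows = list(csv.reader(io.StringIO(sample_csv)))
clean_rows = drop_incomplete_rows(rows)
print(len(rows), "rows in,", len(clean_rows), "rows kept")
```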


Added this component, didn't change any of the settings:


Here we're splitting the data set into 80% used to train the model and 20% used to test the model.  This is typical with neural networks:
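For anyone curious what an 80 / 20 split looks like in code, here is a minimal sketch.  It shuffles first with a fixed seed so the split is repeatable; the shuffle, the seed and the function name are my choices for the example, not settings taken from the designer.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows, then cut them into a training part and a test part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_rows(range(100))
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The test rows are held back so the model is scored on data it has not seen during training.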



Checking the name of the BlobStorage account, in order to place a CSV file there as output:


It prompts for account name, key and directory path.  I suggest you prefix the path with "results/":


Here you can see the Project running:


And it completed successfully, after a few false starts:


We then go to our BlobStorage to view the output CSV files:


Here's the first one.  Although I didn't tell it to send the column headers, you can see the first value is a 97.8% probability, which would indicate a predicted value of 1, or True, meaning breast cancer was detected.  A value such as 0.05 would return False, or no cancer detected, and so on.


Here's the other file.  Row by row, you can see the first 0 or 1 column is the actual value from the original spreadsheet, and the 0/1 next to it is what the predictive model said the result would be.  In most cases for this model, the two numbers matched, indicating a fairly accurate model.
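To put a number on "the two numbers matched", the two output files can be compared in a few lines.  This is a sketch under assumptions: I'm assuming a probability of 0.5 or more counts as a 1, and apart from the 0.978 and 0.05 mentioned above, the sample values are made up.

```python
def to_label(probability, threshold=0.5):
    """Turn a scored probability into a 0 / 1 prediction (threshold is assumed)."""
    return 1 if probability >= threshold else 0


def accuracy(actual, predicted):
    """Fraction of rows where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Illustrative values only, not the real output of the experiment.
scored_probabilities = [0.978, 0.05, 0.61, 0.30]
actual_labels = [1, 0, 0, 0]

predicted_labels = [to_label(p) for p in scored_probabilities]
print(predicted_labels)
print(accuracy(actual_labels, predicted_labels))
```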



And it appears no hours were used during this demo:


And here you can see the spike on Microsoft Azure where I ran the Machine Learning job.  It didn't take very long:


Here's the Experiment Dashboard view, with the latest project accessible:


And this shows the job history with a couple of failed runs, which I had to troubleshoot and clean up.  In my case the output directory had to have the prefix of "results/":


There's a ton of useful components available in Machine Learning.  Here's a few of the groups to choose from:






Quite a lot to learn to say the least.


So as you can see from the vast number of screenshots, I was able to duplicate, to some degree, the steps in Sebastian's blog post.  I uploaded my data set to Azure as a custom CSV file downloaded from this website, same as he did:

http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/

I now believe that there's a major difference between these roles.  A Report Writer and a Data Warehouse Developer combine to make a Business Intelligence Developer, which may or may not include Hadoop / NoSQL development.  And finally there's the Data Scientist, who works with Machine Learning and statistical algorithms to derive insights from the data.  And if he / she is using Microsoft Azure Machine Learning, he / she has the entire arsenal of solutions in a simple web browser.  It also reduces the pain of writing your own algorithms and custom code to interrogate the data, and finally, of pushing the entire solution to production with a simple mouse click.

It does get more complicated.

Thanks for reading my blog post.  I do have a lot more respect for what a true data scientist does after this endeavor.

You can read my latest blog post about Predictive Models here: http://www.bloomconsultingbi.com/2014/08/so-what-is-predictive-modeling.html

So What Is Predictive Modeling

If a user requests a report, we can get that in a relatively short time.
If they want to join disparate data sources into a Data Warehouse, we can do that too.
At some point, the user has to analyze the data.  In doing so, they bring in bias, perhaps a skewed perspective and a misreading of the data.
However, that's been the world of data for the past few decades for the average Data Professional.

We did have predictive models years ago.  I know from working at the bank approving loans.  My decisions were based on a score.  Where did that score come from?  It read in variables from the loan application in addition to the credit report.  It looked at age, number of years at residence / employment as well as number of revolving loans, installment loans, bad pay history and inquiries.  All those factors and more were sent to the model, which returned a score.  The score had thresholds.  If the customer exceeded the threshold, he/she was approved.  Otherwise, declined.

However, that predictive model was based on the statistical chances of that customer repaying the loan in full.  There were no guarantees.  My boss always said there's the chance the customer gets hit by a bus and the loan goes into default.  The score is a probability of repayment.

So if we had predictive scoring models in 1994, what's changed?

A lot.  First off, the amount of data has blossomed.  Second, the tools to create models have entered the hands of the average person.  So no PhD required.  Third, with a little knowledge of Statistics, Data Professionals can now create end to end full life cycle applications in a short amount of time.

Based on the new offerings from Microsoft Azure, I could log on, upload a data set, create an application by simply dragging and dropping components onto the canvas, hook them up in such a way as to cleanse and transform the data, build a model, train the model by running a subset of the data through it so that it adjusts its weights (a neural network does this through back propagation to increase the accuracy), then run the remaining data through to score the model and output the results for analysis.

Not only that, it will seamlessly create a web REST API, allowing programmers to send data to the EndPoint, have the model application score the data, and return a percentage of probability to the calling app.   What that means is the model is exposed for actual use by a number of calling applications.

What's an example of end to end model usage?  Say I'm running a business that extends credit to customers, and a person wishes to purchase a product.  I'd like to know if this applicant is worthy of credit.  So as the person applies online, I, as a programmer, send the pertinent information to the REST API through code.  The REST web service receives that info, sends it through the model and returns a score to the calling app, which says this potential customer, based on their specific characteristics, will be a good paying customer, so yes, extend a line of credit.  Or the opposite: do not extend credit because his / her profile matches a subset of people who did not repay.
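From the programmer's side, that round trip might look roughly like the sketch below.  Everything specific in it is an assumption made for illustration: the URL, the API key, the JSON field names and the 0.5 cutoff are placeholders, not the actual Azure Machine Learning request format.  The request is only built, not sent, since the endpoint is hypothetical.

```python
import json
import urllib.request

SCORING_URL = "https://example.invalid/score"  # hypothetical endpoint
API_KEY = "replace-me"                         # hypothetical credential


def build_scoring_request(applicant: dict) -> urllib.request.Request:
    """Package the applicant's details as a JSON POST request."""
    body = json.dumps({"applicant": applicant}).encode("utf-8")
    return urllib.request.Request(
        SCORING_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )


def decide(response_body: str, threshold: float = 0.5) -> str:
    """Turn the returned probability of repayment into a credit decision."""
    probability = json.loads(response_body)["probability"]
    return "extend credit" if probability >= threshold else "decline"


request = build_scoring_request({"years_at_residence": 4, "revolving_loans": 2})
# urllib.request.urlopen(request) would send it; skipped because the URL is a placeholder.
print(request.get_method(), request.full_url)
print(decide('{"probability": 0.83}'))
```

The calling app never sees the model itself.  It sends the applicant's characteristics and acts on the score that comes back.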

So I could create a model to predict if a customer is credit worthy.  Without having to hire PhD Mathematicians or Statisticians.  In fact, I could do it all within a web browser in a relatively short amount of time.

And this is just the tip of the iceberg.  With the growing data sets available, the possibilities are endless.  But what makes it so unique is Microsoft Azure Machine Learning offers the entire package online, with single user login, collaboration, end to end solution with easy to use software and good documentation.

This is where the world of Data is heading.  I think it's incredible.

My View of Microsoft Azure Machine Learning

Read a tweet today: Internet of Things surpassed Big Data on the Hype Cycle.

Which is interesting, because my interest in Hadoop has fallen over the past few months.  I initially liked it because it was new territory, had value in handling unstructured data and large volumes.

Except I don't have unstructured data or large volumes.  And I never got to work on a project that had Hadoop, whether setting up and administering a cluster (Hadoop DBA) or developing in Hive or Pig or Flume or whatever.

It's like the best Christmas gift (or Chanukah gift) never opened.  And now every Tom, Dick and Harry is entering the space, which is getting a little too crowded for me.

However, my real interest has always been Artificial Intelligence and Neural Networks and Machine Learning.  So when I watched the following video on Microsoft Azure Machine Learning:

https://www.youtube.com/watch?v=tQojmRevsdE

It kind of sparked my creativity again.  So today I watched another video on the subject:

http://blogs.msdn.com/b/sbrand/archive/2014/07/22/tutorial-how-to-train-a-neural-network-with-azure-machine-learning.aspx

My basic takeaways are this.

They've removed the need to program advanced algorithms.  So no PhD in Math required.
They've removed the need to program essentially.

In that the entire process can be performed within a web browser.

First you upload your data set, or use an existing data set.  You simply add the data set to your project, start dragging and dropping and connecting steps into your workflow from the huge list of available options.

Which means you kind of have to know what each of the widgets does.  As in Classification, Clustering and Regression for starters, but then some of the advanced algorithms and what they do as well.

Microsoft Azure Machine Learning is tightly integrated into Azure, so there's the single sign-on and connectors to Blob Storage, Azure Tables and SQL Azure as well as Hive and Hadoop.

So those two things, single sign-on and being web browser based, are huge factors.

Throw in the full life cycle processing, no need to learn advanced Mathematics or Programming and I see this as the future.

Plus the ability to move to production within minutes, having it build a REST API consumable by C# code for programmers to send in data and receive a result based on the trained model, or you can send batches of data and receive a batch back.

However, I'll still need to get up to speed on a deeper understanding of Statistics, how to interpret the results and what kinds of projects to work on.

Any way you look at it, this stuff is awesome!

8/30/2014

Machine Learning on Microsoft Azure

Reporting started out very basic.  Connect to a data source, place some fields on the screen layout, add some groups, header / footer, perhaps a few Sums, run it and you have a working report.

Great, you could send those reports to people in Excel or PDF via Email.

Next, Business Intelligence and Data Warehousing allowed developers to pull a variety of data sources, consolidate into a model, either Star or Snowflake schema, load into a Cube, write some reports and allow users to consume.

Great.  Except these scenarios require an end user to interpret the data.

So what's next?  Data Science.  Machine Learning.  And if you want to get started with Microsoft Machine Learning, check out this great video:

https://www.youtube.com/watch?v=tQojmRevsdE

It will blow your mind.  Because it's web based, looks easy to use and can be ported to production quickly.

This is definitely cool.


8/29/2014

Endings and Beginnings

Today marked the completion of development on my latest project.  Went through the final requests for the remaining reports.  Sat with a few business units and got sign off for every report.

Which means they should soon be in production.  Which means I may have to support the reports during go-live next month for a few hours.

Saying goodbye to clients is often overlooked in the world of consulting.  Over time you get to know the workers and interact with them daily for weeks on end.  It was sad for the project to end.  Although the owner of the company said they will have additional work soon and she requested me specifically to do their reports, so that was nice.

Which means on to the next project, starting Tuesday.  I'll update the blog with new adventures.

Mob Rules

Rooting for the home team is built into our culture.

I'm from Tampa so I must root for the Rays, Bucs and Lightning.

But I also root for the Jewish people because that's my heritage.

Perhaps you root for the Catholics, Methodist or Presbyterians.

How about politics, what team do you root for?

You see, people identify with being part of a group.  There's strength in numbers.  You can blend in seamlessly without fear.

Only problem is, they tell you how to think, how to behave, they set the social norms.  Yay for the Republicans, I'm all for them.  Another guy says, Yay for the Democrats.  Others root for the independents.  And yet others root for nobody.

Either way, you fit nicely into a bucket.  And that bucket is how you build your identity.

So here's a radical thought.  What if you formed your beliefs based on your own thoughts.  You know, free thinking.  Instead of hiding behind some group to cheer for.  What if you originated your own thoughts.

Ah, that's too scary.  What if people don't like my thoughts.  I'll be an outcast.  Well, maybe the world needs a few more outcasts to stir the pot.  Because as we know, business as usual is taking us to some deep dark places.

Build a Winning Culture

Culture.  Every corporation has one.  Yet you can't see it.  You can only feel it.

I've worked for a variety of companies.  Some big, some small.  I've worked on teams where we just got along, everyone was excited about the projects, we collaborated, joked, went to lunch together and had fun while we worked.

I've worked at some companies where everyone was against each other, knowledge was kept secret, people never helped each other, stabbing in the back, politics, silos of people working on the same team.

And there have been cultures in the middle.

Who sets the culture?  Well, I think it trickles down from above.  If the CEO is a jerk, chances are the atmosphere will be a struggle.  If the CEO is cutting edge and forward thinking, chances are the environment will be easy going.

I believe that you want a culture that promotes creativity, problem solving and suggesting ideas, where people give 110% because everyone else does also.  Which means you cannot micromanage, you cannot run the people into the ground, you cannot promote every man for themselves.

We've all seen the reality shows where everyone is on the same team working against each other for their own survival.  That's not how it should be.  It drains people and the quality of work suffers.

If people are constantly in fear of losing their jobs, the quality suffers.  Which trickles down to the customers, who experience bad service, shabby products and will tend to move on.

You can measure all the numbers in the world for insight, but if you want to pick up sales, drive great products which people want to purchase and retain customers, you'd best work on creating a friendly work environment.

I believe this is often overlooked, as many companies are so focused on profit and stock price and mergers and layoffs.  We need to get back to the root of the problem: happy workers.  A happy worker will pay dividends over time.

It doesn't take a Data Scientist to figure this out.  It takes common sense.

8/28/2014

Closing Pitchers of Projects

If you're a fan of baseball, you're probably familiar with the 'closing' pitcher.  These guys are brought into the game during the last few innings to close out the game.  Perhaps when the starting pitcher has thrown too much.

The same is true with consulting.  There are times when a consulting agency is brought in at the tail end of a project.  To assess what's already been started, and to 'land the plane' if you will.

And that's what my last project was all about.  The previous developer left for another position, but the project was in a good state.  Some of the ETL needed completion, some reports needed tweaks and some reports and dashboards needed to be done from scratch.

These are typically smaller projects.  I should be wrapping up that project tomorrow.

And so today we evaluated another smaller project, to figure out what's already been done and estimate time to complete.  Turns out it was the same developer who left the other project.  And because we did such a great job on the last project, the consulting firm offered us this one too.  So the code looked familiar.  This one happens to be a lot bigger, sourcing from 5 different data sources, with many dim/fact tables, and the cubes appear to be solid.

Finishing up a project takes a lot of skill, because as you know, the last 10% of any project is typically the most difficult.  So as in baseball, sometimes it's good to have a 'closing' skill set in your arsenal.