9/30/2015

Microsoft OLAP Cubes 101

OLAP Cubes are still around.  I work with them every day.  For Microsoft, there are two flavors: Multi-Dimensional Cubes and Tabular Model cubes.

Tabular is the newer flavor, and it has some advantages.  It's easier to get a cube up and running in a shorter time, it uses a new language called DAX, and you can import models from Excel Power Pivot and, I imagine, the newer Power BI.  You can assign permissions to specific users and groups contained in Active Directory.  Basically you create Roles with specific permissions, then assign users to the Roles.  You can create partitions.  And you can schedule refreshes from the SQL Server Agent to process the cube; for example, if you receive new data throughout the day, the cube will only display the new data once refreshed.


Multi-Dimensional Cubes have been around longer than Tabular.  They are a bit complex if you're coming from the traditional T-SQL world.  MDX is a beast of a language; I've worked with it for a few years and every time is a challenge.  Although some apps, like SSRS, have Wizards to help out.
 

 Once you have an OLAP Cube, you can build reports off it.

What type of reports can you build based off an OLAP cube?  Microsoft offers Power BI, Excel, SQL Server Reporting Services (SSRS), PerformancePoint and Power View.


The OLAP Cubes run in a different database engine than OLTP, called Analysis Services.  It contains the Cube, Dimensions and Roles.  The IDE used to build cubes is typically SQL Server Data Tools, also known as BIDS depending on the version, and you can use Visual Studio 2013 provided you install the Data Tools add-in.

Many times, you build a Cube based off a Data Warehouse.  A Data Warehouse has Dimension tables and Fact tables.  The dimensions are typically things like Person, Region, Warehouse and Date, attributes to slice the data with.  The Fact tables contain measures consisting of Sums, Averages, Mins and Maxes; they are almost always numbers.  And you can build hierarchies within the Dim tables, for example:


Country --> State --> County --> City
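To illustrate, here's a rough sketch in Python/pandas, with made-up table and column names, of how a fact table joins to a dimension and rolls a measure up a geography hierarchy, the same slicing a cube performs when you drag Country or State onto a report:

```python
import pandas as pd

# Hypothetical dimension table with a Country -> State -> City hierarchy
dim_geography = pd.DataFrame({
    "GeoKey":  [1, 2, 3],
    "Country": ["USA", "USA", "USA"],
    "State":   ["FL", "FL", "GA"],
    "City":    ["Tampa", "Miami", "Atlanta"],
})

# Hypothetical fact table: one numeric measure, keyed to the dimension
fact_sales = pd.DataFrame({
    "GeoKey": [1, 1, 2, 3],
    "SalesAmount": [100.0, 50.0, 200.0, 75.0],
})

# Join fact to dimension, then sum the measure at each hierarchy level
joined = fact_sales.merge(dim_geography, on="GeoKey")
by_country = joined.groupby("Country")["SalesAmount"].sum()
by_state = joined.groupby(["Country", "State"])["SalesAmount"].sum()

print(by_state)
```

The dimension carries the descriptive attributes, the fact carries only keys and numbers; that separation is what makes the rollup cheap at any level of the hierarchy.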
 

 That's how OLAP Cubes work.  Abbreviated edition.

9/29/2015

Effects of Artificial Intelligence

Here's a good article on Artificial Intelligence from distinguished Microsoft researchers.

http://cacm.acm.org/magazines/2015/10/192386-rise-of-concerns-about-ai/fulltext

When we were children, we had a fort in the back yard.  The kids would get together and hang out.  It was our little world tucked away from the bigger world.  In our world we could do what we liked.  A refuge for children to explore and grow.

As you get into the real world, you abide by the rules in school and do what you're told.  The teachers and principal were the law; if they sent reports back to home base, parents took action.  So although you were away from home for a good chunk of the day, somebody was still watching and monitoring your behavior.

Then we enter the workforce, where the company and boss are there to make sure you're performing up to expectations.  They don't call your parents; they let you go work somewhere else.

So you can see the progression of the system.  There's always someone watching you to ensure your behavior aligns with acceptable norms.  If you don't comply, the school calls your parents, or your employer fires you.

So the next possible iteration of the system is artificial intelligence.  An automated series of sensors that monitor every person on the planet 24 x 7.  It will scan your location, your actions, look for patterns and detect deviations.  

In addition to monitoring and surveillance, it can automate the enforcement of rules against deviant behavior.  Once money becomes digital, they can simply freeze your account and your credit cards.  No need for electronic bracelets around your ankle; you have a cell phone.

In addition, they can automate the police force by militarizing robots to enforce rules, fight wars and monitor the citizens.  And lastly, these friendly robots could be doing your old job, while the majority sits home watching Jerry Springer and the Price is Right.  With no jobs, our disposable income won't be there, so fewer purchases; I'm not sure how the capitalistic system of continuous growth can survive.  So what happens to the masses of unemployed people?

People suggest the Universal Wage, a social welfare payment to cover basic needs.  Perhaps, although a system of legacy workers on welfare, monitored continuously, reminds me of a modern day prison.

So there you go, a short story about the impact of artificial intelligence and its potential effects.  If you ask me, I would like to build a fort in the backyard, invite all the neighbors over, and talk about nothing, just like we did as children.

9/24/2015

What does a Data Scientist Actually Do?

When you hear of someone being a data scientist, you imagine a life similar to a major league baseball player's.  Status, prestige, salary, pick of the jobs.

I guess what I'm wondering is once they get the job, with high salary and perks, what do they do all day?

Perhaps they report to some senior executive, work in a lab, have unlimited compute power at their disposal, with unlimited access and security to every piece of data in the org.

Or perhaps they report to an IT manager, have to sit in a cube, have to submit a ticket request for access to data, perhaps wait days or weeks for permission; perhaps they aren't given enough compute power to do their jobs, and maybe they're locked out of key pieces of data.

I guess it could vary depending on the org.

You can take the funniest comedian on the planet, walk up to them and say "be funny," and they may have difficulty being funny on demand.  Perhaps the same is true with data scientists: "where are my insights?"  You've been working on this project for a while; we'd expect our sales to have increased or our costs to have dropped by now.  Are you sure you know what you're doing?

And what about project life cycle management?  Are data science projects scoped out like other projects, with agile sprints and continuous delivery?  Are there milestones, deliverables and phases to projects?  How does a data scientist log their hours to specific projects?  How are project hours determined ahead of time?  And from whose bucket do the hours get charged: Marketing, Sales, Finance, Overhead?

What if their mouse stops working, do they enter a ticket into the Ticketing System?  Who creates the work environments: development, test and production?  Where is the source code saved?  Who performs the database backups?  How does code get moved to production, especially if they work in an ITIL environment?

What if the data scientist gets embedded within a particular department?  Disconnected from the team of data scientists, do they still have periodic meetings with that team to compare notes and leverage the knowledge pool of coding techniques and domain knowledge?  Is there a company Wiki for documenting the knowledge learned, or does the domain knowledge walk out the door when the data scientist leaves?

What if the data scientist is unable to discover a true insight?  How does a data scientist know which project to go after: easiest, most value, most exposure?  Who maintains the queue of experiments to work on, and who prioritizes the list?  Who does the quality assurance on the work before moving to production?  Does the data scientist work with a business analyst or domain knowledge expert?

How many meetings per week does a data scientist attend?  What are the meetings about, and who attends them?

What if the results of a data science project are completely wrong?  And management implements the recommended changes with bad downstream effects?  Can the data scientist be held liable?  Should Data Scientists purchase professional liability insurance in case they get sued?

These are just a few questions I was thinking about regarding the sexiest job of the 21st century.  I still think the Data Scientist is the rock star of the workforce, because of the potential impact they have, the intelligence required to be successful, and because they have tentacles into every department, every software application and potential external data sets.  The Data Scientist is the new "goto" person of any organization.  Data Scientists try to tame the wild beast of data explosion, rapid advances in technology and storage capacity, programming, algorithms and domain knowledge.  Beyond the hype, they still need to interact in a business with real people and real processes, keep pace with rapid change, and deliver results to an eager audience.

Quite a challenge.  I think it's worth it.

9/17/2015

[Video Blog] Neural Network Artificial Intelligence Microsoft AzureML Sample Experiment

In this video, we discuss Neural Network Artificial Intelligence on Microsoft Azure ML.  After logging on to the Microsoft AzureML platform in the Cloud, we review a sample experiment that deals with Neural Networks.  We see an example of a new language specifically for Neural Networks called Net# and view some R code, as well as the accuracy results of the experiment.
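For anyone new to neural networks, here's a rough numpy sketch of the kind of network a Net# spec describes: one hidden layer feeding a softmax output.  The weights are random and untrained, and the layer sizes are just an MNIST-style guess, so this only shows the shape of the computation, not a working classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes loosely mirroring an MNIST-style setup: 784 inputs, one hidden
# layer of 100 units, 10 output classes (weights here are random, untrained)
W1 = rng.normal(scale=0.01, size=(784, 100))
b1 = np.zeros(100)
W2 = rng.normal(scale=0.01, size=(100, 10))
b2 = np.zeros(10)

def forward(x):
    """One forward pass: sigmoid hidden layer, then softmax output."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # hidden activations
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())          # numerically stable softmax
    return e / e.sum()

probs = forward(rng.normal(size=784))
print(probs.round(3))  # ten class probabilities summing to 1
```

Training (adjusting W1/W2 to minimize error) is the part Azure ML's Train Model module handles for you; the Net# spec mainly declares the layer sizes and connections used above.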




Here's a few reference links:

Neural Nets in Azure ML – Introduction to Net#

Multiclass Neural Network 

Train Model

Score Model

Evaluate Model

Sweep Parameters

Guide to Net# neural network specification language for Azure Machine Learning

TSV File Format

MNIST Link

My blog post on Neural Network Artificial Intelligence Microsoft AzureML Reference

Although I've been following Artificial Intelligence for a few years now, I'm still learning a lot.  Especially on the new Microsoft AzureML platform.

This video is intended to be an introduction level tutorial.

Thanks for reading~!

Neural Network Artificial Intelligence on Microsoft AzureML Reference

I've been interested in Artificial Intelligence for many years now.  Probably because it's a tough nut to crack.  And researchers have been trying to figure it out since the 1950s.  It seems the technology has really made good progress recently, primarily due to the explosion of data, increased processing power and technology available to layman programmers.

Here's a few links for getting started with Artificial Intelligence with Microsoft tools.


For the C# people, here's: Developing Neural Networks Using Visual Studio


And an article about Microsoft's offering of Machine Learning in Azure from 2014: Why Azure Machine Learning is a Game Changer

And my posts from 2013: My Latest Crush - Artificial Intelligence


[Video Blog] Azure Machine Learning Web Services Connect via Excel & Visual Studio

Here's a video for an Azure Machine Learning Experiment.  In this video, we explain a sample Experiment, create a Predictive Experiment, and save it as a Web Service.  From there, you can access the Azure Machine Learning Web Service via the Excel Add-In or Visual Studio C# (or Python or R).  Microsoft actually provides the sample Excel workbook (although you must have the ML Add-In already installed) as well as sample source code for your project.
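To give a feel for what that client code does, here's a hedged Python sketch that builds the JSON request body the Azure ML request-response web service expects, as I understand the documented format circa 2015.  The column names, values and API key are placeholders, and the actual HTTP POST is left out:

```python
import json

def build_azureml_request(column_names, rows, api_key):
    """Build the JSON body and headers for an Azure ML request-response
    web service call (sketch of the documented 2015-era format)."""
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                # the service samples send values as strings
                "Values": [[str(v) for v in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return json.dumps(body), headers

# Hypothetical feature columns and key; POST body and headers only
body, headers = build_azureml_request(["f1", "f2"], [[1, 2]], "YOUR_API_KEY")
print(body)
```

The Excel Add-In and the generated C# sample are doing essentially this: serialize your rows into that Inputs/Values shape, attach the API key as a Bearer token, and POST to the scoring endpoint.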

Here's the video:



Here's a few URLs for reference:

Azure Machine Learning Excel Add-In Download
https://azuremlexcel.codeplex.com/


Data set in video
http://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.data

Thanks for watching~!

9/15/2015

[Video Blog] Microsoft Azure Upload Files to Storage Account Mount in HDInsight Hive



This video explains the process of uploading text files to a Microsoft Azure Storage Account using a utility called Azure Storage Explorer.  Once ingested, we use an existing Azure HDInsight Hadoop cluster with Hive to mount the text file as an External Hive Table.
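The mounting step boils down to a CREATE EXTERNAL TABLE statement whose LOCATION points at the blob container.  Here's a Python sketch that composes one; the storage account, container, folder and column names are all made up for illustration:

```python
# Compose the wasb:// path and the HiveQL DDL that mounts a blob folder
# as an External Hive Table (all names below are hypothetical)
account = "mystorageaccount"
container = "mycontainer"
folder = "logs"

location = "wasb://{0}@{1}.blob.core.windows.net/{2}".format(
    container, account, folder)

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS log_entries (
    log_date STRING,
    message  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
STORED AS TEXTFILE
LOCATION '{0}';
""".format(location)

print(ddl)
```

Because the table is EXTERNAL, dropping it in Hive leaves the uploaded text files untouched in the storage account, which is exactly what you want when the blobs are the system of record.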



The Video got cut off at the end, sorry for the abrupt ending...

Here's a few of the links from the video:
 

Use Hive and HiveQL with Hadoop in HDInsight to analyze a sample Apache log4j file
https://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-hive/
 

Create and load data into Hive tables from Azure blob storage
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-hive-tables/
 

Azure Storage Explorer
  
HDInsight: Hive Internal and External Tables Intro

Thanks for watching~!

Microsoft Azure Machine Learning ML and Excel ML Add-In Demo

Here's a new video explaining the Microsoft Azure Machine Learning (ML) technology.  First, I explain the types of algorithms hosted by Azure ML.  Then we log on to Azure and review a sample Experiment in moderate detail.  And finally, I explain the Excel ML Add-In and how to install it.




Here's a few links to get started:

Microsoft Machine Learning Gallery 
https://gallery.azureml.net/

Choosing an Azure Machine Learning Algorithm

http://blogs.msdn.com/b/jennifer/archive/2015/09/08/choosing-an-azure-machine-learning-algorithm.aspx

Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/

Azure Machine Learning Excel Add-In
https://azuremlexcel.codeplex.com/

After you download the file, extract it to your computer.  Open cmd from Start --> Run, run as Administrator, then navigate to the following directory for 32-bit Office / Excel:
C:\Windows\Microsoft.NET\Framework\v4.0.30319

Then enter the following command, changing the directory path to match where you extracted the downloaded files:

RegAsm.exe /codebase C:\Users\jbloom\Desktop\BloomConsulting\MLAddin\Release\AzureMLPrediction.dll /tlb:C:\Users\jbloom\Desktop\BloomConsulting\MLAddin\Release\AzureMLPrediction.tlb

9/14/2015

[Video Blog] Microsoft Azure Portal HDInsight Hadoop Cluster

Here's a few videos where I discuss the Microsoft Azure Portal and HDInsight.  First we create the HDInsight Hadoop Cluster by applying a few settings in the Azure Portal.  Second, we log into the Virtual Machine and poke around.

Video 1 of 2:



Video 2 of 2:


Thanks for watching~!

9/12/2015

My First Blog Post at Experfy

In addition to blogging here, I've also started blogging at a site called Experfy, a Harvard Innovation Lab that specializes in Big Data, Analytics and Business Intelligence projects.

Here's the link to my first post, titled "Rise of the Data Driven Culture": https://www.experfy.com/blog/rise-of-the-data-driven-culture

Thanks for reading~!

Enjoy.

9/10/2015

Microsoft Azure SQL Server Database Power BI Desktop Video Series 3 Parts

Saw a Tweet this afternoon to load Screencastify Lite, a plug-in for Chrome that allows screen captures into videos.

So I created a 3 part series on Microsoft Azure SQL Database: connecting to an Azure SQL Database from on-premise SQL Server Management Studio, creating two tables, loading some data, and finally pulling the data into Power BI Desktop and up to the PowerBI.com site.

Microsoft Azure SQL Server Database [Part 1]:



Azure SQL Server Database [Part 2]:



PowerBI Azure SQL Database [Part 3]:


Hope you enjoyed it.  First video series; didn't know I had a Kermit voice and talk fast... who knew :)

Determining ROI of Data Initiative

The truth of the matter is that data is running wild and free in the bowels of organizations.  The goal of a Data Professional is to tame this wild beast.  And domesticate it.

For this to happen, one must get into the mind of data.  Figure out how it flows, what hidden messages reside within, and how to extract such knowledge.

In doing so, one must become a Data Whisperer.

Any organization thinking of entering the data analysis space, which should be every org, must determine the cost of doing business.  As in Return on Investment.  Or ROI.

How is this accomplished?  There are "Fixed Costs", as in Servers, Network, Software, Developers, maintenance agreements, etc.  Or host in the Cloud to leverage elasticity, backups, a remote workforce, etc.  Some of the software is actually free, as in Open Source languages and Big Data storage.

Besides the fixed costs, you have many unknowns.  Like time.  How long should the project last?  One way to get a handle on time is to break the project into smaller chunks over time, an Agile approach rather than Waterfall.

What is the final product of a Data Initiative?  Convert the raw material, Data, apply business rules, and mash the data with other data sets, to produce refined diamonds of Insights.  What's the value of these Diamonds?  We don't know up front.  Let's say this piece of information, when applied, increases gross sales by 10%; 10% of a particular product could be billions.  How about saving costs: by implementing this change, we can reduce money spent by 20%.  Or change this process, save 30%.

As you can see, the final result of a Data Project can fluctuate greatly.  You can also see the potential can be great.  So your return on investment can be calculated up front to some degree, as in fixed costs of the project, although, that can be offset exponentially based on the findings and resulting changes.  Impact could be staggering.
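To make that arithmetic concrete, here's a tiny Python sketch; every number is made up, but it shows how a fixed-cost project can be offset by a percentage lift on a large base:

```python
# Hypothetical numbers illustrating the ROI arithmetic above
fixed_costs = 250_000.0      # servers, software, developers, etc.
gross_sales = 5_000_000.0    # annual sales of the affected product line
sales_lift = 0.10            # the "increase gross sales by 10%" scenario

gain = gross_sales * sales_lift           # value of the insight, if it pans out
roi = (gain - fixed_costs) / fixed_costs  # net return per dollar invested

print("gain: {:,.0f}  ROI: {:.0%}".format(gain, roi))
```

The fixed costs are knowable up front; the lift is the unknown, which is why the return on a data project can swing from a write-off to a multiple of its cost.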

9/07/2015

Re-Intro to Pentaho Data Integration Kettle

I've used the Pentaho Data Integration product, also known as Kettle, a few years ago.

My Intro to Pentaho Big Data PDI Kettle

#Pentaho #Kettle #PDI CE Offering.

And this post, which is one of my all time most viewed blogs at 6,000 reads: Follow Up Post

So tonight I wanted to get re-acquainted with Kettle.  You can find the download and info page here.  And the project Jira here.  And a Pentaho Data Integration (Kettle) Tutorial here.

Initiate the download:


Unpacked the Zip file:


Click on SpoonConsole.bat


And the Windows application opens:


The best part about using this free open source software from Pentaho are the Big Data features:


I opened a sample project called "test-job.kjb":


Project loaded in the IDE:


Modified one of the steps to include an Excel file:


Save, then Run / Execute the Job:


It produced a message box:



And the job ended:



Opened another sample, "Generate product data.ktr"


Save, Run & Execute:


The project generated an output file after creating a directory above in the Transformation directory called "Output":


And looking at the output file contents:


The data was generated randomly using this JavaScript function:

// Generate random product fields using the StringUtil helper bundled with PDI
var description = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(4, "", "-desc", false);
var code = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(3, "PRD-", "", true);
var category = Packages.org.pentaho.di.core.util.StringUtil.generateRandomString(1, "", "", true);

// Random price between 150.00 and 1150.00
var price = 150.00 + ( Math.random() * 1000 );

And mapping the field:


Pentaho Data Integration Kettle is a nice piece of software that's open source and works with Hadoop.  Lots of nice features.




9/06/2015

My Intro to Azure Data Factory

If you're a Microsoft Data Professional and have an Azure account, you should check out the Data Factory features: http://azure.microsoft.com/en-us/services/data-factory/

First, there's quite a bit of extensive documentation available, here's an introduction to Azure Data Services: https://azure.microsoft.com/en-us/documentation/articles/data-factory-introduction/

Tonight, I logged on to my Microsoft Azure account and created a Data Factory.  I attempted one of the Samples, which required an Azure SQL Server Database, which I created, along with a storage account for HDInsight.  Although, due to time constraints, I did not complete the demo.

However, this is a great feature, cloud based, to pull data from Azure, On-Premise, HDInsight, Azure Machine Learning and a variety of other data sources.  Along with the ability to massage the data through PowerShell, JSON, Stored Procedures, APIs, Hive, Pig and a bunch more.

So here, you can see a Data Factory was created in Azure Portal:


And sample to work with:



It requires an Azure SQL Database and Storage:


So, let's create a SQL Database on an already existing Server:


Set existing Server:


It's creating new SQL Database:


It's created:


Set the Data Factory to the new Database:



That's as far as I got.  Can't wait to get more time to experiment.

9/03/2015

Is a Living Wage Really the Solution to Automation and Job Loss?

Automation.  Job loss.  Starting to make its way into the mainstream.

How will everyone survive?  Oh, just give them subsidy payments, in the form of credits, another name for welfare.  How's that going to work?  Who's going to come up with the money to pay millions of people to sit home and watch Jerry Springer and the Price is Right?  What kind of lives would these people lead?  No potential for employment, ever, for the rest of their lives.  Giant sacks of potatoes.  Probably start drinking every day, or doing drugs, or loaning their bodies out for payment.

I don't think that's going to work very well.  And how is the economy going to maintain its constant growth if nobody has disposable income to blow on frivolous items that don't last, imported from other countries?

Lots of unemployment.  A society based on the welfare system.  Consumer spending plummets.  Corporate profits tank, fueling more layoffs, higher prices and lesser quality.

And what about the social unrest?  Do you think people will be content with this existence?  Lack of purpose?  Swimming in poverty?  With no upward mobility?

This plan makes a lot of sense.  There's not enough money to fund it.  It will cripple the economy.  And create a society of unemployed couch potatoes.

Perhaps we need to revisit this one a bit more.

Get Sh#t Done!