2/24/2016

Intro to AI and IoT by @SQLJon

Where Are All the Data Insight Success Stories? - 10 Questions

If you're interested in the data space, there's no shortage of offerings.  From in-memory solutions to embedded analytics to big data platforms and streaming data.  You can view the data from any touch point along the trail.  Consolidated data warehouses.  Self-service ETL and dashboards.  Real-time mobile analytics.  Whatever your data heart fancies.

Where are all the data insight success stories?  Real people.  Solving real-world problems.  On a daily basis.  I know they're out there.  With the amount of resources and people and time and energy put into the data space right now, surely there are people who leverage their data as an asset, derive insights, and apply changes downstream.

To increase sales.  Decrease costs.  And streamline processes.

You have data.  The data gets massaged.  Lands in a central place downstream.  People view that data.  Look for patterns.  Derive insights.  Apply changes or action.
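In code form, that pipeline might look something like this toy sketch (entirely my own illustration, with made-up data): extract, transform, land centrally, then look for a pattern worth acting on.

using System;
using System.Collections.Generic;
using System.Linq;

// A whimsical sketch of the data life cycle: raw data in, massaged,
// landed in a central place, then scanned for a pattern worth acting on.
class DataLifeCycle
{
    static void Main()
    {
        // Extract: raw sales events from a source system
        var raw = new[] { "2016-02-01,WIDGET,3", "2016-02-01,GADGET,5", "2016-02-02,WIDGET,9" };

        // Transform: parse and massage
        var rows = raw.Select(line => line.Split(','))
                      .Select(f => new { Date = f[0], Product = f[1], Units = int.Parse(f[2]) });

        // Load: land in a central place (here, an in-memory "warehouse")
        var warehouse = rows.ToList();

        // Insight: look for a pattern, derive an action
        var byProduct = warehouse.GroupBy(r => r.Product)
                                 .Select(g => new { Product = g.Key, Total = g.Sum(r => r.Units) })
                                 .OrderByDescending(x => x.Total);

        foreach (var p in byProduct)
            Console.WriteLine("{0}: {1} units", p.Product, p.Total);
    }
}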

That's the life cycle as we know it.  I know all the heavy lifting is occurring.  So here's a set of questions I'm curious about:
  1. How has the data changed your business?
  2. What actions were taken based on the data results?
  3. Would you have found these insights with traditional methods, like gut or intuition? 
  4. What was the return on investment (ROI) for your data investment?
  5. Did you have sufficient resources within your organization, or did you have to look outside the org?
  6. Did the technology stack change mid-way through the project engagement?
  7. How difficult was the entire effort, what were the pain points?
  8. Who owns your data initiative: IT, the business, or a Chief Data Officer?
  9. Was the business involved throughout the project life cycle?
  10. Was your project a success?

2/23/2016

Hadoop Broke the Barriers of SQL and Data Warehousing

You may have read my post on Getting Bit by Hadoop.

The main theme was that we were happy in our SQL world with 20 years of solid data warehouse technology.  Along came this new technology.  We gently placed it on the altar.  And worshiped it to solve all our problems.  After some years, reality sank in: the hype dwarfed its widespread adoption.

However, the fact that a new technology rose to fame in such a short time is amazing.  The underlying concepts are still valid: a mechanism to handle huge amounts of data, both structured and unstructured, in a distributed computing environment.
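To make that concrete, here's a toy single-process sketch of the MapReduce idea (my own illustration in C#, not actual Hadoop code): map emits word/count pairs, a shuffle groups them by key, and reduce sums each group.  Hadoop's contribution was running those same phases spread across a cluster.

using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration of MapReduce: the map phase emits (word, 1) pairs,
// the shuffle groups pairs by key, and the reduce phase sums each group.
// Hadoop runs these same three phases spread across many machines.
class WordCount
{
    static IEnumerable<KeyValuePair<string, int>> Map(string line) =>
        line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => new KeyValuePair<string, int>(w.ToLowerInvariant(), 1));

    static void Main()
    {
        var lines = new[] { "big data big insights", "data beats intuition" };

        var counts = lines
            .SelectMany(Map)                        // map
            .GroupBy(kv => kv.Key, kv => kv.Value)  // shuffle
            .Select(g => new { Word = g.Key, Count = g.Sum() }); // reduce

        foreach (var c in counts)
            Console.WriteLine("{0}\t{1}", c.Word, c.Count);
    }
}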

This opened the doors to an entire new industry.  And an entire new breed of developers and career opportunities.

It broke through the barriers and stranglehold of traditional SQL and Data Warehousing.  And made a name for itself.  Not an easy feat.

That set the foundation for new development.  As in Spark.  And a dozen other integrated frameworks under the Apache Foundation.  Including security, streaming, machine learning, in-memory, data frames, and integration with R, Python, Scala, and other languages besides Java.

From the movie Moneyball, taking it in the teeth, because the first one through the wall always gets bloody:



The bottom line: Hadoop was ahead of its time.  It paved the way for new technology, new ways to handle data and find insights.  And mostly, it put the software in the hands of everyday people, creating opportunity for the big guys as well as the small shops.

It's only going to grow from here.  Influx of developers.  Better adoption rates.  Ease of use.  Tighter integration.

Overall, I think the biggest contribution of Hadoop is to shine a light on the importance of data.  Data-driven companies are the future.  Integrating different data sources to derive insights is the new norm.  Building applications with data as a forethought, rather than an afterthought, will propel businesses into the future, as the others fall by the wayside.

Big Data has grown up.  Let's see where it goes from here.

2/19/2016

Not Getting Bit Again

Perhaps we need to assess the situation for what it is.

With the rise of Big Data, there were claims that it would solve all the world's problems.  Turn water into wine.  Save all your data.  Find insights.  Cure any problem.

I marched to that drum as well.  Put a lot of faith into it.  I don't think it turned out like we thought.

It's complicated.  It keeps changing.  Finding insights is difficult.  And there are inherent issues working with any data, regardless of size.

It's just another tool in the toolbox.  What can it do that traditional SQL databases can't?

Handle huge volumes of data.  Handle unstructured and semi-structured data.  Serve as a central repository across large clusters of commodity servers.
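And on the "large clusters" point, the trick that spreads work around without a central index is hash partitioning.  A toy sketch of the idea behind Hadoop's default partitioner (my illustration, not Hadoop code):

using System;

// Toy version of MapReduce's default hash partitioner: each key is
// assigned to one of N nodes by hashing, so related records end up
// on the same machine without any central lookup table.
class HashPartitioner
{
    static int PartitionFor(string key, int numNodes) =>
        (key.GetHashCode() & int.MaxValue) % numNodes;

    static void Main()
    {
        string[] keys = { "orders", "clicks", "sensors", "logs" };
        foreach (var k in keys)
            Console.WriteLine("{0} -> node {1}", k, PartitionFor(k, 4));
    }
}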

The ecosystem has grown internally, with different iterations of this and that, some proprietary, and open source Apache projects.  It's become a mature ecosystem.

And fragmented.  Many orgs dipped their toes into the water, only to get burned as newer technology deprecated their entire projects.  A case of early adopters' scorn.

And many options sprung forth to ease the pain.  Fully functioning applications to do the heavy-lifting ETL, importing and extracting data, monitoring, security, batch processing, streaming.  All good stuff.  Fragmented nonetheless.

And finding qualified developers was a problem.  Who are the main adopters of Big Data?  DBAs couldn't care less about writing T-SQL, let alone Java MapReduce jobs.  SQL developers most likely don't know Java either.  System admins usually don't code.  Nor do network admins.  Or business analysts or project managers.  It's a new breed of developers.  And they need to know architecture, security, coding, data, best practices, ETL, business processes and business rules, networks, hardware, servers, administration, and on and on.  Who knows all this stuff?  College grads don't have the business experience, yet.  So the market didn't specify who the ideal developers were.  More of a shotgun approach.

And the vendors lined up offering proprietary offerings, each a bit different, to make things easier.

I would have given my right arm to program in Hadoop a few years ago.  Today, not so much.  Sure, it would be nice to get the real-world experience.  To be honest, I haven't been keeping up with all the latest stuff; too many twists and turns.

What technology am I focusing on now?  Artificial Intelligence.  Internet of Things.  Quantum Computing.  This stuff is exploding right now.  However, even with all the excitement and hoopla, I'm taking more of a reserved approach this time.  The claims running through the noise are similar to Big Data's: it will solve all the world's problems, etc.  Yes, it will change things, for the better we hope, but it will also introduce new issues and new complexity, and continue to create a technology gap based on limited quality resources.  It will drive new markets, create new players, and move things forward.  Rather than rehashing legacy code with newer frameworks, AI, IoT, and Quantum Computing could potentially be real game changers.  And put us all out of work for good.

And there you have it~!

Here's a nice piece by industry-leading analyst Howard Dresner, who coined the term "Business Intelligence", where he discusses the topic of Big Data.  Enjoy...

http://www.cxotoday.com/story/big-data-who-cares-one-third-companies-seriously-dont/ 

2/10/2016

Getting Started with the Internet of Things on the Azure Platform

You can hear IoT through all the noise; it's really starting to make headway.  And since I've been a Microsoft developer for over 20 years, and since I have an Azure account, I thought I'd get started with a demo.

https://azure.microsoft.com/en-us/develop/iot/



Click on the link: "Don't have a device? Configure your IoT Hub"



One thing you'll notice, right off the bat, is the support for C#, Java, and Node.js.  This pattern aligns with other tutorials we've seen on the Azure platform, for example, the AzureML samples.

For this demo, we'll need Visual Studio 2015 and an Azure account.  Check and check.

Visual Studio 2015:


Microsoft Azure:



Clicking the "+" sign, we see a new feature, "Internet of Things":



Enter info:



Click "Create" to provision your new IoT platform on the Microsoft Azure cloud ecosystem of solutions:

And it's deploying:



About 5 minutes later, our IoT Hub is ready:



And that concludes the initial setup of our first IoT application on the Microsoft Azure platform.  Amazing how easy that was.

Next, we need to build a C# console application to connect to this.  Open Visual Studio 2015, create "New Project" --> C# --> Console App --> Name:



Right-click the project in Solution Explorer --> Manage NuGet Packages --> Package Source = nuget.org, check the box for "Include prerelease", and in the search box, enter "Microsoft.Azure.Devices":



Click Install, accept terms:



Installing software locally...

Turns out the reference to Microsoft.Azure.Devices is not recognized, so we need to download it from NuGet...




See both in the References now:



Now it recognizes our 2 new lines of code:



In the Program class, I added two lines and replaced the placeholder with the actual connection string from the Azure IoT Hub we just created (in the portal, under Shared access policies --> iothubowner):

static RegistryManager registryManager; 
static string connectionString = "{iothub connection string}";

Add the following method to the Program class:

private async static Task AddDeviceAsync()
{
    string deviceId = "myFirstDevice";
    Device device;
    try
    {
        // Register a new device in the IoT Hub identity registry
        device = await registryManager.AddDeviceAsync(new Device(deviceId));
    }
    catch (DeviceAlreadyExistsException)
    {
        // Already registered on a previous run, so fetch the existing entry
        device = await registryManager.GetDeviceAsync(deviceId);
    }
    Console.WriteLine("Generated device key: {0}", device.Authentication.SymmetricKey.PrimaryKey);
}
 
Finally, add the following lines to the Main method:

    registryManager = RegistryManager.CreateFromConnectionString(connectionString);
    AddDeviceAsync().Wait();
    Console.ReadLine();
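For reference, here's how the whole thing fits together as one file.  This is a minimal sketch assembled from the steps above; note the usings, since DeviceAlreadyExistsException lives in the Microsoft.Azure.Devices.Common.Exceptions namespace in this version of the SDK, and the connection string placeholder still needs your hub's real value:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;
using Microsoft.Azure.Devices.Common.Exceptions;

class Program
{
    static RegistryManager registryManager;
    static string connectionString = "{iothub connection string}";

    // Registers a device (or fetches it if it already exists) and prints its key
    private async static Task AddDeviceAsync()
    {
        string deviceId = "myFirstDevice";
        Device device;
        try
        {
            device = await registryManager.AddDeviceAsync(new Device(deviceId));
        }
        catch (DeviceAlreadyExistsException)
        {
            device = await registryManager.GetDeviceAsync(deviceId);
        }
        Console.WriteLine("Generated device key: {0}", device.Authentication.SymmetricKey.PrimaryKey);
    }

    static void Main(string[] args)
    {
        registryManager = RegistryManager.CreateFromConnectionString(connectionString);
        AddDeviceAsync().Wait();
        Console.ReadLine();
    }
}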
    
 Ran the console app in Visual Studio 2015:
 
 
And it ran without errors.
 
That's the first step of the 3-part series.  This app reminds me of the Service Bus app I created a while back.  I think step 2 actually uses Service Bus.
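For a preview of where this is headed, the device side (sending telemetry into the hub) looks roughly like the sketch below.  This is my own hedged approximation using the device-side SDK, Microsoft.Azure.Devices.Client; the host name and device key placeholders come from the portal and the AddDeviceAsync output above.

using System;
using System.Text;
using Microsoft.Azure.Devices.Client;

class SimulatedDevice
{
    static void Main(string[] args)
    {
        // Host name from the portal; device key printed by AddDeviceAsync above
        var deviceClient = DeviceClient.Create("{iothub hostname}",
            new DeviceAuthenticationWithRegistrySymmetricKey("myFirstDevice", "{device key}"));

        // One fake telemetry reading, JSON by hand to keep the sketch dependency-free
        string payload = "{\"deviceId\":\"myFirstDevice\",\"windSpeed\":10.5}";
        var message = new Message(Encoding.ASCII.GetBytes(payload));

        deviceClient.SendEventAsync(message).Wait();
        Console.WriteLine("Sent: {0}", payload);
    }
}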
 
Our first IoT application using Visual Studio 2015 and the Microsoft Azure IoT solution.
 
As always, thanks for reading.  

2/03/2016

#IoT Apps to Produce Reporting Metrics up to the Millisecond

Reports are the lifeblood of any organization.

How many units shipped last week, month, year, inception to date.  Static reports.  Data warehouse reports.  Dashboards.  Self service reports.  Visualizations.

What we need are some real-time reports.  Minute-over-minute updates.  How do we accomplish this?  Through sensors.  The Internet of Things.

Imagine the lightning fast metrics we could measure, from remote locations, in real time.

Ring.  Hello.  Uh, Bob, we noticed you weren't as productive the past 10 minutes.  Everything alright?  Uh, how do you know what I'm doing in another part of the country.  Don't worry about that Bob, please get back to work.

Or better yet, initiate a command to send an electronic jolt, that'll wake him up, just kidding.  Not really.

The Internet of Things can and will report on remote activity up to the second.  And that info is vital to many organizations.  For many reasons.
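Once those readings land somewhere queryable, minute-over-minute metrics are just a grouping problem.  A toy sketch (my own illustration, nothing vendor-specific):

using System;
using System.Collections.Generic;
using System.Linq;

// A toy sensor reading: which device, when, and what value
class Reading
{
    public string DeviceId;
    public DateTime Timestamp;
    public double Value;
}

class MinuteMetrics
{
    static void Main()
    {
        var readings = new List<Reading>
        {
            new Reading { DeviceId = "sensor1", Timestamp = DateTime.UtcNow, Value = 1.2 },
            new Reading { DeviceId = "sensor1", Timestamp = DateTime.UtcNow.AddSeconds(-70), Value = 3.4 },
        };

        // Truncate each timestamp to its minute, then group: one row per device per minute
        var perMinute = readings
            .GroupBy(r => new
            {
                r.DeviceId,
                Minute = new DateTime(r.Timestamp.Year, r.Timestamp.Month, r.Timestamp.Day,
                                      r.Timestamp.Hour, r.Timestamp.Minute, 0)
            })
            .Select(g => new { g.Key.DeviceId, g.Key.Minute, Count = g.Count(), Avg = g.Average(r => r.Value) });

        foreach (var m in perMinute)
            Console.WriteLine("{0} {1:HH:mm} count={2} avg={3:F2}", m.DeviceId, m.Minute, m.Count, m.Avg);
    }
}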

Just imagine all the new products and services to be offered.  Kind of reminds me of the apps built for smartphones.  Maybe there's a potential market to spin off some type of quick IoT application: plug in, and go.  Kind of like Lego blocks.  Just add pieces to an existing framework.

Dual purpose perhaps.  Once the vendor hardware is in place, with embedded sensors, operating system, application, network and connection to the central hub, perhaps add some more apps, like plug and play.

And perhaps have your app communicate with another IoT application, in a network grid of connected devices and sensors remotely talking back to the central hub along with other hubs.  Sort of like the entire earth lit up with synapses firing at random intervals, an invisible electronic network 24 hours a day, 7 days a week.
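In code, that lego-block idea could be as simple as a plug-in contract.  A purely hypothetical sketch; ISensorApp and Hub are names I made up for the illustration:

using System;
using System.Collections.Generic;

// Hypothetical plug-in contract: any vendor app just implements this
interface ISensorApp
{
    void OnReading(string deviceId, double value);
}

// Example plug-in: alerts when a reading crosses a threshold
class ThresholdAlertApp : ISensorApp
{
    public void OnReading(string deviceId, double value)
    {
        if (value > 100) Console.WriteLine("ALERT {0}: {1}", deviceId, value);
    }
}

// The central hub: fixed framework, pluggable apps
class Hub
{
    readonly List<ISensorApp> apps = new List<ISensorApp>();
    public void Plug(ISensorApp app) => apps.Add(app);
    public void Dispatch(string deviceId, double value)
    {
        foreach (var app in apps) app.OnReading(deviceId, value);
    }
}

class Demo
{
    static void Main()
    {
        var hub = new Hub();
        hub.Plug(new ThresholdAlertApp());   // plug in, and go
        hub.Dispatch("sensor42", 120.0);
    }
}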

Lots of possibilities.

First Microsoft CNTK Project in Visual Studio on Windows (Part 2)

Microsoft open sourced their deep learning toolkit, CNTK, and released it to GitHub last week.  I downloaded the source code, did the required installs, and tried to get the examples to work.  After hours and hours, I had to step away.

Except that when I work on a problem, it's difficult to let it sit, unsolved.

So today, I downloaded the latest build of the project:

https://github.com/Microsoft/CNTK/releases

Install Instructions:

https://github.com/Microsoft/CNTK/wiki/CNTK-Binary-Download-and-Configuration

Going to try again.

Luckily, all the core installs are already on this laptop.


Apparently, my laptop has insufficient memory to handle the Python script.  No way around it, except to reduce the size of the array by reducing the number of files to parse through.

Opening the Python file in IDLE, Python's Integrated Development Environment, it appears to be crapping out during the "Preparing train set..." step.

So I changed the range from 5 to 2 (the CIFAR-10 training set ships as five batch files and the script loops over them, so parsing two batches instead of five shrinks the array), and it seems to run okay without errors:




 Done.





It created two files, Test.txt and Train.txt:



So after re-compiling the new version of the Visual Studio project and adding the command-line arguments for CNTK:

configFile=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\01_Conv.config configName=01_Conv OutputDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output


Ran the project, no errors this time:



Reviewing the log file, it threw an error; it couldn't find a file:

<<<<<<<<<<<<<<<<<<<< PROCESSED CONFIG WITH ALL VARIABLES RESOLVED <<<<<<<<<<<<<<<<<<<<
command: Train Test
precision = float
CNTKModelPath: C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output/Models/01_Convolution
CNTKCommandTrainInfo: Train : 30
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 30
CNTKCommandTrainBegin: Train
About to throw exception 'error opening file './Macros.ndl': No such file or directory'

[CALL STACK]
    >Microsoft::MSR::CNTK::ThrowFormatted
    -Microsoft::MSR::CNTK::RuntimeError
    -fopenOrDie
    -::operator()
    -Microsoft::MSR::CNTK::attempt< >
    -Microsoft::MSR::CNTK::attempt< >
    -Microsoft::MSR::CNTK::File::Init
    -Microsoft::MSR::CNTK::File::File
    -Microsoft::MSR::CNTK::ConfigParser::ReadConfigFile
    -Microsoft::MSR::CNTK::ConfigParser::LoadConfigFileAndResolveVariables
    -Microsoft::MSR::CNTK::NDLBuilder::Init
    -Microsoft::MSR::CNTK::NDLBuilder::NDLBuilder
    -std::_Ref_count_obj >::_Ref_count_obj >


However, the file does exist.  Presumably the config resolves Macros.ndl relative to the working directory unless told otherwise.  So let's explicitly declare the ConfigDir:

configFile=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\01_Conv.config configName=01_Conv OutputDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output ConfigDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10

Error:

<<<<<<<<<<<<<<<<<<<< PROCESSED CONFIG WITH ALL VARIABLES RESOLVED <<<<<<<<<<<<<<<<<<<<
command: Train Test
precision = float
CNTKModelPath: C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output/Models/01_Convolution
CNTKCommandTrainInfo: Train : 30
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 30
CNTKCommandTrainBegin: Train
NDLBuilder Using CPU
Reading UCI file ./Train.txt
About to throw exception 'UCIParser::ParseInit - error opening file'

Need to specify the DataDir:

configFile=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\01_Conv.config configName=01_Conv OutputDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output ConfigDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10 CNTKModelPath=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10 DataDir=C:\Users\jbloom\Desktop\BloomConsulting\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10

Looks like it ran, new error:

Validating --> OutputNodes.W = LearnableParameter -> [10 x 64 {1,10}]
Validating --> h1.W = LearnableParameter -> [64 x 576 {1,64}]
Validating --> conv3_act.W = LearnableParameter -> [64 x 800 {1,64}]
Validating --> conv2_act.W = LearnableParameter -> [32 x 800 {1,32}]
Validating --> conv1_act.W = LearnableParameter -> [32 x 75 {1,32}]
Validating --> features = InputValue -> [32 x 32 x 3 {1,32,1024} x *]
Validating --> featOffs = LearnableParameter -> [1 x 1 {1,1}]
Validating --> featScaled = Minus(features[32 x 32 x 3 {1,32,1024} x * {W=32, H=3, C=32}], featOffs[1 x 1 {1,1}]) -> [32 x 32 x 3 {1,32,1024} x *]
Validating --> conv1_act.c = Convolution(conv1_act.W[32 x 75 {1,32}], featScaled[32 x 32 x 3 {1,32,1024} x * {W=32, H=3, C=32}]) -> [32 x 32 x 32 {1,32,1024} x *]
Validating --> conv1_act.b = LearnableParameter -> [1 x 1 x 32 x 1 {1,1,1,32}]
Validating --> conv1_act.p = Plus(conv1_act.c[32 x 32 x 32 {1,32,1024} x * {W=32, H=32, C=32}], conv1_act.b[1 x 1 x 32 x 1 {1,1,1,32}]) -> [32 x 32 x 32 x 1 {1,32,1024,32768} x *]
Validating --> conv1_act.y = RectifiedLinear(conv1_act.p[32 x 32 x 32 x 1 {1,32,1024,32768} x *]) -> [32 x 32 x 32 x 1 {1,32,1024,32768} x *]
Validating --> pool1 = MaxPooling(conv1_act.y[32 x 32 x 32 x 1 {1,32,1024,32768} x *])About to throw exception 'Convolution operation currently only supports 1D or 2D convolution on 3D tensors.'

At this point, I reran the Python script to convert the images to the legacy format, hard-coding the fmt value in the script, and re-ran:


if __name__ == "__main__":
    # fmt hard-coded to 'legacy' to work around the convolution error above
    fmt = 'legacy'
    trn, tst = loadData('http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz', fmt)
    print 'Writing train text file...'
    np.savetxt(r'./Train.txt', trn, fmt = '%u', delimiter='\t')
    print 'Done.'
    print 'Writing test text file...'
    np.savetxt(r'./Test.txt', tst, fmt = '%u', delimiter='\t')


I read something about the possibility that the path cannot exceed a certain length (Windows' legacy MAX_PATH limit is 260 characters), so I moved the entire project folder to the C:\ drive.

Modified the input parameters (note deviceId=0, which targets the first GPU):


configFile=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\01_Conv.config configName=01_Conv OutputDir=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10\Output ConfigDir=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10  CNTKModelPath=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10 DataDir=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10  ModelDir=C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10 deviceId=0

Error:

<<<<<<<<<<<<<<<<<<<< PROCESSED CONFIG WITH ALL VARIABLES RESOLVED <<<<<<<<<<<<<<<<<<<<
command: Train Test
precision = float
CNTKModelPath: C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10/01_Convolution
CNTKCommandTrainInfo: Train : 30
CNTKCommandTrainInfo: CNTKNoMoreCommands_Total : 30
CNTKCommandTrainBegin: Train
NDLBuilder Using GPU 0
Reading UCI file C:\cntk\MicrosoftCNTK\CNTK-r2016-01-26\Examples\Image\Miscellaneous\CIFAR-10/Train.txt
About to throw exception 'CUDA failure 35: CUDA driver version is insufficient for CUDA runtime version ; GPU=-1 ; hostname=BLOOMLAPTOP ; expr=cudaSetDevice(deviceId)'



This is a documented error found here on the GitHub website for the project:

https://github.com/Microsoft/CNTK/issues/55

To summarize, I got past the insufficient-memory error by limiting the loop to parse two files instead of five, so the data set was smaller.  However, I couldn't get past the bug documented on the website.  And there you have it.  Thanks for reading~!

Original Post on CNTK: http://www.bloomconsultingbi.com/2016/01/first-microsoft-cntk-project-in-visual.html

First Project on CNTK in Visual Studio: http://www.bloomconsultingbi.com/2016/01/first-microsoft-cntk-project-in-visual.html

Root Cause Analysis