Automation to Remove Favoritism and Increase Consistency

Aside from increasing profits and reducing costs, the major advantage of automation is the removal of bias from interactions.

Remove favoritism.  Increase consistency.

Bias is built into every human encounter.  Discrimination is not always blatantly obvious.  It's often more subtle and difficult to track.

And you can increase profit and reduce costs :)

Blockchain Technology Going Mainstream in 2017

Blockchain is a technology that's making headway in 2017.

Originally designed to disrupt the banking industry through an alternative currency called Bitcoin, Blockchain is a framework for logging transactions in a transparent, distributed environment through a write-only ledger system based on contracts.  Highly scalable, it's now being repackaged for bigger things.

It's designed to be crypto-friendly, so banking institutions are leveraging this framework to post transactions in real time, removing the third-party middleman and moving funds almost instantaneously.  This has huge implications and the potential to disrupt an outdated system of money transfers.

Because the system is designed to run automatically without the need for administrators, the framework can be used in other industries besides finance, such as healthcare, retail, identity management, etc.

Blockchain seems to be gaining steam with support from some of the big vendors.

IBM sees the benefit of Blockchain; you can read about it here.

Read the Docs here posted 03 November 2016

Get started here.

What is the Hyperledger Project (enterprise class, cross industry, Open Source collaborative ecosystem, non-vendor specific)?  See here.

Actual .org site here.

Microsoft offers Blockchain as a Service on their Azure platform.

Ethereum Blockchain as a Service now on Azure Posted on November 9, 2015

Azure Blockchain as a Service update Posted on December 7, 2015

AWS also has Blockchain capabilities here.

Blockchain has the potential to become the backbone of future technologies.  For instance, the Internet of Things: decentralized applications running on sensors could write mini bursts of information to a decentralized Blockchain.  This would create a legitimate audit trail, trigger downstream actions and give IoT the reliable security it needs.
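As a rough illustration of that audit-trail idea (this is a minimal sketch, not any particular Blockchain implementation -- real systems add distribution and consensus), here's an append-only, hash-chained log in Python.  Each record's hash covers the previous record's hash, so any tampering breaks the chain:

```python
import hashlib
import json

def make_block(payload, prev_hash):
    """Build a ledger entry whose hash covers both the payload and
    the previous entry's hash, chaining the records together."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return {"payload": payload, "prev": prev_hash,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

def verify(chain):
    """Recompute every hash; any tampered record breaks the chain."""
    prev = "0" * 64
    for block in chain:
        body = json.dumps({"payload": block["payload"], "prev": prev},
                          sort_keys=True)
        if block["prev"] != prev or \
                hashlib.sha256(body.encode()).hexdigest() != block["hash"]:
            return False
        prev = block["hash"]
    return True

# Simulated IoT sensors appending mini bursts of readings
chain = []
prev = "0" * 64
for reading in [{"sensor": "temp-01", "value": 21.4},
                {"sensor": "temp-01", "value": 21.9}]:
    block = make_block(reading, prev)
    chain.append(block)
    prev = block["hash"]
```

Edit any earlier reading and `verify` fails -- that's the legitimate audit trail in miniature.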

I'm sure you'll be reading more about Blockchain as it becomes mainstream.  It's just about there.

Thanks for reading~!


First try at IBM Watson Analytics

IBM offers a wide array of Cloud products.  Today, I'm getting started with IBM Watson Analytics:


After a quick sign-on and verification process, I had access within minutes, with a 30-day trial period.

Activated Account:
Email confirmation:

Open IBM Watson Analytics:

Sample pre-loaded Dataset (CSV):

Ask Questions:

Typed a question:


Clicked on Bar Chart:

Added fields:



Visualization types:


Online Documentation:

Analytics Engine:


Import Complete!:

New Data Source:

Discover a new data set from Airbnb:


So we got our feet wet with IBM Watson Analytics.  Created an account.  Viewed the default sample file.  Imported a data file from Analytics Exchange.  Didn't write a single line of code.  Completely interactive.  Based in the Cloud.  One nice feature is the ability to "Share".

This tool is designed for data interrogation by the Business User.  Seems like a solid product.

Thanks for reading~!


Temple of Babel, a Case for Open Data

How many active languages are there in the world?

Apparently, the all-knowing crystal ball says "roughly 6,500".  Who knows every single language?  I surely don't.  English is my main language; I tried learning Hebrew as a child, no luck.  Took three years of high school Spanish and two years in college.  ¿Dónde está el baño?

Sure makes it tough to communicate clearly when nobody knows what everyone is saying.

Fear not!  We have technology that translates in real time, both spoken and written.

What about computer languages, why so many?  Iterations over time.  New features.  More precision.  Less memory usage.  Access to kernel level functionality.  Tons of reasons.

Within each language, are there mandatory design patterns?  Syntax, yes, patterns, no.

That's why most people prefer to write code, not maintain someone else's.  What in the heck was this person trying to do in all this spaghetti code?  Who knows.

What about the world of data?  Are we mandated to use specific patterns?  Naming servers, databases, tables, schemas.  How about variable types, precision, international formatting?  It's a free-for-all.

How about the actual data?  We have free-form text boxes on the front end that accept any characters, including hex and carriage returns and what have you.

What about address cleansing?  Do we mandate specific patterns?  If you've worked with address data for any length of time, you'll find out quickly how even the spelling of a city can vary drastically, let alone Rt vs. Route or Street vs. St. or MLK vs. Martin Luther King.  It's maddening.
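As a tiny illustration of the problem (the abbreviation map and function name here are hypothetical -- real address-cleansing tools use far larger rule sets and fuzzy matching), here's a sketch of token-level normalization:

```python
# Hypothetical abbreviation map; real cleansing rule sets are far larger
SUFFIXES = {
    "st": "Street", "st.": "Street", "street": "Street",
    "rt": "Route", "rt.": "Route", "route": "Route",
    "mlk": "Martin Luther King",
}

def normalize(address):
    """Replace each token with its canonical form when the map knows it;
    leave unknown tokens untouched."""
    out = []
    for token in address.split():
        out.append(SUFFIXES.get(token.lower(), token))
    return " ".join(out)
```

With this, "123 Main St." and "123 Main Street" finally land on the same value -- the whole point of cleansing before a JOIN.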

What we've basically done is create a Temple of Babel, such that almost no two databases are alike.  How in the world can we unite datasets seamlessly when nobody follows the same patterns?

If anything, I would propose prefixing public data sets with a 5-part naming convention: Organization : Server : Database : Schema : Table.  Then open up the database to real-time remote querying, either through real-time JOINs across disjointed networks or through API calls or Web Services.
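Here's a sketch of what handling that proposed 5-part name might look like (the `DataSetName` type and the colon separator are my own assumptions for illustration, not an existing standard):

```python
from typing import NamedTuple

class DataSetName(NamedTuple):
    """The proposed 5-part public data set name."""
    organization: str
    server: str
    database: str
    schema: str
    table: str

def parse_name(qualified):
    """Split a colon-delimited 5-part name; reject anything malformed."""
    parts = qualified.split(":")
    if len(parts) != 5:
        raise ValueError("expected Organization:Server:Database:Schema:Table")
    return DataSetName(*parts)
```

A centralized catalog could then list every public data set under one unambiguous key.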

Similar to Web Services, you provide your data's version of the WSDL, so people know what data you are exposing: the 5-part name, sample data, user access with generic credentials, etc.

Once orgs start opening up their data, we would no longer need to download CSV files, save to disk, ingest into database, and find a matching field to join on.  Simply make the call, JOIN your data set to theirs, run the query and get results in real time.
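To make that concrete, here's a minimal sketch of joining a local data set against rows that could have come back from a remote API call (the row shapes, field names and numbers are all made up for illustration):

```python
def join_on(local_rows, remote_rows, key):
    """Inner-join two lists of row dicts on a shared field, the way a
    remote open data set could be merged with local data after an API call."""
    index = {row[key]: row for row in remote_rows}
    joined = []
    for row in local_rows:
        match = index.get(row[key])
        if match is not None:
            merged = dict(row)   # keep local fields
            merged.update(match)  # add the remote fields
            joined.append(merged)
    return joined

# Local sales data joined against a (simulated) remote open data set
local = [{"zip": "19103", "sales": 1200}, {"zip": "19104", "sales": 800}]
remote = [{"zip": "19103", "population": 24000}]
result = join_on(local, remote, "zip")
```

No CSV download, no disk, no ingest -- just match on the shared field and query.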

But why not automate that?  Have a centralized list of all public data sets, have your computer scan the list in real time through code, and have it query the data without your assistance.  Then feed that data to machine learning, artificial intelligence or the Internet of Things, and have the entire thing run without human assistance.  Have it create the reports, find the insights and create visualizations, viewable on smartphones or tablets, or generate emails.

Have a new data set?  Add it to the list, and soon people will be reading your data remotely, assuming you allow it.

We've got tons of data.  That data needs to be read, interrogated, aggregated and interpreted, then combined with your data to find those nuggets of insight.  Without common frameworks, we can never unite the data into a pool of world data.  Why not expedite the process: form some standards, provide a means to expose the data, keep a list of available data sets and let the computers churn through it 24/7.

We need to remove the data version of the Temple of Babel.  And create a platform for real Open Data.

What is NOT a Data Scientist Function

Data Science sprang up one day and set all of us Report Writers and Data Warehouse people to the side.

Where is your PhD?  Where's your statistical analysis?  Where are your math skills?  Where are your Big Data skills?

Uh, I took some classes about 20 years ago.  I may still have the book in my parents' basement, if they haven't thrown it out.

Data Scientist entered the arena and people took notice.

Fast forward a few years: where are we now with Data Science, the hottest new profession?

Well, in my opinion, Data Science does have a lot to do with Statistical Analysis.  However, it must be mixed with Domain Knowledge to produce insights otherwise unknown.

Big Data is NOT required for Data Science.  However, digging through Large Data Sets or Unstructured data could produce some nuggets of insight which were previously ignored due to a lack of technology to interrogate them.

Data Preparation is also NOT a requirement.  When I interviewed at Facebook a few years ago, the position was for Data Engineer, not Data Scientist.  The role of the Data Engineer, I believe, was to prep the data into a manageable file or set of files for downstream statistical analysis.

Data Warehousing and Report Writing is NOT a requirement.

Storytelling or Data Visualization or the ability to translate the finding to Executives using basic language IS part of the Data Science role.  If you can't convert the insight to action, what good is it?

A PhD is NOT a requirement either.  If you learned the craft from an online course or by looking over someone's shoulder, and you can do the job, that's all you need.  And doing the job means building models and dissecting them using algorithms such as Clustering, Classification, Regression, Decision Trees, Neural Networks, etc., to produce accurate, repeatable results.  It's rather difficult to pull a PhD out of your pocket after leaving college a decade or two ago.

To summarize, in my opinion, interrogating data using Statistical Analysis and presenting that insight to stakeholders who are authorized to take action are the requirements to be a Data Scientist.  With that said, the work may include Big Data, Unstructured data, Data Visualization tools and even data preparation.  Those tasks could be performed by someone on the Data Science team or outsourced to individuals within the org.  Or a lone-ranger Data Scientist may do parts or all of it, depending on resources and skill levels.

A Data Science person or team can provide skills and products previously unknown to the traditional Data Professional.  From where I stand, many shops today still depend on Data Professionals to do heavy lifting of their data needs.  And some shops are using Access and Excel or emailing files and downloading data sets off the intranet and lack formal Business Intelligence.

Data used to be an afterthought.  Now data is used for strategic decisions that impact everyday production, sales, marketing and the survival of organizations.  Whatever the role (Data Scientist, Data Professional, Business Intelligence, Data Warehouse Developer, Report Writer, etc.), data is now at the forefront of just about every business today.

First try with Amazon AWS

I've been interested to see some other cloud offerings.  So today I registered for an account with Amazon AWS.  You can sign up at no cost, although you must provide your info, credentials and credit card.  They call you to enter a PIN for validation.  Once registered and logged in, you're all set.

So here are a few screenshots of my discovery of Amazon AWS:

Different account types:

Main page:

Monitor spending:



6 Databases to choose from:

Data Warehousing:


Getting started with Redshift DW:

Redshift Documentation:

Amazon AWS Partners:

Tableau integration:

Connect Tableau to Amazon Redshift:


Big Data on AWS:

Amazon EMR:


My first impression is that Amazon AWS offers a tremendous Cloud platform for working with data.  I didn't get a chance to poke around some of the other features within the data space, like machine learning.  And there's a ton of other offerings: hosting sites, microservices, static IPs, etc.

And there are many tutorials to get started with the technology of your choosing.  It would seem to me the first step would be to upload files (txt, csv, etc.) to S3.  From there, push/pull the data into a database, data warehouse and/or Hadoop.

I'll need to research some common scenarios on best practice techniques.

So my first impression of Amazon AWS is a good one.

And there you have it~!  Thanks for reading...