8/28/2016

The Historical Buildup to the Internet of Things (IoT)



Looking back at the history of enterprise computing, we began with the mainframe computer.  It sat somewhere hidden from sight, ran the programs, and stored the data.  It was programmed by full-time programmers who knew the business, the applications, the domain knowledge, and the data.

Soon we ventured into "fat" client applications, with the business layer, database layer, and user interface layer combined into applications that ran on a user's desktop.

Next, we separated the business layer, the database layer, and the user interface layer into client-server apps.  We created classes for each and bundled them into DLLs.

Next, we pushed those DLLs into a COM pool, which could be accessed in memory and called from the client app.

Next, we hooked the internet up to the same DLLs, which made them multipurpose and added value.

Next, we pushed those DLLs onto the web as Web Services, called from servers or web applications using SOAP XML calls with security.

Next, we had single-page web applications using JavaScript to do all the heavy lifting, through connected components that magically flowed together in a complex set of file structures on the server.  Combined with AJAX-style calls from the client, they seemed to duplicate our traditional client-server applications from long ago.

So what's next?  The Internet of Things, or IoT.  Applications running out in the wild in a distributed ecosystem.  Apps run on devices such as sensors.  They capture data in real time, perhaps store it locally, and then flush the contents back to the main server, typically the cloud.  These micro-bursts of packets of information flow from anywhere, anytime.

 

Each device uses a specific protocol that bundles the info into small packets and pushes them to the server, where a program is waiting patiently for one, thousands, or millions of incoming messages per second.
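
To make that concrete, here is a minimal device-side sketch in Python.  The post doesn't name the protocol, so this assumes MQTT with the paho-mqtt library; the broker address, topic, device name, and sensor reading are all hypothetical:

import json
import time

import paho.mqtt.client as mqtt  # assumes the paho-mqtt package is installed

BROKER_HOST = "broker.example.com"   # hypothetical central server / cloud endpoint
TOPIC = "sensors/device42/readings"  # hypothetical per-device topic

client = mqtt.Client(client_id="device42")
client.connect(BROKER_HOST, 1883)    # 1883 is the standard unencrypted MQTT port
client.loop_start()                  # background thread handles the network traffic

while True:
    # Capture a reading and push it to the server as a small JSON packet.
    reading = {"device": "device42", "ts": time.time(), "temp_f": 72.4}
    client.publish(TOPIC, json.dumps(reading), qos=1)
    time.sleep(60)                   # micro-bursts: one small message per minute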

On the server, that data typically gets written to a Hadoop big data repository, or is streamed into a relational database.  It can also be analyzed in real time to send alerts to other apps, feed other streams, or trigger other events and processes.
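
And a matching server-side sketch, again assuming MQTT and the paho-mqtt library: a listener waits for incoming messages, appends each one to a local file (a stand-in for Hadoop or a relational database), and fires a simple alert when a reading crosses a made-up threshold:

import json

import paho.mqtt.client as mqtt  # assumes the paho-mqtt package is installed

BROKER_HOST = "broker.example.com"  # hypothetical broker
TOPIC = "sensors/+/readings"        # wildcard: readings from every device
ALERT_TEMP_F = 90.0                 # made-up alert threshold

def on_message(client, userdata, msg):
    reading = json.loads(msg.payload)
    # Stand-in for writing to Hadoop or streaming into a relational database.
    with open("readings.jsonl", "a") as f:
        f.write(json.dumps(reading) + "\n")
    if reading.get("temp_f", 0) > ALERT_TEMP_F:
        print(f"ALERT: {reading['device']} reported {reading['temp_f']} F")

client = mqtt.Client(client_id="ingest-service")
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.subscribe(TOPIC, qos=1)
client.loop_forever()               # block and process messages as they arrive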

The IoT devices out in the wild are tracked and monitored by pinging them periodically to verify they still have a heartbeat, as well as to push updates and hotfixes.  For all practical purposes, they are disconnected until they send or receive information.

Because they are distributed applications disconnected from the main server, they require reliable connections, impenetrable security, standardized communication patterns, and knowledgeable programmers on the back end to program, patch, maintain, and replace them down the road.

The devices need a small operating system, perhaps a small database or another way to store the data such as XML, network hardware, communication protocols, and a way to stay powered.
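
To make the "store locally, then flush" idea concrete, here is a small sketch of one way a device might buffer readings in an on-board SQLite database and flush them upstream when a connection is available.  The send_upstream function is a hypothetical stand-in for the real transmission (for example, the MQTT publish above):

import sqlite3
import time

def send_upstream(rows):
    # Hypothetical stand-in for the real transmission back to the main server.
    print("flushing", len(rows), "buffered readings")
    return True  # pretend the server acknowledged the batch

db = sqlite3.connect("buffer.db")
db.execute("CREATE TABLE IF NOT EXISTS readings (ts REAL, temp_f REAL, sent INTEGER DEFAULT 0)")

def record(temp_f):
    # Capture a reading in real time and store it locally.
    db.execute("INSERT INTO readings (ts, temp_f) VALUES (?, ?)", (time.time(), temp_f))
    db.commit()

def flush():
    # Push any unsent readings back to the main server, then mark them as sent.
    rows = db.execute("SELECT rowid, ts, temp_f FROM readings WHERE sent = 0").fetchall()
    if rows and send_upstream(rows):
        db.execute("UPDATE readings SET sent = 1 WHERE sent = 0")
        db.commit()

record(72.4)
flush()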

What happens in five years when the batteries stop working?  Or the operating system is no longer supported?  Or the vendor goes out of business or is acquired by another company?  Lots of things to consider.

Security is imperative, and as we know, any device connected to the internet can be compromised: packet sniffing can watch messages fly by, exposing holes in the sensor network downstream, among other vulnerabilities.

What's next?  Perhaps smarter chips that contain artificial intelligence and machine learning, to process the information in real time without having to send packets back to the server.

How about connected devices out in the wild?  Thermostats connected to the garage door opener sensor connected to the smartphone.  A web of interconnected smart devices running 24/7, disconnected from the main server.

Or a decentralized "hub" with radios that serves as a negotiator, transmitting for multiple IoT devices all day, every day.

 Wow.  Talk about automation and intelligent machines.  We are definitely entering new territory.


And there you have it~!

8/26/2016

Running Hortonworks 2.4 Sandbox on Hyper-V

I wanted to download the latest Hortonworks Sandbox with HDP 2.4, but the only available download options were VMware or Oracle VirtualBox.

I downloaded the Oracle VM VirtualBox version and got it working, after adjusting a BIOS setting on the local laptop.  Had to initiate a reboot and enable a setting.

http://hortonworks.com/downloads/#sandbox

But I really wanted to get it working on Windows 10 Hyper-V and there wasn't a download for that on the site.

So I found a webpage explaining how to convert the image to a VHD file:

https://www.pythian.com/blog/converting-hortonworks-sandbox-run-hyper-v/

Open a DOS command prompt and navigate to the following directory (change accordingly!):

cd C:\Users\jbloom\VirtualBox VMs\Hortonworks Sandbox with HDP 2.4

Run the command:

"C:\Program Files\Oracle\VirtualBox\VBoxManage" clonehd "Hortonworks Sandbox with HDP 2.4-disk1.vmdk" HDP2.4.vhd --format vhd



From there, I found this site: http://www.eightforums.com/virtualization/41195-how-import-vhd-file-into-hyper-v.html

They instruct you to create a VM not connected to anything other than a network switch, which you can create.

It created the Virtual Hard Drive:


And a Virtual Machine:



Then copied the .VHD file into the folder listed above:



Went into the newly created Microsoft Hyper-V virtual machine, went to the hard drive settings, re-pointed it to the .VHD file, started the VM, then attempted to connect:


Continuing...


Error, error...


Going into the Network and Sharing Center, we see an existing switch already created for Oracle, with the same IP address:


So let's go ahead and change the Oracle one to a different IP Address (from .1 to .2)...


Then disabled the Hyper-V adapter and set it to 192.168.56.1:



Stopped and restarted the VM:



There was still a connectivity issue warning; however, I was able to log into the Linux VM using the username and password:




Decided to create a network bridge between the VM adapter and the Network Adapter:



An IPConfig indicates that the IP Address changed:


Modified the network adapter, checking the box "Enable virtual LAN identification":




Restarted VM, connection successful!!!



Once logged into the VM, ran 'ifconfig' to get the IP address:


Connect from Laptop (not VM):


So it looks like it's working.  One thing to note: although I originally connected to the VM via Oracle VirtualBox and changed the password there, when it got imported into Hyper-V, it remembered the new password.  Interesting.
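
As a quick sanity check from the laptop side, a short Python script like this can confirm the sandbox is reachable.  The IP address is whatever ifconfig reported (the one below is made up), and port 8080 assumes Ambari is on its usual default port:

import socket

SANDBOX_IP = "192.168.56.101"   # substitute the address reported by ifconfig
PORTS = {22: "SSH", 8080: "Ambari web UI (assumed default port)"}

for port, label in PORTS.items():
    try:
        with socket.create_connection((SANDBOX_IP, port), timeout=3):
            print(f"{label} on port {port}: reachable")
    except OSError:
        print(f"{label} on port {port}: not reachable")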

And there you have it~!

8/14/2016

After Repetitive Jobs are Automated, are Middlemen Jobs Next?

People say you have to have a skill.  And continually learn.  To prevent automation from taking your job.

They say repetitive jobs will be the first to become automated.

I'd say the next is anything to do with being a middle-man/middle-woman.

If your job consists of aligning person A with person B, there's a good chance the computer program can do your job.

When thinking about 'added value', how do you compare amongst your peers?  Because there's probably a computer algorithm that can outperform you and your peers.

In accuracy, timeliness, and persistence, and with a fully documented audit trail.

The benefit of having a realtor, for example, is that they have access to the MLS (listed houses for sale).  You tell them your parameters, they type them in, display listings, and drive you to the house, and if you like one, you sign a contract to buy it.  Then a bunch of stuff happens as the closing approaches, and they take probably 3% of the purchase price, as does the seller's realtor.

During the boom, some realtors just planted a sign in the front lawn and had 10 buyers lined up.  I couldn't imagine much actual work performed.  And at 3% of a house, multiplied by a lot of houses, I think some realtors made millions during the boom, although I hope they socked some away for the tough times.

Besides having access to the MLS, what added value does a Realtor provide?

Could a Realtor's job be automated?  Could a Recruiter's?  Recruiters know where the jobs are, and then they scour the land for a match and align the two, for a nice fee.

The list goes on and on.  If you line Person A up with Person B, what are the chances that Middleman role will be performed by an algorithm in the near future?  What are the benefits?  Cost savings, accuracy, faster, 24/7, on and on.

What do you think?  After repetitive jobs are automated, are Middleman (woman) jobs next?

8/12/2016

Some New Learning Opportunities for Microsoft Data Professionals

The world of data is growing.  Microsoft is setting the pace.  New features to applications.  New product offerings.

Here's a link from the Microsoft SQL Server Reporting Services Blog:

Introducing a new Github Repository and PowerShell scripts for Reporting Services

SQL Server is getting some good updates!

Power BI just celebrated its one-year anniversary, along with the continuous release cycle of cool new features!

Happy First Birthday to Power BI!

Celebrate Power BI's birthday with a month of Community events!

Here's a post on two combined technologies for Microsoft Business Intelligence, R and Power BI:

R for the masses with Power BI

And the upcoming Microsoft Data Science Summit September 26-27 in Atlanta:

Join us at the Microsoft Data Science Summit

And there's a new version of SQL Server, SQL Server 2016.

And some cutting-edge previews: Microsoft Azure Cognitive Services lets apps integrate artificial intelligence algorithms for vision, speech, language, and knowledge.

The pace of newness is awesome, and it allows us to deliver value with cutting-edge technology and provide useful insights.

This train isn't slowing down anytime soon.

8/09/2016

My First Go at Microsoft Parallel Data Warehouse

I've had the opportunity to work on Microsoft Parallel Data Warehouse (PDW) for the past few months.  There are a few differences.  First, we use Visual Studio to connect to Dev, QA, and Production, so no SQL Server Management Studio.  And there's a plug-in for Team Foundation Server, which is nice for checking code into TFS.  Getting the versions to line up took a bit of effort.
 
Next, you don't right-click and view table options like in SSMS.  There's no Select Top 100, or looking at the scripts to generate Create, Alter, etc.  And when you create tables, the syntax is different.  The reason is that you are actually connecting to a network of interconnected SQL Server nodes on a hardware appliance, and the data gets split up and resides on specific nodes.  So when you issue a query, it's smart enough to know where the data lives and shuffles data around to assemble the complete data set.  All in all, it's supposed to be faster.  I haven't figured out how the backups occur, but that's outside the scope of my assigned tasks.
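
To make the distribution idea concrete, here's a rough sketch of creating a hash-distributed fact table and a replicated dimension table from a Python script, using pyodbc against a hypothetical PDW connection.  The DSN, credentials, table, and column names are all made up; the WITH (DISTRIBUTION = ...) clause is the part that differs from ordinary SQL Server DDL:

import pyodbc  # assumes an ODBC DSN for the PDW appliance is configured

conn = pyodbc.connect("DSN=PDW_DEV;UID=loader;PWD=secret", autocommit=True)
cursor = conn.cursor()

# Fact table hash-distributed on a join key, so matching rows land on the same node.
cursor.execute("""
    CREATE TABLE dbo.FactSales
    (
        SaleId      INT            NOT NULL,
        CustomerKey INT            NOT NULL,
        Amount      DECIMAL(18, 2) NOT NULL
    )
    WITH (DISTRIBUTION = HASH(CustomerKey))
""")

# Small dimension table replicated to every node to reduce data movement on joins.
cursor.execute("""
    CREATE TABLE dbo.DimCustomer
    (
        CustomerKey  INT           NOT NULL,
        CustomerName NVARCHAR(100) NOT NULL
    )
    WITH (DISTRIBUTION = REPLICATE)
""")

conn.close()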
 
And we pull the data from the PDW into a SQL Server Analysis Services cube, which is partitioned and seems to load okay each day.
 
Overall, it's not so different that you are completely lost; you can sit down and get busy right away, assuming you learn the new syntax and such.  T-SQL doesn't change much: you can create temp tables and real tables on the fly, add statistics and query hints, and specify the distribution options for performance increases.
 
I imagine the PDW was a precursor to Hadoop, with distributed nodes and parallel processing, and with PolyBase queries it can read data from traditional SQL Server and Hadoop together, which Azure has now.

Overall, it's a solid product from Microsoft, and I'm glad to get some good experience with it.

Get Sh#t Done!