Hadoop Project

Switching gears slightly, I'm working on a Cloudera Hadoop Big Data project: process PDF files using OCR with Spark on HBase, then index and search them using Solr, all in Azure.

A nice juicy project.

The world of unstructured data.


Code Free At Scale Microsoft Azure Data Factory

Here's a great video on the latest Microsoft Azure Data Factory (ADF) Data Flows.  To summarize, it's very similar to traditional on-prem SSIS BI.  And you have the "option" to develop "code-free", or you can write your own code, as in calling existing Notebooks with custom code.

There are existing patterns to flow data from a variety of sources, apply business rules along the way, perform joins and filters, and blend with other data sources, all out of the box.

Have a watch: https://channel9.msdn.com/Shows/Azure-Friday/Code-free-modern-data-warehouse-using-Azure-Data-Factory

This is available today in public preview (05/14/2019).

Just a quick side note observation: with the drag-and-drop look and feel, ADF and AzureML look kind of similar.  What if they added some of the AzureML features directly into Azure Data Factory, sort of a hybrid of Data Movement and Data Models combined?

You can sort of see the code-free model taking shape, allowing non-developers the ability to get stuff done in the data space, both in ETL and data movement and in adding AI to Power BI.  This will free up developers' time to learn the business domain and removes some of the complexity of being a Data Scientist or Data Engineer.  This should help to evangelize the data profession by removing the "barrier to entry" and proliferating the world of data.

Thanks for reading~!


From Traditional BI to Cloud BI Models - One Stop Shopping

Azure Data Factory is picking up steam for many reasons.  Our traditional SSIS is still alive and kicking, and can run in ADF to save the time and effort of re-writing existing packages.  ADF is our cloud version of orchestration, and it's making strides to become a one-stop shop for Extract, Transform and Load.

Data Flows provide this functionality and sort of duplicate some of the ETL behavior found in traditional SSIS; they are still in preview.

And Power BI's presence grows as more companies leverage the power of a web-based portal to administer and provide reports.  Power BI adds new features monthly and/or weekly, keeping everyone on their toes, but also adding some really cool features.  It also has an embedded feature to scale licensing by running within other applications.  So Power BI and SSRS are sort of morphing together to some degree.

And Azure Analysis Services allows the Tabular Model flavor which runs on DAX instead of MDX.  And you can import Power BI reports directly into the model to scale at enterprise level.

If we look at Microsoft's traditional BI, which includes SSRS, SSIS and SSAS, there's a subtle shift underway to cloud-based offerings.  This makes perfect sense, as "Cloud First" is the new mantra.

You can also see this pattern in IDEs, as the new web-based Visual Studio Online was recently announced, though it's not readily available just yet.

So if you think about it, developers will soon be able to work on any machine, anywhere, anytime.  How cool is that?  No dependence on specific hardware or specific installation packages, except as needed in real time.

You can almost see the similarities between the original mainframes that serviced all IT needs, and the Cloud, which allows remote terminals to connect and run IT workloads.  Full circle.  Yet loosely connected.

It also mirrors the utility company billing model: pay as you go, for how much you consume.

So change is definitely in the air.

And so it goes~!

As always, thanks for reading!


Considering a Career in Data

The world of data has morphed considerably, with Big Data, Visualizations, Self-Service ETL, NoSQL databases, Graph, Machine Learning, Quantum, and Artificial Intelligence, as well as Virtual Reality, 3D Printing, Gaming, Internet of Things, and Blockchain.

No more static reporting with a single query database embedded within the reporting GUI.

It seems data now requires knowledge of this, a bit of that, some of this, and that.

We've had table creation, and views, and stored procedures, mostly within the database GUI.

That's a given now.  Add in more layers.  Connectors to other things.  Security.  Permissions.  Gateways.  Building out solutions as stand-alone components that interact with other components.

The hardware is less of a concern now, but spinning up the correct version of a service is now a must-have, and licensing is also key.

Cloud vs. on-premise advantages and drawbacks.

Automation is required as well: getting jobs to run at scheduled intervals, alerting on issues, built-in quality assurance.

Another big topic is metadata driven applications.  Re-usable frameworks.

Handling different environments: Dev; Test; Prod.

Data governance.  Master Data Management.  HIPAA.  PCI.

A resurgence of low-code / no-code WYSIWYG web-based apps built by non-IT folks, doing some amazing things.

How about 3rd party vendor products integrating into applications?

Report writer is no longer specialized enough, and Business Intelligence is heading that direction.  It's more like data solutions, or data integration, or data architect / solutions developer / programmer lead.

You have to know your specific vendor's software, plus the competitors' solutions, compare and contrast the benefits and downsides, plus the open source versions.

And all of that will change within three months or so.

And you have to know project life cycles and agile methodology: sprints, stand-ups.

And version control software web based and client based.

And you have to know the coding web sites to troubleshoot issues.

And keep abreast of changes across the spectrum.

Including IDEs and development tools, and plug-ins and components, and web version cross platform.

And statistics software like R or Matlab or SPSS.

And deep learning, neural networks, speech recognition, vision, translate languages, predictors, anomaly detection, unsupervised learning.

And coding languages like Python, which is hot.  Or Java, or C#, or Julia.

And Notebooks.

And Hive, Pig, Sqoop, Flume, Oozie, Ranger, Knox.

And APIs.

And protocols.

And compression.

And data transfer rates across the network.

And ticketing software to track bugs and work.

And releasing software into production using change management frameworks.

And read blogs, magazines, journals, white papers, books, articles, documentation.

And networking.

And study for certifications.

And proper email etiquette.

And interviewing skills, both as candidate and how to interview prospects.

And writing resumes.

And presentation skills.  PowerPoint.

And documentation skills.  Drawing diagrams.

And learning domain knowledge across a spectrum of industries.

And estimating projects.

And chat software.

And time tracking software.

And doing webinars.

So.  If you are tempted to enter the ever-growing world of data, this list should give a brief indication of things to consider.  It's a long journey.  It never ends.  You really have to enjoy what you're doing.  And if you produce quality over time, there's a good chance your career will flourish, and bloom.

Thanks for reading~!



Early Adopter to Technology, before IT was Mainstream

I remember the first time hearing about a new computer operating system.  My father, who worked for IBM, mentioned something about having multiple windows open at the same time, able to flip back and forth.  Seemed a bit odd, after working with PC-DOS since 1983.  They called it OS/2.

I also remember attending COMDEX, a vendor-driven technology conference; my father had an extra pass from working for IBM in Atlanta.  The conference was exceptionally large, with vendors lined up from end to end of the conference building.  The one thing that stood out, because it was so wild and new at the time, was a version of Paint Shop.

Hundreds of people gathered around, while a guy with headphones and mic spoke about this cool new product.  You could see on the monitors, he had a picture, spliced off part and copied another, saved.  For those growing up on DOS, having such graphics was cool in itself, but the ability to lop off pieces of a picture and later, change colors, etc. was amazing.

Nowadays you can do that one-handed with a smart phone.  Back then, it was revolutionary.

It was around that time I signed up for a computer class in Atlanta.  The class taught three products: Paradox (database), Quattro Pro (spreadsheet) and WordPerfect (document writer).  The instructor worked at Equifax during the day and taught at night.  That was the first time I learned of Copy-Paste.

I was living in a cheap apartment with a roommate and worked for a finance company.  We played tennis after work a few times per week in downtown Buckhead, at one of those mansions with a tennis court in the backyard.  One of the other players was a software developer and he would talk about the latest trends going on at the time.

I was moving towards working in IT, as software was not a common career back then, and since I'd worked with computers since age 12 or 13 with DOS, Basic, etc., it seemed like the way to go.  Except they had a downsizing at work, and I was offered a position back in Florida, so I moved.  I remember hearing on the radio during the drive down that it was the day Windows 95 was officially released.

After attending a C++ class at the community college, I was working in the IT department of a major bank, writing Crystal Reports and programming Visual Basic and Oracle.  We had a hundred people working on the project, and the reports were always the last to be discussed during our meetings.

Suffice to say, getting into IT was not a linear path.  Writing code was considered for geeks when I started out.  Now, computers are part of every job regardless.  Seeing that my father wrote code pre-assembly, on punch cards at IBM, it only seems like a natural fit to have a career in IT.  Just solving the same problems with newer technology.

And so it goes~!


Attach Value / Pair Data Attributes for Supervised Learning

Machine learning works best with data that is labeled, which is called Supervised Learning.  The source data set has information describing the content of the data.

For example, you have a series of pictures.  Some of the pictures have cats.  Those pictures are identified through labels to train the model.

You can also train the model without labels, using Unsupervised Learning.  The data is not labeled ahead of time, so it takes longer for the program to learn the content of the data.
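To make the contrast concrete, here's a minimal sketch in plain Python; the toy points, labels, and nearest-centroid approach are made up purely for illustration.  With labels we can summarize each class in a single pass; without them, a program would first have to discover the groupings on its own.

```python
# Toy labeled data set: each point comes with a label (Supervised Learning).
# Points, labels, and the nearest-centroid method are illustrative only.
labeled = [((1.0, 1.2), "cat"), ((0.9, 1.1), "cat"),
           ((5.0, 5.2), "dog"), ((5.1, 4.9), "dog")]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def train_supervised(data):
    # Because the labels are known up front, "training" is a single pass.
    by_label = {}
    for point, label in data:
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}

def classify(model, point):
    # Assign the label of the nearest class centroid.
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(model, key=lambda label: sq_dist(model[label], point))

model = train_supervised(labeled)
print(classify(model, (1.1, 1.0)))  # a point near the "cat" cluster
```

An unsupervised version would drop the labels and iterate (for example, k-means style) until clusters emerge, which is exactly why it typically takes longer.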

What if all data were labeled?

We have relational databases that store relational data: tables with columns with field types.  There are no descriptors to tell the viewer much of anything about that data, unless the table or column has a meaningful name.

What if every database table had a way to store attributes describing the content of the data?  One way: add a list of value / pair descriptors attached to the table, and expose them through Metadata.

For example, take a table called "InvoiceLineItems".  What are some meaningful descriptors in value / pair format?

Company | Fred's Deli and Tire Rotation
Creation Date | 01/01/2014
Table Description | Table contains data for invoice line items
Table Granularity | Table is at invoice Line Item grain level
Table Primary Key | InvoiceLineItemID
Table Primary Key Field Type | integer
Table Field Count | 22 fields
Foreign Key 1 Table Name | Invoice
Foreign Key 1 Name | InvoiceID
Foreign Key 1 Field Type | int
etc. | etc.

Data gets added to the table, and eventually extracted and sent to a reporting tool.  The reporting tool can interpret the value / pair data, as it's in a common format, perhaps XML or JSON or what have you, and dynamically joins InvoiceLineItem to the Invoice table, which joins to Customer and Address and Product on the fly, based on the primary key info contained in the value / pair data; no need to guess, as the foreign keys remove the guess work.

Basically you have a relational database that stores descriptive attributes about the data, self contained in value / pair data, built directly into the database, combining traditional database and NoSQL.
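As a rough sketch of how a reporting tool might consume such descriptors (plain Python; the JSON shape and table names are hypothetical, not an existing standard), the declared foreign keys let the tool derive joins with no guesswork:

```python
import json

# Hypothetical value / pair descriptors attached to a table, stored as JSON.
metadata = json.loads("""{
  "table": "InvoiceLineItems",
  "primary_key": "InvoiceLineItemID",
  "foreign_keys": [
    {"table": "Invoice", "column": "InvoiceID"}
  ]
}""")

def build_join_sql(meta):
    # Derive the JOIN clauses directly from the declared foreign keys.
    sql = "SELECT * FROM dbo." + meta["table"]
    for fk in meta["foreign_keys"]:
        sql += (" JOIN dbo." + fk["table"] +
                " ON dbo." + meta["table"] + "." + fk["column"] +
                " = dbo." + fk["table"] + "." + fk["column"])
    return sql

print(build_join_sql(metadata))
```

Extending the descriptor list (Customer, Address, Product) would let the tool chain the joins the same way, all driven by the metadata rather than by a human.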

The data would be stored at Table level, yet could be queried within a traditional SQL Statement:

SELECT InvoiceLineItemID, ItemNumber, ItemAmount, ValuePair.TableDescription FROM dbo.InvoiceLineItem

In this example, the SQL statement pulls data directly from Value / Pair into the returned data set.

And when data gets ported to other systems and exposed for reporting, the reporting tool would interpret the Value / Pair data as the Metadata gets sent along also.  The reporting tool would know what attributes are available by reading the Metadata from Value / Pair without user intervention.

Take this a step further: with the machine learning example earlier, if all data had value / pair attributes stored at time of creation, most machine learning would become Supervised.  That means more accurate models, trained faster, with a higher degree of probability.

Take it a step further: you could have jobs that scrape the internet, have the application read in the data files on the fly, process them, train the model, interpret, and then take some action.  Imagine all sporting events opening their data sets online, with rows of data, with metadata and value / pair attributes, in real time.  Jobs would continuously scan for updates 24/7, without human intervention, and update accordingly.
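A bare-bones sketch of such an unattended job (plain Python; the feed is simulated here, standing in for a live sports, weather, or stock endpoint, and the "model" is just a label counter):

```python
import time

# Hypothetical unattended polling job: scan a feed for new labeled records
# and fold them into a running model (here, a simple count per label).
def poll_feed(fetch, model, interval_seconds=0, max_polls=3):
    for _ in range(max_polls):
        for record in fetch():
            label = record["label"]
            model[label] = model.get(label, 0) + 1
        time.sleep(interval_seconds)  # in production this might be minutes or hours
    return model

# Simulated feed: three batches of pre-labeled records arriving over time.
batches = iter([[{"label": "win"}], [{"label": "loss"}], [{"label": "win"}]])
counts = poll_feed(lambda: next(batches), {})
print(counts)
```

A real version would replace the simulated feed with an HTTP fetch and run on a scheduler, but the shape is the same: labeled data in, model updated, no human in the loop.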

Now add weather data, or stock information, or blockchain data, or commodity prices or exchange rates, or crime data or traffic data, the list is endless.

Tag the data during creation, append it to the value / pair data attached to the database, with a Metadata layer for reporting tools to identify the data set, similar to a WSDL, and have machines read and process the data unassisted.

That would speed up the artificial intelligence genre x times.  And maybe remove some of the human intervention to prep the data, and give us a lot more meaningful data.

What do you think?  Is this possible?


Where Do We Go From Here

What attributes allowed mankind to rise to the top of the food chain?

From jungle dwellers, to hunter-gatherers in the plains, to farmers, to city dwellers, to emperors of the remote-controlled 60-inch plasma television with surround sound.

Maybe it's our primitive brains that assist with instincts, to protect us in fight versus flight.  Maybe there's a snake under those leaves; better steer clear of that path.

Maybe it's our ability to align in groups.  Hey, that tiger looks mighty hungry, what do you say we team up and live to see another day.

Maybe it's our opposable thumbs.  We can build tools out of rocks in the shape of points to protect the tribe and find food.

Maybe it's our bipedal upright frames.  Let's see who can outrun that lion; you don't have to be the fastest, just not the slowest.

Perhaps one or all of the above will suffice.

Or maybe it's something else.  The ability to reason.

Our brains are designed sort of like databases.  We can retain events and images for long duration, with fairly good memory recall.  So we are good at classifying objects on the fly in real time to determine potential threats.

Our brains can perform logical reasoning quickly, that object over there, where have I seen that, oh, it's a lion, if I remember correctly, they eat humans, that lion looks hungry, maybe I should climb up this tree until he moves along.  Observation + Memory Recall = Action

We can perform actions based on observations from our 5+ senses, scan our internal brain database for past memories, judge whether they were favorable or not, and determine the correct course of action.

One could logically deduce that as a probable answer to our question.  It's gotten us pretty far in life: scientific inventions, fine art works, music scores, cinema, universities; we placed a person on the moon.

So how do we progress upwards from here?

Logic is a great asset, yet not consistently applied in daily living.  How so?  Here's an example.  Overeating, lack of physical conditioning, insufficient sleep, over-drinking or drug use, gambling, and a variety of other items fine in moderation can be harmful if done in excess.  That's sort of common knowledge.  But then again, not everyone complies to the same degree, with varying effects.

So having logic is great, applying logic, not so much.

What about bias?  Bias is when people use personal preferences to determine an outcome, and sometimes the outcomes are not a true match.  Given a sample of 100 people, would they get the same outcome?  Maybe some outliers, but not percentage-wise.

Logic is not applied at all times; personal bias gets thrown into the mix.  How then are we supposed to create artificial intelligence to mimic or surpass human intelligence?  You'd sort of have to add routines or functions to alter the determined outcome by some weighted variable, and then toss a coin.
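A toy illustration of that last idea (plain Python; the scoring, the bias weight, and the 0.25 scaling factor are all made-up numbers, not a real model): a "logical" confidence gets nudged by a personal-bias weight, then the decision is settled with a weighted coin toss.

```python
import random

# Hypothetical sketch: logical reasoning altered by a weighted bias variable,
# then resolved with a (weighted) coin toss.
def decide(logical_score, bias_weight, rng):
    # logical_score: 0..1 confidence from pure reasoning.
    # bias_weight: -1..1 shift from personal preference (0.25 scaling is arbitrary).
    p = min(1.0, max(0.0, logical_score + 0.25 * bias_weight))
    return "act" if rng.random() < p else "pass"

rng = random.Random(42)  # fixed seed so the sketch is repeatable
print(decide(0.9, 0.0, rng))
```

With a strong negative bias the same logical score can flip the outcome, which is roughly how bias corrupts an otherwise logical decision.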

Humans have climbed to the highest spot on the hierarchical ladder on the planet.  Is that the final resting place, or can we climb higher?  Can we leverage computers to the next level, and if so, how would that be accomplished?  Will Quantum Computing give us the bandwidth to determine millions of potential outcomes in milliseconds, in real-life scenarios with real-life ramifications for actions?  How would it handle the limitless exceptions?

Let's suppose we were able to do that.  What would be the next level of attainment.  How would existing society be transformed.  What inventions would be created to assist mankind to a better life.

Have we plateaued as a civilization, condemned to watching re-runs of Gilligan's Island?  Or do we march forward, to the next rung of advanced society?

Where do we go from here.

Spices and Technology Transforming the World for 500 Years

In 1492 Columbus sailed the ocean blue.  Story has it, he was in search of a faster route to the Orient to obtain spices.  I'm not sure if he was aware of this, but they have almost half an aisle dedicated to spices on aisle 7 at the supermarket up the street.  Spices are one thing not difficult to find for our generation.

What's our version of the "fastest route to spices"?  One could argue that technology is our motor of innovation.  Technology can perform amazing things: crunch numbers, store data, provide instant access to communication, serve as a vehicle for online sales, distribution, delivery, you name it.  All those zeros and ones.

Finding a new route to the Orient was a complete failure, if you're keeping score at home.  Columbus did not find a shorter route.  What he found was a round world.  He altered our view of reality and we never went back.  A true paradigm shift.

Same with technology.  It would be very difficult to go back to the old way of thinking before smart phones, before internet, before home computers, back to a time with covered wagons, steam engines, muskets and horses.  We have crossed the line in the sand with no intent of return.

Technology has slipped into every crack of existence.  From farming tools, to self-driving cars, to purchasing goods and services from hand-held phones, to doctor visits online.  We have reduced latency through instant communication, with the ability to broadcast across the globe with relative ease, basically without the slightest idea of the inner workings of network packets and all the other low-level details.

There are several technologies still in their infancy: Virtual Reality, Quantum Computing, Blockchain, and Artificial Intelligence.  There's been some initial progress with sporadic increases, and it's just a matter of time before we see some giant leaps for mankind.

These technologies will spawn new markets, unforeseen at the current time, creating new vendors and new market shares and new players.  Anyone with the knowledge and understanding to work with these technologies can contribute, and perhaps be the inventor of the next big thing.  It's a wide-open field.  Anyone alive today, or perhaps not born yet, could be our version of the next Columbus.

Could that person be you?

And so it goes~!


Happy Everything!

Merry Christmas & Happy Hanukkah & Happy New Year!!!

From the Blooms!