5/01/2015

Power Query Group Dataset Based on Multiple Indexes

I've been working on a Power Query report lately, and trying to match the logic from another report.

The thing is, the original report has a three level hierarchy.  And each of the column names are the same for each level.  So they stacked all 3 hierarchies together into a single data set, and then wrote some M code to figure out the relationships using the Table.Fill command.

I took a different approach in that the three layers were merged into a single row.  So you could have a record with only the top level hierarchy, or one with top parent an child, or top parent, child and grandchild.

So that makes it easier to manage.  Except for the fact that they have Excel calculations based on each level of the hierarchy.  That kind of poses a mind bender challenge, because when you flatten the table, now you have to have 3 of each of the fields, because they can be different depending on the level.

So I did just that, in some instances, I created three fields, like FeatureScope, UserStoryScope and TaskScope and that seemed to work okay in the Power Pivot reports.

But then for example, let's say you have a dollar amount, for $70.  And that $70 is tied to the parent record, and since the parent record gets repeated for every child and grandchild record, you could potentially have that $70 record appear 10 or 20 times so when you go to report on it, it adds $70 10 times which inflates the report values.

But I was on my own on this project and I had to find a solution.  Maybe some BI experts have already figured this problem out, but I had to find something quick.

So I found some code snips, added and let run.  It ran for hours and hours, not sure what it was doing, but I found another example, same thing.  Had to scrap both.

So I decided to add an index in Power Query M, first, I removed the dupes by doing a distinct, then sorted the list, and then applied an index to the list.

Then I created two more index, one starting at base 1 and one starting at base 0.  An then selected the unique index, the index starting with 0, and the fields I wanted to display the value one time per group.  and then created a second table with the index of one.  and then joined the two tables where index 0 = index 1 with a left outer join.

The first row, the outer table had null record, so that indicated it was the start of a new group, so we place the contents of the fields into a new column.

The next row, the index 0 did not equal the index 1, so that too was a new group, so move that content to the new columns.

The next row, the indexes did match so the group had not change, and if the group is the same we don't want to repeat the number multiple times, so we set the new fields to null.

And we go through the entire list, in a few seconds.  After, we take the results, and we join it back to the original table based on the unique id of the table, and we remove the columns we don't need.  And now we have all these new columns, with the data displaying 1 x, instead of having it repeat for every row of the group.


Here's a code snippet:

        ///////////////////////////////////////////////////////////
        // we need to create 2 indexes, one starting a "1", the other starting at "0"
        RemovedDupes = Table.Distinct(ChangedType1),
        SortRows = Table.Sort(RemovedDupes,{{"FeatureId", Order.Ascending}, {"UserStoryId", Order.Ascending}, {"TaskId", Order.Ascending}}),
        AddedIndex1 = Table.AddIndexColumn(SortRows, "TableIndex", 1, 1),
        AddedIndex0 = Table.AddIndexColumn(AddedIndex1, "TableIndex0", 0, 1),

       ///////////////////////////////////////////////////////////
       // we specify the first table with the "1" index which we will join to
       PrepareIndexFieldsLookup= Table.SelectColumns(AddedIndex0, {
            "TableIndex",
            "FeatureId"
        } ),

        ///////////////////////////////////////////////////////////
       // we specify the second table with the "0" index which we will join from
       PrepareIndexFieldsBase= Table.SelectColumns(AddedIndex0, {
            "TableIndex0",
            "FeatureId",
            "AvgTimeQA1",
            "AvgDeliveryTime1",
            "DevCycleTime1",
            "EstimatesToActuals1",
            "TotalEstimatedHours"
        } ),
       
        ///////////////////////////////////////////////////////////
       // we set the columns to join on
        KeyColumn0= "TableIndex0",
        ParentKeyColumn1= "1.TableIndex",
        JoinIndexTables2 = Table.Join(PrepareIndexFieldsBase,KeyColumn0,
            Table.PrefixColumns(PrepareIndexFieldsLookup , Number.ToText(1)),
                ParentKeyColumn1, JoinKind.LeftOuter),
       
        ///////////////////////////////////////////////////////////
        // reason for doing this, these fields should only be on 1 feature table,
        // so group FeatureId, when the group changes, we set the field to the correct value
        // otherwise we keep set it to null
        AddAvgTimeQA = Table.AddColumn(JoinIndexTables2, "AvgTimeQA", each if [1.TableIndex] = null
                then [AvgTimeQA1]
            else if [1.FeatureId] <> [FeatureId]
                then [AvgTimeQA1]
            else null),
        AddAvgDeliveryTime = Table.AddColumn(AddAvgTimeQA , "AvgDeliveryTime", each if [1.TableIndex] = null
                then [AvgDeliveryTime1]
            else if [1.FeatureId] <> [FeatureId]
                then [AvgDeliveryTime1]
            else null),
        AddDevCycleTime = Table.AddColumn(AddAvgDeliveryTime, "DevCycleTime", each if [1.TableIndex] = null
                then [DevCycleTime1]
            else if [1.FeatureId] <> [FeatureId]
                then [DevCycleTime1]
            else null),
        AddEstimatesToActuals = Table.AddColumn(AddDevCycleTime, "EstimatesToActuals", each if [1.TableIndex] = null
                then [EstimatesToActuals1]
            else if [1.FeatureId] <> [FeatureId]
                then [EstimatesToActuals1]
            else null),
        AddTotalEstimatedHoursAmount = Table.AddColumn(AddEstimatesToActuals, "TotalEstimatedHoursAmount",
        each if [1.TableIndex] = null
                then [TotalEstimatedHours]
            else if [1.FeatureId] <> [FeatureId]
                then [TotalEstimatedHours]
            else null),
       
        ///////////////////////////////////////////////////////////
        // select fields from the joined lookup table
        ResultsFromLookupValues= Table.SelectColumns(AddTotalEstimatedHoursAmount, {
            "TableIndex0",
            "AvgTimeQA",
            "AvgDeliveryTime",
            "DevCycleTime",
            "EstimatesToActuals",
            "TotalEstimatedHoursAmount"
        } ),

        ///////////////////////////////////////////////////////////
        // now join the base table (with all fields) to the lookup table (new fields)
        KeyBase= "TableIndex0",
        KeyLookup= "1.TableIndex0",
        JoinIndexTables = Table.Join(AddedIndex0,KeyBase,
            Table.PrefixColumns(ResultsFromLookupValues , Number.ToText(1)),
                KeyLookup, JoinKind.LeftOuter),

        ///////////////////////////////////////////////////////////
        // rename fields
        RenameLookupColumnA = Table.RenameColumns(JoinIndexTables,{"1.AvgTimeQA", "AvgTimeQA"}),
        RenameLookupColumnB = Table.RenameColumns(RenameLookupColumnA ,{"1.AvgDeliveryTime", "AvgDeliveryTime"}),
        RenameLookupColumnC = Table.RenameColumns(RenameLookupColumnB ,{"1.DevCycleTime", "DevCycleTime"}),
        RenameLookupColumnD = Table.RenameColumns(RenameLookupColumnC ,{"1.EstimatesToActuals", "EstimatesToActuals"}),
        RenameLookupColumnE = Table.RenameColumns(RenameLookupColumnD ,{"1.TotalEstimatedHoursAmount", "TotalEstimatedHoursAmt"}),


Well, anyway, that ran in a few seconds, with the correct results, and problem solved.

On to the next problem.

Thanks for reading.

No comments:

Post a Comment

We Interrupt this Broadcast