A Basic Lesson In Analytics

Iterative, Complete, Creative, and Recursive

Decision-First AI
Published in
7 min readDec 10, 2017

--

This article is a step-by-step tutorial in analytics. It is basic but complete. All you need to do this yourself is Notepad++ and Excel. The former is a free download available here. Without further ado let’s get started.

Step 1. Forming A Solid Question

Every analysis begins with a question. Hopefully a good one. Good analytic questions start with how. Given the other constraints of this article, we are going to cheat a bit. Our question:

“How can we use this article to best teach a basic and complete lesson in analytics?”

Step 2. Collect Data

As noted, this lesson is recursive. We are going to use the article itself as our source of data. That’s right — this article is the data. All you need to do is highlight the text and copy (ctrl-c) it. Step complete.

Step 3. Inspect & Clean The Data

This is where Notepad++ comes in. Open a new instance and paste the copied text inside. It will look like this.

A quick inspection reveals that this doesn’t look a lot like data. We are going to need to clean it up a bit. Our goal is reduce this string of language into a data table of individual words. To do this we need to remove (or replace) all the special characters. We will start with carriage returns. Notepad++ stores them as an invisible character at the end of every line.

First you will need to highlight that special character. Then using the Search menu, replace it with a single space. The result will create a single row of text as shown on the bottom of the image on the right.

Next we will likewise replace every punctuation mark and special character we can find. Finally, we will find two spaces and replace with just one. This should give us a very clean and simple string. Our data is not yet structured but it is clean. Mostly.

Take the results and paste them into the A1 cell of a new Excel spreadsheet. The result should look like the image below.

Step 4. Structuring The Data

This still doesn’t quite look like a data table. So let’s remedy that. First, we are going to use the Text to Columns button above to divide this string of words into individual words in unique cells.

Use the delimited option in the wizard menu and select space. The resulting output will fill each cell of the first row with the next word in the article. Highlight the entire row and copy it. Create a new sheet.

Click on the A2 cell and select paste special. Chose the paste special option at the bottom of the menu and then click on the transpose option.

The results are beginning to look like a real data table.

To complete our transformation, lets give the A column a header. Type Word into the A1 cell. In the B1 cell type position. In the B2 cell put a 1. In the B3 put a 2. Now select both cells and double click to copy the pattern down the B column.

We now have an actual data table with word and position. Congratulations, you have collected, cleaned, and structured data. There is just one more fix that we should make to start generating a little insight. Notice that our words are in both upper and lower case. We will want to add a new column ( C ), which we will label with Cleaned (Cell C1) and fill with a formula. In cell C2, type “=upper(A2)” then double on the bottom right portion of the box to propagate the formula down the row. The steps and result are detailed below.

Step 5. Use Excel’s Business Intelligence Tool To Start Analyzing

Pivot tables are one of the most basic analytic tools available, second only to the spreadsheet. In this step, we will create a pivot table and use it to do one of the most basic analytic functions — aggregation.

To create a pivot table, select all three columns an click on the Insert menu. The pivot table option is located in the first spot on the tool bar.

Click on PivotTable and then click OK. You need not worry about all the optional parts of the wizard right now. Excel will create a new tab and open the PivotTable Fields window. In it, you should see all three fields listed — Words, Position, and Cleaned. If you don’t, you likely didn’t highlight all three columns. Try again.

Drag and drop Cleaned to the Rows area in the bottom left. Then drag words to the bottom right values area. It should default to Count of Word. If it shows a different function like Average or Sum, use the drop down arrow on the right to change it to Count. Our table is ready! Or perhaps not…

You may notice a few special characters have stuck through. This happens. Now worries. You can easily find them and delete them on Sheet2. Then simply refresh your pivottable by right clicking anywhere inside it and selecting the refresh option (third one from the top).

Now we want to sort or pivottable. Click on the down arrow next to Row Labels and choose More Sort Options. From there select Descending and select Count of Words in the dropdown. This will sort our table from the words that appear most to the words that appear least.

If we had not created a Cleaned field, each variation of lower case and capitalized words would be grouped (or aggregated) separately. Using a Cleaned and Word field we actually protect that amount of intelligence we can garner from this data set.

Step 5. Graphing

Visualization is critical to analysis. This article couldn’t claim to be complete without build at least one graph. We will actually build two.

Select the pivottable and then the insert menu above. About two thirds of the way across the tool bar you will see a bar chart icon. Select it.

We now see a visual distribution of the words in our data set (article). Unfortunately, it is not very insightful. There is too much data to be useful. Time to filter. Let’s see what the 20 most common words to this article are.

Under the Row Labels drop down is an option called Value Filters, which we can set to 20. It is the maximum number of bars that are clearly meaningful to most people. You personally may feel like you can handle more than 20, but trust me, few clients or colleagues will agree.

The resulting table is much more insightful, if we really just wanted to understand the most common terms used in this article. Our objective however, was to teach a basic lesson in analysis. That, I believe we have accomplished. But we can certainly do more… next article.

For now, recognize that following this article you successfully collected, staged, structured, clean, organized, aggregated, graphed, and interpreted data. Not bad for less than 1200 words and roughly seven minutes. Thanks for reading and stay tuned for part II!

--

--

Decision-First AI

FKA Corsair's Publishing - Articles that engage, educate, and entertain through analogies, analytics, and … occasionally, pirates!