Guerrilla Analytics : A Practical Approach to Working with Data

Enda Ridge

📅 Finished on: 2021-02-22

💻 IT
⭐⭐

Keep your working folder organized in a simple, predictable and intuitive way

Quite boring, unfortunately. I found it hard to finish, and many chapters dragged. Ridge describes a consulting environment of fast-moving projects, chaos and tight deadlines, but only in vague terms, and the useful advice could have fit in ten pages. The rest is the usual “keep folders tidy, use version control, communicate with the team, do not modify the raw data”: fairly obvious. I did not take away much beyond a greater awareness of naming and documentation.

Notes

  1. Space is cheap, confusion is expensive. Keep previous versions archived, keep logs, do not delete anything because it might be needed
  2. Prefer simple, visual project structures over heavily documented and project-specific rules. Simple is better, nothing to add
  3. Prefer automation with program code over manual graphical methods. The command line and scripts are much easier to customize and change than tools like Power BI, so prioritize them
  4. Maintain a link between data on the file system, data in the analytics environment and data in work products. You need to be able to trace every output and see the flow, so you can reconstruct it if needed.
  5. Version control changes to data and program code. Absolutely true, track everything
  6. Consolidate team knowledge in version-controlled builds. Ridge talks a lot about the concept of a build: a byproduct that supports the work but is not the final deliverable. Think of builds as the utils
  7. Prefer analytics code that runs from start to finish. Quite logical: run everything end to end without interruptions, which waste time.

As you can see, it is basic stuff, but I learned to pay more attention to documentation and to keep a standard for others (or future me who might forget).
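
To make notes 1 and 4 concrete for myself, here is a minimal Python sketch of "never overwrite, always archive and log". It is my own illustration, not code from the book: the function name `archive_version`, the `_v001` naming scheme and the `archive_log.tsv` file are all assumptions I made up for the example.

```python
import hashlib
import shutil
from pathlib import Path


def archive_version(src: Path, archive_dir: Path) -> Path:
    """Copy src into archive_dir under the next version number and log it.

    Hypothetical helper: the naming scheme and log format are illustrative only.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Count the versions already archived for this file to pick the next number.
    existing = list(archive_dir.glob(f"{src.stem}_v*{src.suffix}"))
    version = len(existing) + 1
    dest = archive_dir / f"{src.stem}_v{version:03d}{src.suffix}"

    # Copy instead of move, so the working file stays where it is.
    shutil.copy2(src, dest)

    # Log a checksum so an output can later be traced back to the exact input.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()
    with (archive_dir / "archive_log.tsv").open("a", encoding="utf-8") as log:
        log.write(f"{dest.name}\t{digest}\n")

    return dest
```

Calling `archive_version(Path("sales.csv"), Path("archive"))` before each change would leave a numbered copy plus one log line per version. The counting approach assumes nothing is ever deleted from the archive folder, which is exactly what note 1 asks for.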