Brief summary
Things have come a long way since Kent Beck first wrote about test-driven development 20+ years ago: the languages we use, our deployment environments and the rise of low-code tools. Former Thoughtworker Saleem Siddiqui has just written a new book on TDD and joins our podcast team to discuss why this book, and this subject, is more pertinent than ever.
Podcast transcript
Rebecca Parsons: Hello, everybody. My name is Rebecca Parsons, the Chief Technology Officer for Thoughtworks. I'd like to welcome you to another edition of the Thoughtworks Technology Podcast. I'm one of our regular cohosts, and I am joined by another one of my regular cohosts, Neal Ford.
Neal Ford: Thank you, Rebecca. Welcome, everyone. Today we are chatting with a Thoughtworks alum that I actually had the honor of being in the interview process for. Actually, I had known Saleem; Saleem Siddiqui is our guest today. I'd known Saleem because he and I both wrote a book about an overlapping subject way, way, way many, many years ago. A book about this IDE called JBuilder in the Java space. When I saw his name for the interviewer, it's like, "That name seems familiar." Then it's like, "Wait a minute, I know this guy. I need to recuse myself."
I became part of the friends of the hiree rather than one of the objective evaluators, but we're joined today by Saleem who's not with Thoughtworks anymore, but was with Thoughtworks for a number of years. He's written a book that we're going to talk about. Welcome, Saleem.
Saleem Siddiqui: Thank you. Thanks for those kind words, Neal and Rebecca.
Rebecca: The title of your book, Saleem, is Learning Test-Driven Development. I guess my first reaction would be, "Really? In 2021, you're writing a book on learning test-driven development?" Discuss.
Saleem: [laughs] Great, well. A great question to lead off with, Rebecca. Let's start with a tough one. Yes, it is 2021, late 2021 at that. A book with a title of Learning Test-Driven Development might seem at least a decade, if not two decades, too late in the coming. Yes, why now? Why a book with this title right now? There's a couple of different ways to approach this question. Somewhat facetiously, I guess I would concede to the argument that yes, it is a couple of decades too late. TDD: By Example, by Kent Beck was published in 2002. Coincidentally, my last book was also published that year.
I guess you could say, personally, I was due for one after a couple of decades of hiatus. More seriously, the reason I wanted to revisit the topic of test-driven development especially from a learning, from first principles perspective, was things have come a long way since of course, Kent Beck, with all the deference to him and his self-deprecating nature, he calls himself the rediscoverer of TDD. Of course, we owe a lot, and certainly, I owe a lot of gratitude to him for teaching me, but things have moved on. Languages have come a long way.
Just on that, the three languages that I use in this book, they did not exist. They did not exist for a good decade after Kent Beck's book. That was one motivation. The second was in the last decade, at least, there's a couple of revolutions that have happened in the way we write software, mostly because cloud is now ubiquitous. Everything is on the cloud including now code itself is written on the cloud with low-code and no-code platforms.
There is a tendency to not just developers who have learned programming in the last decade or so, but even people who have had longer careers, but have reinvented or reinvigorated their flavor for writing code in the last decade. There's a tendency for them to either under emphasize or completely ignore, often with prejudice, the practice of test-driven development. Those are my motivators. Of course, personally, having been involved in a few projects where I saw either a dearth of practice, practical knowledge or equally often enough, a reluctance to practice even when the developers knew how to do test-driven development.
A reluctance to practice it in production code. Those are some of the motivators that I realized that somebody had to step up and take the flack and write a book titled Learning Test-Driven Development and I figured, "Why not me?"
Rebecca: I noticed too, and in fact, you mentioned it. This isn't test-driven development in Java or Scala or anything like that. You actually chose three different languages. Tell me about those language choices.
Saleem: I mentioned that in the preface of the book as well. This might be a subtle plug for people to research the answer in-depth there, but in brief, there's a tendency that I've heard, building back on what I just said, and at least this is my analysis, that arises because Kent Beck not only rediscovered test-driven development, but he also created JUnit. His book was very much centered around JUnit and Java. There's a tendency that I've heard: "Oh, TDD works only in that language," or "TDD does not work on my language stack," whether that is OO or functional or, frankly, anything other than the JVM. There's a tendency for developers to say, "Well, that's great that it works in an academic setting and in that language, but not in what I'm doing."
I went to some lengths to figure this out, also based on the personal experiences I've mentioned: my last several projects were in one or the other of these languages. Of course, most projects are polyglot these days. The primary language for each of my last several projects was one of these three. Also, I thought that despite their similarities, the philosophies of these three languages, Go, JavaScript, and Python, were different enough. Certainly, their ages are different enough: Go is by far the youngest language of the three, and Python and JavaScript are much older. The developer community, the software they write using these languages and, of course, the nature of the languages and the features they provide are different enough that it would hopefully cast a wide enough net to catch as many developers out there as possible, to at least give this practice, this habit of writing tests first and driving your code through tests, a shot. That was my motivation for essentially targeting three languages at once.
Rebecca: You alluded to this, but I've had people say to me point blank TDD is irrelevant unless it's in an object-oriented language. You've indicated you disagree with that but why? How would you respond to that? That question, does it have to do with how you define test-driven development or what's your justification for that? I know what mine was, but I'm interested in what yours was.
Saleem: I would love to hear yours, too, Rebecca. First of all, I want to be empathetic to that, I guess, charge that "Hey, TDD does not work outside of an OO context." Primarily, I would emphasize, because JUnit, the first framework in the modern languages, was in Java, and Java is very much an object-oriented language. Since then, since Kent Beck invented JUnit, it has had a very rich tradition of functional programming features, but that came later.
With JUnit itself, by its very nature, you subclass a class, the test class, and write your tests. Then your tests call or depend upon the system under test, which almost always is a class that you will create after you've written your test. I can empathize with that charge that "Hey, if I'm not working in a framework where I have a test class that is testing a class under test, which has public functions or methods, then what am I going to do with this?"
A fair point. However, going back to the basics, test-driven development, and in a funny way, the name is just a tad bit unfortunate in so far as it has the word testing in it. If you look at the genesis of this, and I go through that in my book, mostly in chapter 14, but also at the beginning, it's really not about testing. It's about improving the design of your code and improving your confidence in the code. Now, of course those statements are true regardless of the nature of your language. You could be writing SQL, which is neither object oriented nor functional. It's a query language, and you could practically do test-driven development. I think because the emphasis is on language and tooling, which historically have emanated from object-oriented languages, and because of the original frameworks being in object-oriented languages, there is a tendency to associate that wrongly, but understandably, with only object-oriented languages.
My counter to that would be: give it a shot, and approach it from the perspective I start right off with in chapter one, the red-green-refactor triad. It is really having a failing test, writing just enough code to make that test pass, and then cleaning up, refactoring. That triad makes sense in all languages, functional or not, OO or not. That necessity for having better design and faster feedback, those things make sense in a variety of languages. That's how I would explain or counter the argument. What's your reasoning, Rebecca? I'm curious to hear your side of the argument.
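As a rough illustration of the red-green-refactor triad described above, here is a minimal Python sketch that uses plain functions rather than classes under test. The `total_cents` function and the test names are hypothetical, chosen only for illustration, and are not taken from the book.

```python
import unittest


def total_cents(prices_in_cents):
    """Return the sum of a list of prices, each given in cents."""
    # Green: the simplest code that makes the tests below pass.
    # Refactor: a first version might have used a hand-written loop;
    # once the tests were green it could safely be replaced by sum().
    return sum(prices_in_cents)


class TestTotalCents(unittest.TestCase):
    # Red: in the TDD cycle these tests are written first, and they
    # fail until total_cents exists and behaves as described.
    def test_empty_list_totals_zero(self):
        self.assertEqual(total_cents([]), 0)

    def test_totals_several_prices(self):
        self.assertEqual(total_cents([250, 199, 1]), 450)
```

Saved as a module, this can be run with `python -m unittest`. Nothing in the cycle depends on the code under test being a class.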
Rebecca: It's actually quite similar to what you said at the end. I think what you said about the JUnit implementation and this is where it came from. I don't find that as compelling a defense. I guess where my empathy for the position comes from is as you mentioned the red-green-refactor loop, it has refactor, and so much of the literature around what are sensible refactorings. Up until Martin wrote the last edition of the refactoring book, most all of those were all about objects. How do I refactor if I am a function? That's where my empathy for that question lies. I go back to the same thing that you did, test-driven development. The test is the mechanism or the implementation to drive you to think differently about design and to drive you to focus your attention where it needs to be focused at the time.
That is true regardless of what language you're writing in. Although we hear this all the time, all the mainframe programmers "Oh, I've been doing this for 40 years," but there's a lot in terms of that red-green-refactor loop that I've done in the context of C and no one would ever imagine that pure C is in fact object oriented although you can write objects, and then you can write anything in it, actually.
The power of test-driven development is really on that focus of design and where I think we can send a broader message. One of the things that we've discovered, and Neal and I talk about this in the context of evolutionary architecture. If you think about the testability of something, very often you come up with a nice boundary around the something. I don't care if it's a module, or a function, or a component, or a whatever, if you can't clearly articulate what that thing does in the name of a test, then you probably don't understand what that thing does and you need to think about it more.
That's where I actually like the name test-driven development because it is focusing on the part of the value of the test. If you can't figure out how to test it, if you can't clearly name your test, then you probably don't know what you're testing.
Saleem: That's such a great point, Rebecca. I want to pick up on that a little bit. People ask, "How is it? What's the magic? Why would test-driven development create a simpler design?" My answer to that is, after I say or somewhere around, I say, "Try it and see what you get." When people are curious or skeptical, I say, "Oh, look, how would you end up with a really complex piece of code with either high cyclomatic complexity or low readability? If you have such a piece of production code, it is inevitable that you would have an equally complicated, perhaps more so complicated test."
Then I ask, "Why on earth would you write that test first in the first place?" If you're starting with the test first, the test's complexity, out of necessity, would have to match the production code's complexity. If you don't know how to write the production code, if you are either ambiguous in your understanding or you have a good understanding, but you want to write oodles of production code, you'll have to write oodles of test code beforehand.
Therefore, that becomes the forcing function that you try to pace yourself or divide and conquer and you write decomposable smaller tests that drive your code and therefore reduce the cyclomatic complexity or whatever metric you're choosing on the production code side. That forcing function of having to write tests first, that is the secret sauce. That's how you will end up with simpler, better organized, better-factored code on the production side because nobody in their right mind would write a 100-line test method before they ran it or they write one line of production code. Nobody would do that. Self-discipline in all developers that I know would prevent them from doing that. That's the magic.
To your point about doing test-driven development for the purpose of improving production code, I quote Kent Beck's remark, which he made more than a dozen years ago in response to a question on Stack Overflow.
It's a powerful quote so I quote it in full in the book. He says, and people would be shocked, or would have been shocked unless they would have read his comment. "I don't get paid to write tests, I get paid to write production code." Then he correlates it by saying, "I write enough tests to give me the confidence that I need in my code." That puts the emphasis back squarely on quality of production code, the confidence you have, and again, tests being a vehicle towards that. They're not an end unto themselves. Nobody who does even TDD very diligently or perhaps even religiously would claim the tests are an end, they're not. They're means to the end. It just so happens they're very good means to the end especially if you write them first.
Neal: You're very much preaching to the choir here because I did a talk many years ago called test-driven design, which is about the beneficial side effects that TDD has on design because one of your goals is to create small cohesive methods. Well, that's what testing does if you're trying to create something that passes the test. I want to go back to the polyglot nature of your book, and we brought up functional programming languages. I'm curious about why you didn't include a functional programming language in the mix. Of course, all of these are hybrids now, which all languages are. Why not something that leans more toward a functional program?
Saleem: Yes, you're right that the languages are a hybrid. Python and JavaScript for sure have a very rich tradition of functional programming. Go, less so partly because it's younger, and Go is not fully object-oriented in the same way that for example, Java is anyways. It was not a hybrid in that sense though it has structs and notions of objects, but not inheritance in the way that your traditional languages have.
To your question about why not functional languages? Practical constraint, there's only so many languages you can tackle. Three is almost breaking some kind of barrier, I would imagine. Also, I did want to keep the solution singular, as in the same solution is developed in three different languages. Now, could there have been a functional Pythonic or a functional JavaScript solution to the same problem? Probably. I also don't want to tax my readers too much. I go through this in chapter zero, how to read this book.
I realize even, especially because there is a mono repo for all these three languages. The languages, of course, are syntactically quite different even though the solution that is developed is to the same problem. I did not want to tax my readers so much that there was a functional solution to the same problem, or heaven forbid, a completely different solution in F# or whatever, some functional programming language. Then, of course, at that point, the accusation that I've heard of, and I have some sympathy to that, and "Hey, you've just glommed three books into one," that would have stuck a lot harder if I would have glommed on a functional solution to a different problem and tagged it on in alternate chapters, or every third chapter then people are saying, "Hey, he's just trying to sell three books in one."
Pragmatic reasons were the ones in there. However, I do address that point. In brief, I would say in the preface that, again, the point is not while the book is about practically writing code, and there's code repository to go with that, and every chapter is about code. The point in a very elevated way, I guess, if you step back from the book a bit, is not about the syntactical similarities or differences or the fact that these are OO or not, the point is not that. The point is to exhort developers to approach TDD with the discipline and passion that I believe Kent Beck did when he, in the preface to his book, or somewhere, he says that to the person whose book I read as a 20-year-old who said, "Punch your punch cards until your input tape, or output tape matches the input tape. Thank you, thank you, thank you."
Of course, what he's doing is thanking the person who put him online to writing JUnit and developing TDD as a discipline in the '90s, somebody who did some work perhaps back in the '50s. I hope without sounding too big headed that I can be that kind of inspiration for a new group of developers that regardless of the language whether you're doing functional programming, or as Rebecca said, even programming in a language that is neither functional or object oriented, you adhere to TDD as a discipline, as a practice, as a tool, as a means to an end.
That's what I'm going for but you're right. Functional languages are different enough, perhaps the next book in the series, somebody if they want to co-write with me, I'll be happy to, reach out to me, I'll give you a listen.
Neal: I'm sympathetic, having written a book that's featured multiple languages. I think you probably stopped at the right number because the more you write about the more you have to support, but there's language syntax, but there's also common practice and usage within programming communities. How do you contrast so one of the common things in the functional world is REPL driven development rather than test-driven development? That would be one of the interesting things to address, and a lot of JavaScript developers now, because we have such powerful debugger IDEs, rather than write tests, they just go in and fiddle around in the REPL until they get the code right and paste it in there and go, what's the difference between that versus TDD?
Saleem: That's a great call out, Neal, and I would say so first of all, I do address REPLs-- I believe it's Appendix B is where I mentioned a lot of REPLs for all three languages. Obviously, they're not a complete alternative to having an IDE or a development environment on your laptop because certainly online REPLs can be sluggish, depending upon your connection. If you're offline you can't use them, but certainly, some kind of a REPL, especially if you're using it on your workstation, can be seen as an alternative, especially if it's in a browser for something like a web application. Perhaps JavaScript and TypeScript developers do that normally: modify the code in their browser and see its effects. I've certainly done that myself. The distinction there is-- First of all, the commonality, let's talk about that. Fast feedback. If you're going for fast feedback, you're very much in the same spirit, or at least you're accomplishing one of the goals of test-driven development. You're getting fast feedback, so there's nothing wrong with that.
The thing where it starts to differ is, and I address that in the CI chapter on continuous integration and in the last chapter, software is volatile. It's one of those funny things, I draw the imperfect analogy with buildings where tests can be seen as scaffolding and the building is, of course, the useful software, the production software. The difference is you take the scaffolding off the building once you're done. Can't do that with software, because software, while it is being inhabited and used, is also being modified, which is usually not true for concrete and brick and mortar artifacts in buildings. You keep the tests around and of course, you throw automation on top of that. This is where you reap all the benefits of continuous integration, continuous delivery and deployment. If you don't have those automated tests, if all you've done is REPL magic, then what do you automate? That would be my exhortation or my words of encouragement: it's fine to use REPLs to get that fast feedback, but once you are there, once you have achieved the confidence, how are you automating that?
Now, of course, automating some magic you did in REPL is probably going to be a lot trickier than writing a unit test, or as I would say, if you are doing that quick turnaround thing, quick and dirty in a REPL to get to the result, then turn around and write a test for it, because that almost to use another agile term is almost like a spike. You were unsure, you were learning something, now that you know it, now go back to test driven development proper, write a unit test, make it pass using the knowledge you've gained from your REPL. There's synergy between those two approaches. I don't necessarily see them as completely contrasting. I see them supporting each other.
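The "spike in a REPL, then turn around and write a test" workflow described above might look something like this in Python. The REPL session in the comment and the `slugify` helper are hypothetical, invented purely to illustrate the idea.

```python
import unittest

# What the exploratory REPL session might have looked like (hypothetical):
#   >>> "  Learning TDD  ".strip().lower().replace(" ", "-")
#   'learning-tdd'


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug."""
    # The expression discovered in the REPL, now living in named code.
    return title.strip().lower().replace(" ", "-")


class TestSlugify(unittest.TestCase):
    # The REPL observation, captured as a repeatable regression test
    # that a CI pipeline can run on every change.
    def test_trims_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("  Learning TDD  "), "learning-tdd")
```

The knowledge gained in the REPL is the same; the difference is that it now survives the session and can be automated.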
Neal: That's what I've always recommended in functional programming languages. It's perfectly okay to use a REPL, but that last step, a lot of teams skip because like, "Oh, I got my code right. I can move on," but it's the longevity that you have the regression ability because you're right there. You've got all the code there, you've got it. Just go ahead and copy and paste that into a test and now you can regression that decision for a long period of time. REPLs are a good example of something that wasn't around when Kent Beck first started talking about this and wrote his book.
What has changed? You've been doing TDD for a while, so what has changed in your approach given the new tools and frameworks? We've got incredible capabilities now compared to the very early days of JUnit. What tools and/or approaches have changed over time?
Saleem: As I said, the movement of things to the cloud has certainly been one change, so not only did REPLs as a tool obviously not exist, but we weren't developing on the cloud 20 years ago. It didn't even exist in that sense. That has changed, and of course, it's not just that we have taken compute and store and our asynchronous stuff to the cloud. People are writing code without having a development environment, frankly. All they're doing is opening up a browser and writing code in it, which is run somewhere else. You can write your AWS Lambda without really downloading any compiler because you can literally write it inside of a browser.
A lot of developers do that, and of course, there are the low-code and no-code platforms. The Thoughtworks Radar has had a couple of blips that frame a nuanced position on what's the right way, what's the right scenario, or use case where those techniques are applicable.
Those are the things that have also cropped up since, I guess, the first wave or that wave 20 years ago of test-driven development. My reaction to these things has been, look, if you're doing low code-- which many of my colleagues, certainly junior developers, do, if you're using these online cloud-based things to write code, not just to deploy, but just to write code. You may or may not have- likely do not have the fast feedback, ironically, 20 years later that I had 20 years ago. What if you have a slow internet connection? What if you have an outage? It does happen, right? Probably too soon to mention, but there was an outage of a large social media website a couple of weeks ago. These outages happen, there's been outages at AWS as well, so what are you going to do?
Also, if the only way you are going to test code is by deploying it to some environment, even if it's a test environment, and then testing it, well, that's by definition not fast feedback because you've probably written a lot more code, even if it's in the context of one Lambda, than you probably wanted to. What I have adopted and try to encourage others to do is to adopt TDD in what Kent Beck calls the inner loop of writing code. Coding is a lot of things, you talk to people, you spike, you do those things, but when you sit down and you want to achieve that flow in terms of having really quick feedback cycles, you want to have those feedback cycles in the order of seconds, literally as fast as you can type keys on your keyboard.
If anything interferes you tend to go out of your state of flow. How are you going to speed up or keep an optimal speed on that inner loop of programming? I have found that, let's say you're writing something that's going to live as an AWS Lambda, and it's going to pull something from an S3 bucket and push it somewhere, you don't want to develop in that framework because it's almost certainly not going to give you that few seconds of feedback every time. What do you want to do? You want to make sure that the code you're writing that is doing the moving of the bits and bytes is code that you can run in TDD locally. That has many ancillary side benefits: you don't really need an internet connection, you can work on the train, [unintelligible] use case. That's been a major motivator of mine, which obviously, that problem didn't exist 20 years ago so the need wasn't there: to learn, in that context where everybody wants to do everything on the cloud including writing their code, and to both teach myself and teach others that that may not be the optimal way to write code, because you forego the fast feedback and, of course, deploying code and then testing it sounds a little bit backwards.
Learning and teaching people those habits has been a change and of course, languages and tools have, I guess, exploded. That's the other thing that we didn't have. There's so many frameworks, probably nowhere more so than in the Node.js landscape where it seems like you have a framework every week, but also in other places where of course, Spring came out after Kent Beck's book, Rails came out after that. There's all these frameworks and there's discussion about how do I do TDD when I have this framework that gives me so many things out of the box? What does a Spring JUnit test look like? Or what does TDD look like in Spring? What does TDD look like in Rails? Those are interesting questions. They didn't exist 20 years ago. Learning that, practicing that and coaching others have been the modifications to my approach to TDD that I have made over the years.
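One way to keep the "moving of the bits and bytes" testable locally, as described above, is to pass the storage reads and writes into the core logic as plain functions. The sketch below is only an assumption about how that could look: `transform`, `move_records` and the record shape are hypothetical names, and no real AWS API is called. In a deployed Lambda, a thin handler would supply functions backed by the actual storage service.

```python
import unittest


def transform(record):
    """Pure logic: keep the id and upper-case the name."""
    return {"id": record["id"], "name": record["name"].upper()}


def move_records(read_records, write_record):
    """Pull records from a source, transform each, push it to a sink.

    read_records and write_record are injected, so the cloud-specific
    code stays at the edge and this function runs anywhere.
    """
    count = 0
    for record in read_records():
        write_record(transform(record))
        count += 1
    return count


class TestMoveRecords(unittest.TestCase):
    def test_moves_and_transforms_every_record(self):
        # In-memory stand-ins for the bucket and the destination.
        source = [{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}]
        sink = []
        moved = move_records(lambda: source, sink.append)
        self.assertEqual(moved, 2)
        self.assertEqual(
            sink,
            [{"id": 1, "name": "ADA"}, {"id": 2, "name": "GRACE"}],
        )
```

Because the test uses only in-memory lists, it runs in well under a second and needs no network connection, which is the inner-loop feedback being argued for.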
Neal: You mentioned cloud environments and the difficulties of testing there because you're basically doing integration tests at that point, not unit tests anymore. What are other places where it's just flat-out not applicable or practical to do what we would normally consider unit test? You talked about new environments, what about old environments like your ERPs and your tools like that?
Saleem: Tooling absolutely does not support writing anything small and small is key, right? Small is a, I guess, placeholder for fast feedback. What we really need is blazing fast feedback. That almost inevitably means small, certainly, from a human aspect. You're not going to write lines and lines of code regardless of the language and then get quick feedback because presumably will take you a long time to write. We're talking seconds here. Inner loop programming is measured in seconds. If your language doesn't support that, unfortunately, or through happenstance, then you may be constrained by that. Your fastest feedback may not be on your inner developer loop, may not be the seconds that we aspire to, but maybe minutes or hopefully not that bad, tens of minutes. If it is that then you do the best you can, but your question where does TDD not apply?
We talked about one when we talked about REPLs. When you're learning something, when you're rapidly iterating over something, then the end result of TDD, or one end goal of TDD, which is fast feedback, can be accomplished by doing trial and error in a REPL, so you're not doing strict TDD, but that's fine. You're learning something, you're experimenting, so that's okay. As long as, as you said, Neal, you go back and, once you're done learning and you find your answer, you do the discipline of a test-first style.
The other place that I would say, and it's less tool-related, but more a situation is, and I'm going to go refer to the Cynefin framework. When you're in a state of chaos and you really are reacting to unprecedented events, then you're not going to do TDD. Again, if you have an outage like the one we had recently at a social website, I'm pretty sure they weren't TDD-ing when they were trying to bring that platform back up for the several hours that it was down. When you are working in a truly unprecedented circumstance, so an interesting one that we hopefully can talk about because it's been a few years, is AWS region East outage in 2017, which was funny because it was an outage of S3, which is foundational, the storage service, a lot of other services depend on it. It was so severe that somewhat paradoxically we can only smile in hindsight four years, five years later when the S3 servers went down, the health pages went down with it. Paradoxically, this was the most, I guess, cruel irony of it, the health pages were all green because they were showing an older version of the output, but everything was down because S3 was down. I'm pretty sure the really smart people that work at Amazon when they were fixing that they weren't doing TDD, because that's an unprecedented chaotic situation.
I use the word chaos in a very specific way as the Cynefin framework defines it. When you're there, you can't really plan, check, and act. You want to do something, you want to execute something because it's unprecedented. Basically, you have no playbook. When you are in that kind of a situation you want to restore service, so there are things that engineering teams and product teams will do in those situations that they might have said they would never do. You said you'd never roll back your database; you might roll back your database. You said you'd never kill user sessions; you might kill user sessions if you're trying to restore services. Of course, nothing in there would be like TDD. Of course, when you're in a chaotic situation, your first order is to restore some semblance of normalcy, but then hopefully, you have the discipline, the wherewithal and the management support to say, "What failed? What missing tests are there? How did we get into that chaotic situation? How can TDD and other forms of testing now help us to not get into that chaotic situation?"
Yes, in a chaotic situation, you'll probably not use TDD, but once you're out of that chaos, similar to what you said, Neal, that once you're done experimenting in a REPL, you go back and then you do the discipline. Many times, and it's understandable, human exhaustion and forgetfulness set in, and teams don't do the last bit of work: they extricate themselves out of the chaotic situation, but then forget to do the discipline afterwards to not be in that chaotic situation again. I would say that's another scenario where you might not do TDD in the moment, but you certainly would be well served to do it afterwards.
Rebecca: We talked a little bit about when you might not do it, let's talk a little bit about how it might evolve in the future. Are there aspects say of machine learning that might require a different perspective on test driven development?
Saleem: Yes, that's a great callout, although I'm a little bit hesitant, Rebecca, to answer your question. As they say, the best way to prove yourself a fool is to try to predict the future. It's such a tempting question and you've just thrown it at me. I don't know, maybe I have just enough hubris to try to answer that. Specifically in the context of machine learning, and this is my take, and actually I want to hear back from the two of you what you think on this: the emphasis is a little bit askew, in my opinion. It's not so much on the data sets, where I believe it belongs. It's on the algorithms.
Dr. Timnit Gebru, who was with Google, has done a lot of great work on what she has called datasheets for datasets. She has a background in integrated circuits and hardware. Hardware is obviously older than software, so the problem there is well defined and there's existing IP or prior art: when you buy an IC, you're not wondering how it might behave.
Now, especially if it's something like a microprocessor, you might use it in ways that the designer of the IC never intended or never tested, but the IC will work because it comes with a datasheet with precise information about what the voltages on different pins are and what output you can expect on a certain pin when you provide a certain control input on another pin.
She has brought the same notion to data sets. This also goes, of course, to larger issues about abuse or misuse of datasets leading to different kinds of biases; perhaps it isn't the algorithm that is biased, it's a misapplication of the dataset. I think, and this is my take specifically on your question about ML, Rebecca, there needs to be a focus on, and I will go ahead and call it this, unit testing our data sets. Say you're applying a data set from a different domain, taking a data set from facial recognition and applying it to agriculture or something totally different, because it might have applicability. If that data set comes unit tested, with specific datasheets, then we might have higher confidence. Again, to emphasize, higher confidence is what we're aiming for. Then we might have higher confidence in whatever our ML spits out. It's a little bit hand-wavy, I do realize that. The emphatic part of my statement is that I think the focus needs to be more on the data sets side of things rather than the algorithms. Is that resonating with either of you?
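[Editor's note: One way to picture "unit testing a data set" is a small check that compares records against the guarantees a datasheet documents. The sketch below is only an illustration under stated assumptions: the `Datasheet` fields, the `check_dataset` function, and the sample records are all hypothetical and are not taken from the datasheets for datasets work or from Saleem's book.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datasheet:
    """Hypothetical datasheet: the guarantees a data set claims to make."""
    required_fields: frozenset   # fields every record must carry
    label_values: frozenset      # the only labels that may appear
    min_share_per_label: float   # each label must reach this fraction


def check_dataset(records, sheet):
    """Return a list of violations; an empty list means the data matches the sheet."""
    violations = []
    counts = {}
    for i, rec in enumerate(records):
        missing = sheet.required_fields - set(rec)
        if missing:
            violations.append(f"record {i}: missing fields {sorted(missing)}")
            continue
        label = rec.get("label")
        if label not in sheet.label_values:
            violations.append(f"record {i}: unexpected label {label!r}")
            continue
        counts[label] = counts.get(label, 0) + 1

    # Representation check: no documented label may be (nearly) absent.
    total = sum(counts.values())
    for label in sorted(sheet.label_values):
        share = counts.get(label, 0) / total if total else 0.0
        if share < sheet.min_share_per_label:
            violations.append(f"label {label!r}: share {share:.2f} is too low")
    return violations


# Assumed example: a tiny labelled image index.
sheet = Datasheet(
    required_fields=frozenset({"image_id", "label"}),
    label_values=frozenset({"wheat", "weed"}),
    min_share_per_label=0.4,
)
records = [
    {"image_id": 1, "label": "wheat"},
    {"image_id": 2, "label": "weed"},
]
print(check_dataset(records, sheet))
```

[A check like this raises confidence that a data set matches its documentation before it is reused in another domain. It says nothing about whether the data is appropriate for that domain.]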
Neal: Well, that's actually similar to the famous William Gibson quote, "The future is already here, it's just not very evenly distributed," because a common practice in the Clojure world now is generative testing.
Rather than writing a set of unit tests, what they do is generate a set of assumptions about data transformations, and then run hundreds, or thousands, or millions, or billions of transactions, and then do statistical analysis over the result set to see if anomalies exist. Mutation testing is another popular approach, where tools go in and fiddle with values and constants to see where boundary conditions and those kinds of things are. I think there's a bright future in some of those innovative approaches.
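[Editor's note: The generative testing Neal describes can be sketched in a few lines. This is a hand-rolled illustration that uses only Python's standard library; the function names and the transformation under test are hypothetical. Real tools, such as test.check for Clojure or Hypothesis for Python, add features like shrinking failing inputs, and this sketch simply collects failing inputs rather than doing any statistical analysis of the results.]

```python
import random


def dedupe_sorted(values):
    """Hypothetical transformation under test: sorted values, duplicates removed."""
    return sorted(set(values))


def check_properties(transform, runs=1000, seed=0):
    """Feed `transform` many random inputs; return the inputs that break a property."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    failures = []
    for _ in range(runs):
        size = rng.randint(0, 20)
        data = [rng.randint(-50, 50) for _ in range(size)]
        out = transform(data)

        # Assumptions about the transformation, stated as properties:
        strictly_ordered = all(a < b for a, b in zip(out, out[1:]))
        same_members = set(out) == set(data)
        idempotent = transform(out) == out

        if not (strictly_ordered and same_members and idempotent):
            failures.append(data)
    return failures


failures = check_properties(dedupe_sorted)
print(f"{len(failures)} failing inputs")
```

[Instead of a handful of hand-picked cases, the assumptions are written once and exercised against many generated inputs. A transformation that forgets to remove duplicates would fail the strict-ordering property on any input containing a repeated value.]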
Rebecca: I would say that that resonates as well. One of the things I often think about is when people say, "I want to do pattern recognition on my dataset." Unless you know a lot about that dataset, why do you believe there's a meaningful pattern? There may be a pattern, there may not be a pattern. Even if there's a pattern, why do you believe it's a meaningful pattern, or a meaningful pattern in the context in which you want to do the recognition? That's something a unit test is never going to tell you. It might tell you that there is a pattern there, but the meaningfulness of that pattern is a higher-order concern.
I find it frustrating when so many people say, "I've got all of this data and I'm sure there's all these wonderful insights inside." Why do you believe that? There may be. I'm not saying there aren't, but we go into this, it seems, with the belief that obviously there's a rich set of information that's just waiting there for me to find it. When in fact, it may be that the correlations or the systemic behaviors are just so complex that we simply cannot discern what the meaning of that pattern is. There's no way that any kind of analysis, at least that I understand for the moment, is going to be able to disambiguate that, but we still try.
Saleem: That's a great call out I think, and you're spot on. If you don't have a reason to believe that there are patterns, if you don't know a priori that there is a pattern and have a scientific basis for knowing or claiming that, then you're right, no amount of unit testing can make a pattern appear.
Even if it does make one appear, it may still not have any semantic significance. It may be completely contrived that every fifth record displays some attribute. We might write a unit test to prove that, but what does that mean? It just might mean that we had a notion, then we wrote a test, then the test passes, but it may not reveal any insights about the data. That's a very good, subtle point.
Neal: Last question and this is something we often ask authors here. Why a second book after 20 years? Most authors are either pretty serial, or they're one and done, and you seem to split the difference, so. [laughs]
Saleem: That's a great one. First of all, I hope my next one isn't two decades in the making, because I don't know if I'll have that longevity of career. I can hope for or aspire to that, but I certainly can't plan on it.
I think there's really no good answer to that. I think it's the learning. The intermediate years, as both of you know, many of those years were spent at Thoughtworks and, of course, the learning was so accelerated, so rewarding. That, at a personal level, certainly became something to noodle on in itself. It sounds very selfish that I was soaking in knowledge and not sharing it, but there was so much to soak in. That was, I guess, the personal reason.
The pragmatic reason, I guess, or hopefully something that is not quite as self-centered, is that I didn't think I would find test-driven development, of all things, frankly, so engaging to write a book about and share until the last several years. As I said, it was just by happenstance, and this is pure luck, as they say. If I say anything further, this might be my false attempt to find a pattern out of randomness or chaos. It just so happened my last several years were on projects where I was in close proximity not just to tech stacks that had these three languages but also to developers from whom I learned a lot and with whom I shared a lot. Those insights just emerged in the last several years to say, hey, maybe there's a space to share with people not just my personal passion about testing and development, but also how it can pragmatically move, hopefully, a new generation of developers. It was an accident, to be very frank about it, along with the internal motivation to write a book. But to your point, I do hope that the next hiatus is nowhere near as long as the last one has been.
Neal: Well, I'm glad you found something that you were excited and passionate enough about to go through the hassles of writing a book which as you know is no fun. Thanks for doing that and we hope to see one faster than 20 years.
Saleem: I hope so too. Neal, I would be remiss if I didn't thank you publicly. Of course, you and I have chatted several times and you would recall the earlier conversations we had. I think it was pre-pandemic which sounds like a different world altogether now when I had this idea, I ran it by you, and of course, your insights and feedback were valuable as ever. Thank you for that and certainly, without your stable and leveling gaze, I don't think the book would've come out anywhere near as well as it did.
Neal: Well, it's one thing to gaze at people but it's another thing to put in the work to write a book. That was the hard part of that whole equation. It was a great pleasure chatting with you again and looking forward to the book which should be out about the time this podcast comes out or in early release in any case through O'Reilly. Everybody go do a refresher on TDD.
Saleem: Thank you, Neal, thank you, Rebecca.
Rebecca: Thanks for joining us, Saleem.