How I wrote my scientific book

January 24, 2017

TL;DR: I wrote a 600-page, 150k-word scientific book. It took me almost 3 years. Tools: 11’’ MacBook Air, LaTeX, Sublime Text, Jupyter notebook, Illustrator, coffee. I wrote 1k words per day for many months.

(Estimated reading time: 15 min)

598 pages. 147 000 words. 365 figures. Over a thousand papers read, analysed, and distilled. Many pounds of chocolate. Litres of ice cream. Thousands of espressos. After more than 3 years of work, my book is finally out. This is obviously my biggest writing project ever and I gave it a lot of thought before getting started. I carefully chose my tools and decided on a strategy to make sure I could focus on the content and finish on time. I also tracked my progress throughout the writing. Now that it is over, here is the story behind the making of this book. If you are curious about my tools, strategy, and progress, sit down, grab a cup of coffee and go ahead.

The idea

For the past 10 years or so, my research interests have mostly revolved around the following question: what happens when you freeze a suspension of particles or, more generally, of objects? This seemingly simple question turned out to be very complex and corresponds to phenomena encountered in situations as diverse as the growth of sea ice, the cryopreservation of cells, the freezing of soils, or the solidification of some metallic alloys, to name but a few. It has thus resulted in developments in many directions over the past century or so. My take on it, for many years, has been to use it to template porosity in materials, a process called ice-templating or freeze-casting. Although all these disparate phenomena share the same underlying principles, few attempts have been made to make connections between the fields.

A few years ago, I passed my habilitation. This French diploma is required to independently supervise PhD students (don’t ask). The habilitation includes both a manuscript, reflecting on your research so far, and a defense. I took advantage of it to begin a long reflection on, and analysis of, my research and the corresponding domains.

The habilitation took me about a full year to prepare. After the habilitation, I thus had a 30k-word manuscript, mostly focused on my research, and the idea of expanding it into a much larger project started to crystallise (pun intended). My work on freezing was becoming more multidisciplinary at this time and the idea of making connections between vastly different fields was very tempting. My last review paper was also getting a bit outdated: the field had been very active these last couple of years.

At this point I received an email from a Springer editor asking if I had any book project in mind. The timing was just right and without thinking too much about it, I prepared a book proposal including a detailed outline and sent it to Springer.
My project was very positively evaluated by two external reviewers and we signed a publishing agreement on December 15, 2014. Publishing a scientific book is different from a regular book: you sign a contract before you even start to write the book. This is both nice and scary, but I guess it forces you to finish the project within a given time frame. At the time, I promised 200 pages and approximately 180 figures. I was really far from what I would deliver two years later.

The strategy

The most important decision I made was, without any doubt, the strategy I adopted to write the book. Like many others, I had already practiced this approach for many years: write every day, no matter what. Adopting this writing routine was absolutely critical for a project of this size. This was clearly not going to be a 2-week, 5000-word writing effort.

My routine was thus to write daily, no matter what, first thing in the morning (after coffee, though) and to write at least 1k words per day. If I hadn’t met my objective by 10AM, I would resume in the evening, as I was also very busy in the lab at this period, with my ERC starting grant going full blast.

In the evening, after the kids went to bed, I would do everything else: reading and analysing papers, collecting data, making figures, and so on. Unlike writing, I did not do this every day. There were days when I was just too tired or had better things to do.

Home office with my Wacom tablet

The second part of the strategy is the number one rule of writing: write first, edit later. I wrote about 70% of the book before I started to edit it. I also only started to work on figures after the manuscript was fairly advanced, otherwise I would have spent too much time on them (I love preparing figures) at the expense of writing.

I was quickly reassured that this strategy was a good one. Settling into a routine was absolutely essential to make steady progress. If you write 1k words/day, you already have 5 to 7k words at the end of the week. Let that sink in.

The tools

I always pay attention to the tools I use and this project was no exception. Ok, people who know me will probably say I am a bit obsessive about it. My main writing machine was an 11’’ MacBook Air. Working in full screen, this was the perfect writing machine. Being light, it was perfect to take everywhere with me whenever I travelled, which I do a lot (once a week, on average). A lot of this book was therefore written on trains, in airplanes and airports, and in various random places.

The second most important tool was my blank notebook. Whenever I had to review a new domain, take notes, or draft figures, the notebook was the best tool.

My blank notebook. Writing at the airport.

I wrote the book in LaTeX. This was so obvious that I did not even think about it. LaTeX is absolutely unbeatable for large, complex (scientific) writing projects. Springer provided a template, which was of course more convenient for them, but also for me. Having the template gave me an idea of the final output, which I appreciated a lot towards the end of the book, when deciding on the figures. I did not have to do any tweaking: one less excuse to procrastinate.

I used Sublime Text to write and edit. I bought a license a few years ago and it has probably been one of my best software investments in a long time (I use the Monokai theme, if you’re curious about it). The time it saved me is absolutely incredible. A few packages turned out to be very useful: LaTeXing (paid), which includes many useful LaTeX functions/snippets, and AlignTabs, a must-have if you write tables and need to align cells. I also used LatexTabs, which is terrific for making new tables by copy/pasting a table from a spreadsheet. In addition, I defined 5 to 10 snippets to insert figures, citations, tables, and so on. I thus never had to type a single LaTeX command during the entire writing of the book (except when I prepared the index, at the very end). The final version of the book, with 600 pages, 365 figures, and over a thousand references, took less than a minute to compile on the MBA. Not too shabby.

To keep track of the 1k+ papers I read and analysed, I built a long spreadsheet which helped me sort them out into various categories. This was extremely convenient when working on sections dedicated to specific materials.

My spreadsheet to sort papers

I have used Mendeley for many years now to organise my library. This was again a very useful tool, if only to automatically generate the bib file for LaTeX. I also used folders to keep track of which papers I had already analysed, which ones were still pending, etc.

The analysis of the data (processing conditions, materials properties) contained in the papers and its combination into something useful was also an important target of the book. I digitised many plots (because the values were not provided in tables) using the terrific GraphClick (OS X only; development is no longer active, but it works like a charm on macOS Sierra), and made two kinds of source files from them. For plots I just wanted to reproduce, to make sure all the plots were consistent in the book, I exported a CSV text file with the data. To combine data from many papers, I made a (huge) spreadsheet with all the values, which I later filtered to extract the series of interest. I had used this strategy before.
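The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (paper, material, porosity, strength_mpa), the function name extract_series, and all the values are hypothetical placeholders, not the layout or the data of the actual spreadsheet.

```python
import csv
import io

# Hypothetical stand-in for the combined spreadsheet, exported as CSV.
# Column names and values are invented for illustration only.
COMBINED_CSV = """paper,material,porosity,strength_mpa
paper_a,alumina,0.60,45.0
paper_a,alumina,0.70,22.0
paper_b,zirconia,0.55,80.0
paper_c,alumina,0.65,30.0
"""


def extract_series(csv_text, material):
    """Return (porosity, strength) pairs for one material, sorted by porosity."""
    reader = csv.DictReader(io.StringIO(csv_text))
    points = [
        (float(row["porosity"]), float(row["strength_mpa"]))
        for row in reader
        if row["material"] == material
    ]
    return sorted(points)


# Pull out one series of interest from the combined table.
alumina_series = extract_series(COMBINED_CSV, "alumina")
print(alumina_series)
```

Keeping every digitised value in one table and filtering on demand means a new series (another material, another property) only needs a new filter, not a new file.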

I then used a single Jupyter notebook to prepare all the plots of the book. This ensured that all my plots were consistent.
For schematics and drawings (my favourite part), I used a large Wacom tablet hooked to Adobe Illustrator. This was such a joy. I had to restrain myself from spending too much time on figures. For a few 3D schematics, I relied on Sketchup.

I also subcontracted some of the figure making. This one didn’t make it into the book, though (close call)

Finally, all the files were stored in a Dropbox folder, which provided me with a permanent backup (in addition to my external backups to hard drives). Again, a no-brainer. It also allowed me to work from multiple computers: my work laptop and my home computer (a 27’’ iMac), which was very comfortable for pretty much everything. Dropbox saved me several times during the writing of the book. Mendeley is also installed on my two machines and all my papers are synced.

Last but not least: coffee, chocolate, and ice cream. And coffee. Did I mention coffee? I have an outstanding espresso machine at home (a Rancilio Silvia) with a professional grinder, which gives me excellent and consistent coffee (I have not upgraded it yet with a thermocouple and PID, though). I am particularly fond of the Lucie Royale.

My coffee rig at home

That’s about it for the tools.

The progress

Sticking to my strategy ensured that I made steady progress. Below is my progress during the entire preparation of the book. I started to write in June 2015 and wrote until October 18, 2016 (the day I sent the first version to Springer).

Progress of my writing. You can easily spot the holiday breaks.

My progress is easier to see when the days on which I did not write are removed from the plot.

Progress of my writing, removing days where I did not write
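For readers who want to build the same kind of plot, here is a minimal Python sketch of the "remove the days without writing" step. It assumes a daily log of the total manuscript word count; the log format, the function name writing_days_only, and the numbers are all hypothetical, not my actual counts.

```python
from datetime import date

# Hypothetical daily log: (date, total word count of the manuscript that day).
# The numbers are invented for illustration.
daily_totals = [
    (date(2015, 6, 1), 2000),
    (date(2015, 6, 2), 5000),
    (date(2015, 6, 3), 5000),  # no writing: total unchanged
    (date(2015, 6, 4), 5000),  # no writing: total unchanged
    (date(2015, 6, 5), 6000),
    (date(2015, 6, 6), 5500),  # editing day: the total went down
]


def writing_days_only(totals):
    """Keep only the days on which the total word count changed."""
    kept = []
    previous = 0
    for day, total in totals:
        if total != previous:
            kept.append((day, total))
        previous = total
    return kept


active = writing_days_only(daily_totals)
# Plot the totals against the index of the active day instead of the date.
for index, (day, total) in enumerate(active, start=1):
    print(index, day.isoformat(), total)
```

Days on which the count went down are kept on purpose, since editing is also work on the manuscript; only the days with no change at all are dropped.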

I guess you noticed the three stages straight away. The first period, where I wrote 2k to 5k words per day, is when I turned my habilitation into the first scaffold of the book and jotted down tons of notes and a very detailed outline of the book. Progress was therefore very rapid. I wrote 31k words in 9 writing days.

After this first stage, I let the manuscript rest for a while. There was a 3-month period during which I read and analysed hundreds of papers, made sense of them, and elaborated a more detailed structure of the book. No writing whatsoever (also: summer break), but a critical phase for the rest of the book.

The second stage, the longest, was when I wrote most of the book. One thousand words a day. Every day. For months. You can see that I really stuck to the plan. The only exceptions were holiday breaks, where I stopped writing, because family life and work/life balance. The book went from 34k to 110k words during this period.

The final book has approximately 147k words (without the bibliography). That’s a lot of words. This is equivalent to 10 to 15 review papers, or 30 regular papers. I wrote on average 4 papers per year in the last 10 years, so this was a lot more than my average. Again, this write-everyday strategy was absolutely necessary to complete this project on time.

The third stage, which corresponds roughly to one-third of the book, was the most difficult. I was too far along in the writing to stick to my initial plan of 1k words per day.

My writing strategy: write first, edit later

I started to edit the text, prepare figures, reorganise sections, etc. This was essential to improve and polish both the structure and the content. I also updated the book with the most recent papers published during this period, which was easy as the structure was finalised, but a bit tedious, as many papers were published in this period. I thus had to redo a lot of reading and analysis at this time. Progress was slower during this period. The last few days on the plots are the days when I included all the permission-related text in the captions of the figures I reused. There are at least 2000 words in the book just to properly cite the source and copyright of these figures (more below on this).

Overall, I did much less editing than I thought (wished) I would. I removed approximately 16k words from the book (this is a rough estimate, see plot below), which is about 10%. By my standards, this is really not a lot. When I write a paper, 16% is probably the amount of text from the first version that is left in the final one. I could not do such an extensive rewriting here, or it would have taken another 2 years. Not an option.

Editing progress

The initial deadline to send the manuscript to Springer was June 2016. I could not meet the deadline. It took a bit longer and I eventually sent the final version at the end of October. The book is also three times longer than I initially estimated, so I guess it was a fair delay. The book is much more comprehensive than I initially envisioned. Although I did not ask, I wonder how many authors exceed the initial deadline, and by how much? If anyone has any idea, please let me know.

The cool bits

Here are some things I particularly enjoyed when writing the book. First and most important: learning about new domains and making new connections! The idea was to cover many different domains where objects interact with a solidification interface, so I learned a lot while preparing the book. And yet, I still feel that I just covered the absolute basics of many domains. Overall, the book is still 60% materials science, 40% other than materials science (give or take).

I absolutely loved preparing the figures and in particular the drawings. I love the Wacom/Illustrator combo and spent way too much time polishing some of the figures (I am a bit obsessive when it comes to figures). I specifically enjoyed adapting old figures of crappy quality.

The original figure

My new version of the map

Rewinding the history of ideas (the opening of chapter 4) was terrific and instructive. Seeing how these ideas appeared and developed in very different domains and with radically different perspectives and methods was absolutely fascinating. I am sure that I missed important papers, though (please let me know). Reading old papers is fun. I wish I could write (and draw) like many of these people. Stephen Taber explains in his paper that he did his experiments in the cold nights of winter 1914/1915 because he had no cooling device in his lab, and then had to give up for a few years because it was not cold enough. After reading this, I ended up looking up the weather records in Northern Carolina at the beginning of the 20th century to determine when he actually did the experiments (I am still not sure). I am not certain about the reliability of the records I found, either, so I did not include them in the book.

The getting-started chapter, where I lay out all the tricks to get started with freezing in the lab, was a joy to write. This was probably the easiest chapter to write (I wrote 1100 words in 1hr, my record) and it was really fun to do. I would not be surprised if this turns out to be the most popular chapter. I receive many emails (mostly from students) asking me for basic, practical advice on how to freeze. I hope this chapter will be helpful for them. I believe we need more method papers: the methods sections are generally too short (IMHO). I saw nice initiatives in this direction from Chemistry of Materials recently, for instance.

It is embarrassing, but I have to confess that the preparation of the index was very satisfying. It took me 2 or 3 days. I hope the readers will find it useful. I like indexes in books. I ran a poll on Twitter and everyone wanted an index. So I complied.

Finally, I learned about some of my bad writing habits. I will not list them all here, but fixing them was super simple with Sublime Text.

The annoying bits

Two things. First: getting permissions and writing credits for figures. Although asking for permission is now automated for most publishers (through the Copyright Clearance Centre), each of them has different requirements on how to write the credits in the caption and how to cite the paper. Some want you to reproduce the figure caption exactly. Others do not specify anything on this. It took me a few days to collect everything and write all the credits. I had to give up on a few figures (in particular from old papers), for which I could not get permission (paying $100 or so per figure was not an option).

Second: Springer does not use the Oxford comma style. I should have been more careful before signing the contract.

The difficult bits

Writing this book was not an easy endeavour, but a few things were nevertheless particularly difficult.

Sticking to the write-everyday routine in some demanding periods of the year (e.g., grant writing, or summertime) was tough. It is much easier to write in winter when it gets dark early, believe me. Overall, it felt like a marathon. I tried not to write too much on a given day, to make sure I would not be tempted to skip my next writing session and break my pace. Sticking to the schedule while maintaining the work/life balance was tough.

Keeping an eye on the literature while writing the book was demanding. The solidification of suspensions is a very active field these days (in particular in materials science) and I receive many Google Scholar alerts every week. I wanted the book to be as up to date as possible at the time of submission. The most recent paper was published the day before I sent the manuscript to Springer.

The last 2 months were tough. It felt like it was almost over and yet there was tons of stuff to do. Editing the text. Adding the permissions. Fine-tuning the figures. Preparing the index. Checking the quality of figures. I also started to question some of the choices made before, in particular on the third chapter (I am still not very happy with it).

To print or not to print for reviewing and editing? I tried to do as much as I could on the computer, but at some point, I had to print one version. That was a lot of pages, but I am better at spotting mistakes on hard copies than on the screen. I proofread this hard copy three times, took tons of notes, and found a scary number of grammatical mistakes. It was also very useful to get a feel for the size and appearance of the figures.

Hardcopies. Felt sorry for the dead trees, but I’m really better at proofreading like this

Proofreading

Proofreading was smooth. I received the proofs 2 days before Christmas, which clearly was not the best timing. I was not surprised by the final output, since the LaTeX template gave me a really good idea of the final product. Springer just checked the grammar, which was a bit disappointing. They did not fix or edit any of the style. There were almost no corrections, meaning I had not made too many grammatical mistakes, which was satisfying. It took me just a couple of days to read everything again (without printing). I found a few typos and last-minute changes to make, but not many.

The final product

I love books. I have books everywhere at home. This one is 598 pages long (a bit more if you add the TOC, index, etc.) and has 365 figures (one for each day of the year, in case anyone wants to make a calendar with it). I am now looking forward to receiving the hard copy and holding the object in my hands. And yet it feels small, as there’s so much I wanted to touch upon. It will be a nice addition to my personal library.

Misc thoughts

I am still not perfectly happy with the final result, in particular chapter 3 (the mechanisms behind the phenomenon). I had to submit a final version at some point, however. So, here we are. Maybe there will be a second edition one day; I will have plenty of time to think about how to improve it by then.

I wish I had spent more time editing the text. I am obviously not a native English writer, and even though I think I can write something decent in (technical) English, I still have a lot of room for improvement. My English is probably better today than when I started to write the book. I can tell, from reading, which parts I wrote first. As most readers will not be English or American, it’s probably fine. If the style upsets you, well, go write a 150,000-word book in a foreign language and then we’ll talk. I just came across WriteFull and I wish I had found this gem before.

Getting credits for figures was annoying, albeit mostly automated. Having to request permission to reuse my own figures was particularly frustrating. All publishers have different requirements for how to credit the original work and make reference to the copyright holder. A standard way of giving credits would be nice, but I don’t think this will happen any time soon.

The new figures I prepared for the book were all uploaded to Figshare first (except for the plots). I did this to ensure I retained the copyright, so that it will be easier for me and anyone else to reuse them. That’s a really neat idea that I stumbled upon a while ago (see also here) and I like it a lot. I have started to do this for papers too. Springer did not comment on this. That’s fine for them I guess, as the license is very clear, so it is no different from reprinting a figure from a previous paper. No visit to the Copyright Clearance Centre if you want to reuse these figures, thus.

I could not resist and placed a few Easter eggs throughout the book. Let’s see if someone spots them 😋.

I could not choose the cover, that’s a shame. It’s part of a book series, so it feels a bit bland and impersonal (I like the red color, though).

Wrap up

Overall, this was a really good experience. A bit exhausting, though, and I am really happy that the book is finally published. If I had to do it again, I would choose the same strategy and tools, except for the laptop. I just upgraded to the new 13″ MBP (2016) with a retina screen: I will never be able to go back to a regular screen. Ever. I would have chosen the 12’’ MacBook (which is probably the ultimate writing machine) if all I needed was to write, but I need more horsepower. One thing I would do differently, though: I would probably try to go on sabbatical in a nice place to write the book and be able to concentrate full time on it. That would be less tiring (as well as a good excuse to take a break). But for now, there is too much exciting science going on in the lab!

I am now looking forward to seeing how the book is received and if I get any feedback from anyone (I hope so!). I wonder if anyone will send me chocolates, as skilfully suggested in the preface. Thanks for reading!

You can also follow me and send me comments on Twitter at @DevilleSy.

Three years of open access efforts: preprints are my future

December 3, 2016

Like many people, I started to boycott Elsevier 3 years ago. I went for the full boycott: I pledged not to publish any paper in Elsevier journals anymore, as well as not to review for any of their journals. I declined the first invitation to review on November 25, 2013. I am an established scientist with a wonderful permanent position at the CNRS, almost 60 papers already published, and enough grants secured for a few years. It was therefore much easier for me to do it than for someone trying to get a place in the sun.

Although I did not keep track of every paper I declined to review, I probably declined 30 to 40 papers, give or take. Most editors (not all) quickly removed me from their reviewer database, so that I stopped receiving invitations from them. Others did not, so I kept declining and sending the same message for 3 years. I receive more papers than I can review anyway, so it did not change my overall reviewing activity.

I had a small paragraph (I found one on the internet somewhere and adapted it. Can’t remember where, sorry about that) that I always sent to the editor whenever I declined a review, explaining why I did so. Of the 40 papers or so I declined to review, I got feedback about my message only twice.

One editor-in-chief emailed me once, and was rather sympathetic to my cause (he was himself publishing some of his papers in open access journals). He told me he had never received such straightforward and strongly worded comments on this topic before, even though some (many?) people had discussed it with him. I understood that these people were scared of being blacklisted by the journal.

The other feedback was from an editor-in-chief I personally know … as he is my former PhD supervisor. Of course he disagreed with me, but we had a good discussion on the topic. I didn’t convince him to resign from the journal.

Other than this: no feedback whatsoever. None. No one cared. As far as I can tell, it did not make any difference. And I assume that I was probably the only one to decline reviews for these reasons (the materials science community is not exactly at the forefront of open access efforts).

The other side of the boycott was my own papers. A few Elsevier journals are quite important in my domain. Before starting to boycott them, I published 13 papers in Elsevier journals (Acta Materialia, Biomaterials, Journal of the European Ceramic Society). The last paper I published in an Elsevier journal was in 2012. I did not publish with them anymore after that, unlike many other scientists who pledged to boycott them. There are many reasons to break this boycott (mainly: not putting students or postdocs in a difficult position by excluding a relevant journal for their paper).

Whenever we had a paper ready for submission, we had to choose a journal. Although I always raised the question and explained my reasons, I never forced my co-authors to comply with my own choices. Their response was so-so. Most of them did not care too much, although they understood my point. We always found a good solution (in terms of journal). The two main issues discussed were (1) why just Elsevier, and not Wiley (I published 15 papers in the Journal of the American Ceramic Society, published by Wiley), Springer, etc., which are for-profit publishers too? and (2) the APC costs. On this last point, I was in a rather good position, having a large grant for which APCs are an eligible cost. However, this also means not using this money for something else in the lab. As this fantastic grant is coming to an end, I may have to reconsider my opinion on this, though.

I also recently asked an editor to make my paper open access. YMMV, but hey, if you don’t ask, you’ll never know. In this case, I accepted an invitation on the condition that the paper be made open access, and the editor kindly accepted (the publisher agreed to make a few papers, which they deemed important enough, open access every year). Very nice (I have to write the paper, though). This will not work for most papers, although APCs can sometimes be waived if you have good reasons.

I therefore experimented with a few open access journals, with various degrees of satisfaction. The open access journals I submitted to were either not-for-profit or society journals (PLOS One, Science and Technology of Advanced Materials, Materials, Inorganics), or mega open access journals from the big players (Scientific Reports, ACS Omega). We also published a few other papers in paywalled journals, and made the preprints available for them.

I did not spend too much on APCs. I paid them for PLOS One (happily), Scientific Reports (not happily), and Science and Technology of Advanced Materials (twice; reasonable APC), and that’s it. The APCs were waived in Materials (the paper was an invited review). We also had a feature paper in a paywalled journal that was made open to anyone (without actually asking, which was very nice). The APC of our latest paper (in ACS Omega) was reduced from $2000 to $0! A $500 transfer discount (the paper was rejected from another ACS journal), plus 2x$750 waivers offered by the ACS because I previously published two other papers in one of their journals (Langmuir). Overall, it was thus not a huge amount spent on APCs during these three years.
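For the record, the ACS Omega arithmetic spelled out in a few lines of Python. The amounts are the ones quoted above; the variable names are mine, chosen only for illustration.

```python
# Amounts quoted in the paragraph above, in US dollars.
list_price = 2000
transfer_discount = 500          # paper transferred from another ACS journal
waiver_per_previous_paper = 750  # one waiver per earlier paper in Langmuir
previous_papers = 2

total_waivers = previous_papers * waiver_per_previous_paper
final_apc = list_price - transfer_discount - total_waivers
print(final_apc)  # 0
```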

Although I initially quite liked the idea of these mega journals, I have a different opinion today, after a few years of seeing what they publish. In some of these mega journals, there are a lot of so-so or frankly terrible papers (won’t name, won’t shame). In others (e.g. PLOS One), our community is not publishing, so I almost never find anything relevant in them (we published in PLOS One because I wanted biologists to see this paper, which was about antifreeze proteins. And they found it.).

Overall, I still believe in the value of journals, for the filtering they provide (or that authors provide by choosing to submit to them). Even though I use Google Scholar and the like for keeping track of what is published (through keywords and alerts), I also follow a number of journals to see what the different communities are up to (e.g. Langmuir, Soft Matter, etc.). I cannot achieve this with the mega journals. There is just too much noise, and too many communities publishing in these journals.

Open access journals initially tried to differentiate themselves also by providing new services to the authors, such as altmetrics. However, this is not the case anymore today as pretty much all journals are jumping on the train (I like to know how many times my papers were downloaded, even though it is sometimes a bit depressing). In my own experience, it is difficult to tell if our papers received more attention because they were not behind paywalls, although I’d like to believe so. But hey, the idea is to make everything accessible. Who knows when and how a paper will be useful to someone and make an impact? Nobody has any answer to this question (which is a good thing I believe).

In the meantime, preprints have attracted considerable attention and are developing rapidly. Although physicists have used arXiv forever, chemists (chemRxiv), biologists (bioRxiv), and many others (SocArXiv) are now joining the game, and journals are increasingly open to preprints (of course). Elsevier now has a Romeo-green policy regarding preprints for its journals. As more and more people know about preprints, they also head to these servers when looking for a paper they don’t have access to (search engines point to them, too). This is therefore a very cost-effective solution for making papers available right now. Feel free to argue in the comments below.

A number of other openness initiatives have also gained a lot of steam recently, besides papers. I am talking here about the data and figures, of course. I have become a huge fan of services like Figshare or Github. There is as much value (if not more) in sharing data and code (and giving them DOIs to get citations and keep track of their use) as in just publishing a paper. Even if you are not convinced by this, just think about your h-index: people are more likely to cite your paper if you give them stuff (tools, data) they can reuse. As we rely more and more on image analysis, we now provide our code (Python) whenever we publish a paper (2 papers so far, here and here, and more coming soon). The code is a Jupyter notebook with Python code and explanations inside, trying to explain as precisely as possible what we did so that people can check, replicate, reuse or iterate if they are interested. Based on the download counts, it proved almost as popular as the paper. This one was accessed 1589 times, and downloaded 219 times (while the paper itself was accessed 3064 times to date)! I was positively surprised by this. It also initiated a new project and collaboration on open data (in the pipe, be patient). I am certainly going to continue in this direction in the foreseeable future.

Besides the code and data, I found another very interesting use for Figshare (or anything similar you’d like): claiming the copyright of my own figures, so that their reuse (by yourself or someone else) is easy and does not depend on publishers. I thus started to upload a number of figures to Figshare (before submitting the paper). No editor has complained about this so far (I suspect editors actually like it since they like to have a clear view of which license is used). This is not very useful for simple plots: as long as you provide the data, they can be easily replotted in most cases. For complex plots or drawings and images that took a lot of time and effort, I found this idea very exciting and incredibly simple to implement. It takes 2 min per item to upload and tag it on Figshare.

Based on this analysis, where do I stand today?

  • Regarding the boycott of Elsevier: I will do my best to avoid them, but if the community we are targeting is publishing (reading) in an Elsevier journal, so be it. Like I said, Elsevier is Romeo-green on preprints, so we can make the paper available at no cost, and for me, that’s good enough for now. Our main criterion for selecting a journal is (and has always been): which community do we target ? Who do we think will be the most interested in our paper?
  • Reviewing: because nobody cared about my boycott in these journals, I am not declining reviews anymore (I am not accepting ALL reviews either, so don’t send me everything). There’s no reason I can’t kill papers like everybody else, right?
  • Whenever I give a talk, I always mention on my slides if the papers are open access. I see more and more people doing this. It raises awareness among those not convinced yet.
  • Preprints: yes, yes, and yes. This is now my number one criterion. If the journal does not allow preprints and is not open access, it will probably be a no-go. In the short term, I believe that preprints are the easiest way to make papers available at no cost (the cost of running arXiv is not negligible but the cost per paper is incredibly low, compared to the typical APC).
  • Data, code, presentations: Figshare ! I love it. We now always release the code we developed, even if I am not a good coder (ahem). The feedback on the data/code we released so far was excellent. I also started to share the slides of my talks, with very good feedback.
  • Keeping the copyright of my own figures using Figshare (or something similar if you don’t like Figshare). I’ll try to do this as much as possible. I love the idea and its simplicity. Figshare items can be embargoed, so this is not an issue in principle if you have a super fancy paper coming up.
  • Mega journals of for-profit publishers: I most likely won’t publish with them anymore. Besides the APC issue (I am not going to pay $5k for a paper), I just found too much noise in these journals. It has become very clear that this is just another way of making money for them. Other mega journals: same reasoning applies.
  • Educate our students about the publishing system, so that they can make their own choices, knowing how it works. This will take a generation or two, so we’ll have to be patient.

Even if you do not want to pay to make your papers open, there is therefore a lot you can do today to make your papers and their code/data available. Even though it’s nice to see individuals fighting for this, I believe that the most efficient way to change the system is for the funders to require open access. The ERC does this now. Other funders are joining the trend. Even reluctant academics will change their habits, because they won’t have a choice. And this can actually be done rapidly. The journals will have to adapt, somehow.

That’s my position today. Feel free to argue in the comments or on Twitter.

Everyone uses Sci-Hub

April 29, 2016 § Leave a comment

That’s the conclusion of a paper published today in Science. The server log data of Sci-Hub were analyzed, and the results laid out on a world map. It’s a surprise for no one that Sci-Hub is used everywhere in the world.

I was curious about my own experience, so I did the maths quickly, and used my Mendeley library for it. After converting my library to a CSV file (thanks, JabRef, for that), I analyzed the listing with Pandas (Python). I counted all the papers added to my library since 2013.

I have a total of 850 papers for this period. I took into account the 40 most represented journals, which correspond to 537 papers. My work is getting more and more multidisciplinary, so I read papers from many different domains. Hence a lot of journals.

The split is the following:

  • I had access to 318 papers (among which 39 are open access)
  • I had no access to 219 papers. That’s 41%.

Had I analyzed all the journals and not just the top 40, I suspect the actual figure would be even worse. Anyway.
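
For the curious, the counting above can be sketched in a few lines of Python. This is not the original analysis (which used Pandas on the exported library) but a minimal standard-library sketch of the same idea. The column names `journal` and `year`, the set of accessible journals, and the tiny sample library are all assumptions made for illustration; a real export will likely use different field names.

```python
import csv
import io
from collections import Counter


def access_split(rows, accessible_journals, top_n=40, since=2013):
    """Count papers with and without access among the top_n most represented journals.

    rows: iterable of dicts with (hypothetical) keys 'journal' and 'year'.
    accessible_journals: set of journal names the reader can access (an assumption).
    """
    # Keep only entries that have a journal and were published/added since the given year.
    recent = [r for r in rows if r["journal"] and int(r["year"]) >= since]
    counts = Counter(r["journal"] for r in recent)
    top = counts.most_common(top_n)
    total = sum(n for _, n in top)
    with_access = sum(n for journal, n in top if journal in accessible_journals)
    without = total - with_access
    share = 100 * without / total if total else 0.0
    return {"total": total, "access": with_access,
            "no_access": without, "no_access_pct": share}


# Tiny made-up library, only to show the mechanics (not the real data).
sample_csv = """journal,year
Journal A,2014
Journal A,2015
Journal B,2013
Journal C,2012
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
result = access_split(rows, accessible_journals={"Journal A"}, top_n=2)
print(result)

# Sanity check on the figures quoted in the post: 219 papers out of 537.
print(round(100 * 219 / 537))  # 41
```

With a real library, the only parts to adapt would be the column names and how the set of accessible journals is built.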

So there you go. I work for the main research organization (the CNRS) of one of the richest countries in the world, and I have access to only 60% of the papers I need to read. I don’t blame it on the CNRS of course. They do a pretty good job with the money they have. I have access to Elsevier, ACS, Nature, and Springer journals. That’s really not bad. What I miss the most are Wiley and RSC. I found the papers I was missing through various means, including emailing the authors, preprints (arXiv !), authors’ personal websites, and a few other popular strategies.

Draw your own conclusions.

 

Snowflakes, engineered

March 10, 2015 § Leave a comment

"On the six-cornered snowflakes". Kepler's book on snowflakes.

One of the earliest scientific observations you may have performed as a kid may be that of snowflakes. Their delicate morphology, with multiple branches, has a unique appeal to the eye and can easily be observed with magnifying glasses. No wonder that snowflakes have caught the attention of scientists and poets for centuries. As early as the 17th century, Johannes Kepler noticed their 6-fold symmetry, as well as their unique nature: no two snowflakes are alike.

Wilson Bentley’s photograph of snowflakes.

For a very long time, the only way to record the shape of snowflakes was drawing. If you ever looked at snowflakes under a magnifying glass, you can easily imagine how difficult they are to draw, not to mention that snowflakes tend to have a very short lifespan. In the early 20th century, Wilson A. Bentley was the first one to photograph snowflakes, systematically capturing thousands of unique snowflakes for over 40 years. His collection has proved to be incredibly valuable to investigate their morphology and is also a unique piece of art, if you ask me.

 

Libbrecht’s setup to photograph snowflakes in the wild.

Following on Bentley’s work, Kenneth Libbrecht, at Caltech, is dedicating his career to the study of snowflakes. Driven by both passion and science, he developed over the years a unique setup to capture images of natural snowflakes. There is still a lot to learn from snowflakes. Or rather, there is actually not that much we understand about the growth of snowflakes and the physics behind it. One thing we know: when it comes to snowflakes, there is more than the 6-branch morphology that anyone will draw if you ask them. Depending on the conditions (temperature and supersaturation), you can get anything from needles to plates. The morphology of natural snowflakes directly depends on the conditions they encountered in the sky. As such, snowflakes can be seen as little messengers from the clouds, telling a very local climate story.

Snowflake morphology diagram. There’s more than 6-branch snowflakes !

Systematic investigations, required to understand the physics behind snowflakes, are thus notoriously difficult with natural snowflakes. Physicists have long been trying to grow artificial snowflakes in the lab, under controlled, reproducible conditions. The first attempts to grow such snowflakes used … rabbit hairs ! A well-controlled (at that time), one-dimensional object, suitable to trigger the nucleation of snowflakes.

The crystal on the right was subjected to periodic temperature changes that yielded a spider’s-web pattern of ridges and ribs.

In a paper published on arXiv last week, Libbrecht describes a unique microscope, designed to grow snowflakes under controlled conditions, and to record their growth in real time. The pictures are stunning, as usual. I have one of Libbrecht’s snowflake books on my desk, and peruse it every once in a while. You should, too.

Engineered snowflake with a near-perfect 6-fold symmetry.

The most interesting things to me are the time-lapse observations reported in the paper. By varying the supersaturation and temperature conditions, Libbrecht triggers, in a controlled manner, side branching events, effectively engineering the morphology of snowflakes. Increasing the supersaturation for a brief moment initiates the development of branches at the corners of the growing snowflakes. Several new branches are eventually created from each corner, each of them growing in a synchronized fashion, the conditions being homogeneous at the scale of the snowflake.

Noorduin's microscopic flowers grown under diffusion-controlled conditions.

This behavior reminded me of the beautiful microscopic flowers reported in Science by Noorduin two years ago where, by varying the CO2 concentration, Noorduin was able to change the growth morphologies of his tiny flowers in a very controlled manner. In both cases, crystal growth occurs under diffusion-limited conditions and may thus share more than meets the eye.

Changing the morphology of flowers with a CO2 pulse.

Grown under much better-controlled conditions, the engineered snowflakes have a much better 6-fold symmetry than their natural counterparts. Growing branches while falling through the windy sky is a tough job. The conditions vary constantly. By the time a snowflake reaches the ground, its famous 6-fold symmetry is seldom preserved. By deliberately engineering the growth of his snowflakes, Libbrecht obtains new insights into the physics of snowflake growth, which may certainly be valuable for our understanding of crystal growth. But I can’t help but see the sheer beauty of the highly symmetrical engineered snowflakes, too. Libbrecht may very well be the very first one to grow two identical snowflakes, ruining a long-standing belief that no two snowflakes are alike.


More:
Kenneth Libbrecht’s website
Gallery of Libbrecht’s snowflake photographs.

The Automated Academic

January 4, 2015 § 8 Comments

I must confess: I am a lazy person. I hate to spend unnecessary time on tasks, in particular mundane or recurrent ones. When doing science, even though we are constantly exploring new ideas or novel methods, there is a fairly high number of recurring activities, from literature review to data analysis or writing. Being (barely) part of the computer native generation, I am very fond of any tools that can help me save time in my academic workflow (and improve the reproducibility of our science). Although I keep an open eye for new options, I have developed a relatively stable set of practices and tools over the years, which help me save a lot of time and concentrate on the tasks I enjoy. So here they are, I hope you will learn new ones here.

TL;DR

Essential tools I use: Google account (Google Docs, Google Drive, Google Scholar), Feedly, IFTTT, IPython notebook, Twitter, Mendeley, Pandoc, ORCID.
Tasks I automated: literature survey, citations formatting, reading lists, data analysis, email, writing.

Disclaimer: I am not sure whether to consider myself a geek or not. When it comes to automation, many options require advanced control of the tools we use, i.e., being a power user. Almost all of the solutions I have listed below have a very low entry barrier (except IPython, for which you need to be familiar with … Python !), and can be set up rapidly by anyone.

Literature review

Literature review is an essential activity of academic research. I have already covered the topic here, so here is a quick breakdown of the tools I use.

  • RSS feeds (free): to keep track of all new articles from a given number of journals. After the death of Google Reader, I settled on Feedly (free). Good enough for now.
  • Google Scholar (free) alerts. Google Scholar has become an essential part of my workflow, to keep track of what is going on in the world of peer-reviewed science. The most useful part of it is the e-mail alerts. I set up a couple of search alerts, based on keywords relevant to my research. They come almost daily to my inbox. I only wish I could combine several alerts into one, which would help me reduce the number of emails I get.
  • Twitter (free). I am a Twitter addict. And one of the many reasons is that it helps me stumble upon new content in the world of science. Although plenty can be done with the basic Twitter tools (hashtags, lists, etc.), you can build a few more elaborate tools. A while ago, I set up a Twitter bot based on PubMed, which automatically posts tweets with links to new papers on a given topic. More explanations here. Twitter can also be combined with IFTTT (free) for a number of tasks. If you do not want to be involved with the Twitter API, you can do some basic tracking with IFTTT, such as automatically listing tweets with a given hashtag (15 tweets max per search), and saving the output to a Google Doc file. I just set up a number of tasks based on this, I will let you know how it goes in a while.

Reading

Reading is another essential, recurring task of my workflow. I read mostly two kinds of documents: peer-reviewed papers, and articles. All my papers are automatically organized in Mendeley (free), thanks to the watched folder (I download every article in a specific folder, the content of which is automatically added to Mendeley). For all other articles, I tend to send everything to Instapaper (free), which I like a lot (removes all the clutter). This can be done directly from Feedly. With IFTTT, I can also automatically send links in tweets I have favorited to Instapaper.
To keep track of articles you particularly enjoyed or found relevant, you can automatically send a listing of liked or archived Instapaper articles to Google Docs. Mostly future proof, I guess.
I also set up an IFTTT recipe to send to Instapaper links from tweets with a given hashtag (e.g., #ipython).

E-mail

E-mail is like peer-review or democracy. It is the best solution until we find something better, and it’s quite clear that it’s here to stay. I work hard to be close to Inbox Zero, which I usually achieve. Rules are nevertheless a very powerful tool to automate the wonderful task of dealing with your email. Kind of obvious, but super efficient.

Backup

Duh ! If you do not back up your data, expect a slow, painful death in the near future. You will have deserved it.

Data analysis, file organization

  • Tags vs. folders. Should you organize your files ? Even though there are a lot of tagging solutions for files out there (it comes with OS X), I still use folders. You can automate some of your file management with Hazel ($29) for instance. The only automation I use is the watched folder for updating my Mendeley library, as discussed above.
  • Data analysis. There are probably a number of recurring experiments in your workflow. And there is a good chance that you end up with CSV files containing your data. If that’s the case, it would be a good idea to get rid of Excel and move to IPython, and in particular the IPython notebook (free). @ajsteven130 turned me into a Python fan, and for me, there is no way back. I have just completed a project (i.e., a paper) for which I did the entire analysis in the notebook, and it is just too good. It is also a big win for reproducibility and sharing what you did. More here.
  • Getting values from plots. I use GraphClick (OS X). This little gem automatically extracts the values from plots when you don’t have access to the raw data. Super useful, when compiling data from the literature, for instance. It hasn’t been updated for years, but does the job perfectly. Ridiculously cheap ($8).
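
To give an idea of what moving a recurring CSV analysis out of Excel can look like, here is a minimal sketch. It only uses the Python standard library so that it runs anywhere; in a notebook you would more likely reach for Pandas. The column names (`sample`, `pore_size_um`) and the values are made up for illustration and do not come from any real experiment.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev


def summarize(csv_text, group_col, value_col):
    """Return {group: (n, mean, sample standard deviation)} for one numeric column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[group_col]].append(float(row[value_col]))
    # stdev needs at least two values, so fall back to 0.0 for single measurements.
    return {
        g: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
        for g, v in groups.items()
    }


# Hypothetical measurements; with real data you would read the file contents instead.
data = """sample,pore_size_um
A,10.0
A,12.0
B,20.0
B,22.0
B,24.0
"""
summary = summarize(data, "sample", "pore_size_um")
for name, (n, avg, sd) in summary.items():
    print(f"{name}: n={n}, mean={avg:.1f}, sd={sd:.2f}")
```

Once such a function exists, rerunning the analysis on the next batch of experiments is a matter of pointing it to a new file, which is where the time saving (and the reproducibility) comes from.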

Writing

Whether it’s papers, grant applications, or reports, we spend a fair amount of time writing. Even though papers do not write themselves, there are a number of things that can be automated to help you concentrate on the content.

  • Scheduling time for writing. This is not really an automation solution, but I settled on this routine a while ago (2 years ago, maybe ?). Whenever I have something to write, which is, pretty much, all the time, I block a dedicated amount of time in my day to write. No matter what. I am a morning person when it comes to writing, so I write 1h (or more) every day first thing when I get to the lab, and it makes a huge difference at the end of the week. Given the amount of writing I will have to deal with this year, I certainly plan to keep this approach. If 1h per day scares you, try 20 min. At the end of the week you will end up with roughly 2 hours ! Big win.
  • Incorporating references in your writing, and formatting the references. If you’re not using a reference manager, you’re doing it wrong. Period. There are plenty of options out there, so you don’t have any decent excuse. I settled on Mendeley many years ago, and am not planning to change since they gave me a shirt (private joke here). Bonus point for syncing my library (including PDFs) between the various computers I use.
  • If some part of your writing involves repetitive expressions, it might be a good idea to use text replacement software such as TextExpander ($35) and the like. I don’t. Yet.
  • Conversion. I am still chasing the « One file to rule them all » dream: one master file for all kinds of outputs, from PDF to HTML, XML, and so on. I became a big fan of Markdown (a very simple markup language) for the first draft, and am seriously considering it as my master format, relying on Pandoc (free) for all the conversions.
  • Solutions for collaborative writing. As soon as you are not alone on a writing project, you have several options to collaborate. And no, emailing the files back and forth to your colleagues is not a suitable option. Depending on your colleagues and your geekiness level, you have many options, including Google Docs (excellent for comments and review mode), GitHub in combination with Markdown, Overleaf (free, for LaTeX fans), etc. Bonus point for Mendeley for automatically populating your library with the references cited in the file you received and not in your library yet. Very useful.
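
The « one master file » idea mentioned in the list above can be sketched as a small Python helper that builds one Pandoc command per output format. This is only an illustration under assumptions: the file name `paper.md` and the choice of formats are hypothetical, and the commands are merely printed here. Actually running them requires Pandoc to be installed (and, for PDF output, typically a LaTeX engine as well).

```python
import os
import shutil
import subprocess


def pandoc_commands(master, formats=("pdf", "html", "docx")):
    """Build one pandoc command per output format from a single master file."""
    stem = master.rsplit(".", 1)[0]
    # -s (standalone) produces a complete document; -o sets the output file,
    # whose extension tells pandoc which format to write.
    return [["pandoc", master, "-s", "-o", f"{stem}.{fmt}"] for fmt in formats]


def convert_all(master, formats=("pdf", "html", "docx")):
    """Run the conversions, but only if pandoc is installed and the master file exists."""
    if shutil.which("pandoc") is None or not os.path.exists(master):
        return False
    for cmd in pandoc_commands(master, formats):
        subprocess.run(cmd, check=True)
    return True


# Hypothetical master file; here we only display the commands that would be run.
commands = pandoc_commands("paper.md")
for cmd in commands:
    print(" ".join(cmd))
```

Keeping the command building separate from the execution makes it easy to check what will be run before touching any file.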

Updating your CV

Most academics love (and are often asked) to have an up-to-date list of their achievements. You have many options here. The solution requiring the least effort is to sign up for a Google Scholar account. It seems to have become one of the standards today, along with ORCID (free). Bonus point for keeping track of citations to your work if you are addicted to metrics. Alternative solution if you need a PDF with a list of your papers: keep track of your papers in Mendeley, get a bib file from it with all your papers, and use this with your favorite LaTeX template.

Other

  • Password management. Automation AND safety. I use 1Password ($50) and it does the job perfectly.
  • Keeping track of papers you’ve reviewed. I just came across IFTTT (you have probably guessed that by now), and made a recipe involving Gmail and Google Drive. All incoming emails in my inbox with « review » in their title are listed in a Google Doc automatically. Tons of variations possible based on this workflow. Get creative.

Anything you use that I missed ? Let us know in the comments.

My twitter achievements

November 19, 2014 § 2 Comments

Tweeps, and in particular scientists, love discussing why they use twitter. They also usually discuss it… on twitter of course ! Trying to convince people already on twitter to use twitter is an interesting recursive situation, but people not on the network are very often dubious about the benefits. One of the questions I get asked quite often is the following: can you give me some practical examples of things that happened because you were on twitter ?

Earlier this week, I was invited to a PhD viva at the Collège de France. The work (biomineralization of bone) was loosely related to my direct research interests (freezing and self-assembly) but brilliant, and I really enjoyed that day. The jury was eclectic and we had a good scientific discussion. While enjoying the post-viva champagne at the top of the roof (the view over Paris is truly outstanding; I’ll take a position there any day, not even asking for an office, the terrace will be fine), I learned about the reason I was there on this day. When some of the work was published in Nature Materials last year, I tweeted about it, like I do when I see papers which I find of interest, and that tweet showed up on the altmetrics page of the paper. That’s how they realized I could be interested in participating in the jury.

Room with a few. Not bad.

That case was just one more example of things that happened to me through twitter. For the sake of giving simple, practical examples of similar situations, here is a quick summary of what I would call my twitter achievements:
– invited to a PhD viva.
– co-authored a review paper with authors I’ve never met in real life. The paper is on the verge of being accepted in a prestigious journal (fingers crossed).
– shared a few beers and nice meals in Paris with a few CNRS colleagues whom I met on twitter.
– wrote an op-ed in Le Monde (online edition) to discuss science communication in France and the use of social media.
– wrote an article in Rue89 (a mainstream media outlet in France, online only) on open access, following a comment I tweeted about one of their papers.

So there you go. Simple examples. Share yours in the comments.

Weekly readings #1

November 2, 2014 § Leave a comment

Here’s what I found on the scientific internet this week:

Where Am I?

You are currently browsing entries tagged with science at Sylvain Deville.