Quick data mining of my own library

August 23, 2012

Almost back to the lab. It’s been a good summer with the boys, mostly at home. I read books, papers and blog posts when I had free time, which does not happen often with children under 5, as anyone in the same situation can testify.

A lot of heated discussion is taking place online at the moment about open access and data mining. While some benefits are straightforward in certain domains such as genetics or chemistry, this is a brand new world to explore. I came across the fascinating comments by Philip Ball on Chematica, a network of the transformations that link chemical species. Chemistry is not really my cup of tea, and I don’t have any coding abilities, unlike prominent data miners such as Peter Murray-Rust. One thing I do have, though, is a Mendeley library stuffed with papers (over 1400 as of today). Since my main focus now is on this ice-templating thing, I have a bit more than 350 papers on this topic alone.

In addition, I am fascinated by issues related to presenting data, aka the visual display of quantitative information, as described by Tufte, among many others. I’ve played with Wordle before; it’s all over the internet now. Wordles are beautiful clouds of keywords, where the size of each word reflects how often it occurs in a list or a text. You have a good example with the display of keywords in the right column of the blog page.

Today, I did some quick and dirty analysis of my collection of papers. After exporting the Mendeley data to a bib file, I compiled a list of the titles of the papers in my library and fed it to the freely available Wordle website. The whole process was really fast, 15 minutes or so. The first result I got is shown below (click to enlarge).
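For readers who would rather script the title extraction than do it by hand, here is a minimal sketch in Python. It is not what I actually did (I only used the Wordle website), and it assumes the Mendeley export is a standard BibTeX file with brace-delimited fields. The function names, the stopword list and the sample entries are all made up for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; extend as needed).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def extract_field(bibtex, field="title"):
    """Return the brace-delimited values of `field` found in a BibTeX string."""
    values = []
    # Match e.g. "title = {" but not "booktitle = {" (no letter right before).
    pattern = re.compile(
        r"(?<![A-Za-z])" + re.escape(field) + r"\s*=\s*\{", re.IGNORECASE
    )
    for match in pattern.finditer(bibtex):
        depth = 1
        start = match.end()
        pos = start
        # Walk forward until the opening brace is balanced, so nested {...} survive.
        while pos < len(bibtex) and depth > 0:
            if bibtex[pos] == "{":
                depth += 1
            elif bibtex[pos] == "}":
                depth -= 1
            pos += 1
        if depth == 0:
            values.append(bibtex[start:pos - 1])
    return values


def word_counts(texts, stopwords=STOPWORDS):
    """Count lowercase words across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Drop BibTeX protective braces, then keep alphabetic (possibly hyphenated) words.
        cleaned = text.replace("{", "").replace("}", "").lower()
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", cleaned)
        counts.update(w for w in words if w not in stopwords)
    return counts


# Made-up sample entries, standing in for a real export.
sample = """
@article{a, title = {Freeze casting of porous ceramics}, year = {2008}}
@article{b, title = {Porous {Alumina} by freeze casting}, booktitle = {ignored}}
"""

titles = extract_field(sample)
counts = word_counts(titles)
print("\n".join(titles))      # plain text that can be pasted into a word cloud tool
print(counts.most_common(5))  # the most frequent keywords
```

With a real export, you would read the file contents into a string first and pass that in place of `sample`. Calling `extract_field(text, "abstract")` would collect abstracts instead of titles, assuming the export includes them.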

Well, as you can expect, since I am interested in porous ceramic materials templated by ice crystals, these keywords obviously dominate the wordle. In the upper right you can find “zirconia”, reminiscent of my PhD on the low temperature degradation of zirconia-containing ceramics. This was in the pre-Mendeley years, so I don’t have many papers left on this topic.

Things get more interesting if I restrict the analysis to the titles of the papers related to ice-templating. I got about 340 of them. I’ve followed the ceramic domain really closely, and the polymer field much less. Polymers are thus largely under-represented in the following analysis, although ice-templated polymers came first.

The first obvious observation is the absolute domination of “freeze”, “casting”, “porous” and “ceramics”. They are in almost every title. So if you want to be original, don’t come up with a paper entitled “freeze casting of porous ceramics”. The other dominant keywords are “structure” and “properties”, which is a pretty good image of the current approach to the phenomenon. Freeze whatever you have and look at the structure and properties. Not groundbreaking, most of the time. But the underlying mechanisms are so complex that very few people are willing to tackle them. “Tissue” and “scaffolds” are pretty strong too, and tissue engineering has indeed been one of the main focuses so far in terms of potential applications. “Ice” is less prominent than “freeze”, which reflects how people currently describe the process: “freeze-casting” instead of “ice templating”. I am not a big fan of “freeze-casting”, since it was originally used to describe the processing of dense materials. Although pretty much everyone is making porous materials, “freeze-casting” still dominates. “Ice-templating” excludes all solvents other than water, so it’s not perfect either.

I also did the same analysis compiling all the abstracts. This is much closer to mining the full text of the papers. The output is much more balanced.

“Pore”, “porous”, “structure” and “freeze” still dominate, but the relative occurrences of other keywords are much more balanced. Since people tend to report almost exclusively positive results, we get a lot of “increased”, “high”, “new”, “novel”, “potential”, “significantly” and “significant”, all better represented than “low” and “decreased”. “Defects” is noticeably absent, although defects remain a major issue of the process. “Control” is missing from the wordle (well, not really missing, but it’s really tiny), a fair representation of the majority of the papers, where people exert no control whatsoever. Freeze and see.
“Properties” is relatively large, although people are almost exclusively looking at mechanical properties (hence the presence of “MPa”). People became interested only very recently in other properties, such as conductivity or piezoelectricity.

Regarding materials, “silica” and “alumina” are the only ones found here. A lot of room for testing other materials, and therefore other properties. “Water” and “camphene” are of similar size, as people are equally interested in both solvents.

Missing keywords are equally interesting. “Colloids” is hardly visible, although everyone is dealing with colloidal suspensions. Ceramists are usually talking about slurries instead of colloidal suspensions, which is why we get “slurry” and “slurries” instead. Maybe. I still believe we have a lot to learn if we look at the colloid science papers.

“Interface” is the other elephant in the room. The control of the process largely depends on controlling the interface, which is something that people have largely ignored so far.

Without digging too much into the details, this quick and simple analysis is very informative. Having followed the domain very closely for the past 5 or 6 years, I find the keyword clouds obtained here very representative of the current state of the art. I’d love to extend this analysis to the full text of the papers, although I will need different tools to do it. Maybe I should get access to the Mendeley API. They are responding to over 100 million calls to their database each month, so they can surely afford a few more. In the meantime, I’ll try to apply the same analysis to different domains, using Google Scholar or Scopus and Mendeley. More later if I’m successful.

Funny coincidence: this month’s issue of Nature Materials was released today while I was playing around with this analysis. Check out the front cover.

Summer readings

August 11, 2012

It’s really hot in summer, where we live. Usually the hottest place in France, actually. From mid-day to late afternoon, it’s usually better to stay inside, where it’s a lot cooler. A good period to read books. I read three good scientific ones lately.

H2O, A biography of water, by Philip Ball. He’s probably my favorite science writer, and I enjoy his frequent columns in Nature or Nature Materials, among others. He’s the one who taught me, following our Science paper, that ice has been used as a structural material… for planes! This book is truly excellent. Philip Ball gives us a grand tour of water, through history and the various domains of science, from chemistry to biology or geophysics. I particularly enjoyed the history of water through the centuries. His style makes it a joy to read; I could hardly put it down. Lots of gems like this one (maybe because I’m getting into antifreeze proteins lately):

If fish conducted scientific research, you might expect them to set up whole institutes devoted to studying supercooled liquids, since their very existence depends on this precarious state.

Design in Nature: How the Constructal Law Governs Evolution in Biology, Physics, Technology, and Social Organization, by Adrian Bejan and J Peder Zane.
I wasn’t aware of the constructal theory until I read that book, and it was quite a fascinating read. The constructal theory is about how design in nature arises from a simple law, the constructal law, which is basically about how stuff (mass, materials, ideas) flows. The design of things evolves towards ever better flow. The authors are aiming high, applying their theory to pretty much everything you can think of, from lungs, rivers and trees to universities and animals. Although I don’t agree with all of their ideas, such as their claim about the very existence of trees (which are supposedly the most efficient way of moving water from the soil to the atmosphere), it was a stimulating read nonetheless.

Visual Strategies, A Practical Guide to Graphics for Scientists and Engineers, by Felice C. Frankel and Angela H. DePace.

This one is all about how to design figures or graphics to convey scientific ideas, whether it’s for a paper, a poster or a grant application. Beautiful illustrations and some interesting stories, but I found too many examples and too little theory. If you are not familiar with graphic design, it’s difficult to translate the examples provided into useful lessons you can apply. A good book, still.

Where Am I?

You are currently viewing the archives for August, 2012 at Sylvain Deville.