5.2.0.1 Remove EXIF data, which can include GPS locations where the photos were taken
ImageMagick is used for this.
5.2.0.2 Rename files to have nonsensical names
This uses the GNU date command, which is called date on Linux and gdate on OSX. gdate is part of the coreutils package.
This tutorial is aimed at OSX users and assumes you have Homebrew (brew) installed. Using it on a PC through the Windows Subsystem for Linux should be easy: replace brew with apt-get or similar.
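As a rough illustration of the date/gdate choice, the helper below picks the command name from the operating system. The function name pick_date_cmd is hypothetical and not part of the original script; it assumes that Sys.info() reports "Darwin" on OSX.

```r
# Hypothetical helper: choose the GNU date command for the current platform.
# On OSX (reported as "Darwin") GNU date is installed by coreutils as gdate;
# elsewhere the system date command is assumed to be GNU date already.
pick_date_cmd <- function(sysname = Sys.info()[["sysname"]]) {
  if (identical(sysname, "Darwin")) "gdate" else "date"
}

date_cmd <- pick_date_cmd()
```

Under the Windows Subsystem for Linux the system name is reported as "Linux", so the same helper should return date there.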
5.2.0.3 Flag any files with very high luminance, for instance photos of labels, consent forms etc
This is also done with ImageMagick.
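One way this could be done, sketched here as an assumption rather than the script's actual method, is to ask ImageMagick's identify for the mean pixel intensity (a value between 0 and 1, used as a simple proxy for luminance) and flag files above a cutoff. The helper names, the example file name and the 0.8 cutoff are all hypothetical.

```r
# Hypothetical helper: build the arguments for ImageMagick's identify so that it
# prints the mean pixel intensity of an image, scaled to the range 0-1.
luminance_args <- function(file) {
  c("-format", shQuote("%[fx:mean]"), shQuote(file))
}

# Hypothetical helper: flag values above a cutoff (0.8 is an arbitrary choice).
is_too_bright <- function(mean_intensity, cutoff = 0.8) {
  mean_intensity > cutoff
}

# Example call, to be run only when ImageMagick is installed
# (IMG_0001.JPG is a made-up file name):
# out <- system2("identify",
#                luminance_args("data/photos_for_cleaning/IMG_0001.JPG"),
#                stdout = TRUE)
# is_too_bright(as.numeric(out))
```

A suitable cutoff depends on the photo set, so it is worth looking at the distribution of values before settling on one.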
This script assumes that you’ve put a lot of pictures in the folder data/photos_for_cleaning
5.3 Libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mixtools)
mixtools package, version 2.0.0, Released 2022-12-04
This package is based upon work supported by the National Science Foundation under Grant No. SES-0518772 and the Chan Zuckerberg Initiative: Essential Open Source Software for Science (Grant No. 2020-255193).
Install ImageMagick (comment out this line with # if you already have ImageMagick, unless you want a long wait)
#system("brew install imagemagick")
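Rather than commenting the line in and out by hand, the install can be made conditional. The sketch below is one possible approach, not part of the original script; the helper names has_command and install_imagemagick_if_missing are hypothetical, and it assumes that a working ImageMagick puts mogrify on the PATH.

```r
# Hypothetical helper: TRUE when a command-line tool is found on the PATH.
has_command <- function(cmd) {
  unname(nzchar(Sys.which(cmd)))
}

# Hypothetical helper: call Homebrew only when mogrify is missing.
# Assumes brew itself is installed, as stated earlier in the tutorial.
install_imagemagick_if_missing <- function() {
  if (has_command("mogrify")) {
    message("ImageMagick already installed, skipping.")
    return(invisible(FALSE))
  }
  system("brew install imagemagick")
  invisible(TRUE)
}

# install_imagemagick_if_missing()
```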
Use ImageMagick’s mogrify to rewrite each file in place with the EXIF data stripped out. This has the benefit of giving each file a new creation datestamp and md5 hash.
Before running this on thousands of photos, we will test that it works by reading the EXIF metadata from a single file.
Let’s see what files are available in the folder data/photos_for_cleaning
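A minimal sketch of this step, assuming the photos have a .JPG extension and that ImageMagick's identify is on the PATH. The helper name exif_args is hypothetical; the %[EXIF:*] format asks identify to print every EXIF field it finds.

```r
photo_dir <- "data/photos_for_cleaning"

# List the JPG files in the folder with their full paths.
photos <- list.files(photo_dir, pattern = "\\.JPG$", full.names = TRUE,
                     ignore.case = TRUE)
print(photos)

# Hypothetical helper: arguments that make identify print every EXIF field.
exif_args <- function(file) {
  c("-format", shQuote("%[EXIF:*]"), shQuote(file))
}

# Read the EXIF of the first photo, if there is one and ImageMagick is present.
if (length(photos) > 0 && nzchar(Sys.which("identify"))) {
  exif <- system2("identify", exif_args(photos[1]), stdout = TRUE)
  print(exif)
}
```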
You’ll see a load of data from the EXIF. If you want to capture anything, like the aperture settings etc, now is the time to do it. It should be simple to design a loop and function that will capture this info.
Next, apply ImageMagick’s mogrify -strip command to the file and read the EXIF again. You should now see an empty EXIF, i.e. nothing will be shown in the console. If you still see EXIF data, something went wrong.
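The strip-and-check step on a single file could look like the sketch below. The helper names are hypothetical, and because mogrify overwrites the file, it is safer to try this on a copy first.

```r
# Hypothetical helper: TRUE when identify returned no EXIF text at all.
exif_is_empty <- function(exif_output) {
  length(exif_output) == 0 || all(!nzchar(trimws(exif_output)))
}

# Hypothetical helper: strip one file in place, then re-read its EXIF.
# Returns TRUE when nothing is left.
strip_and_check <- function(file) {
  system2("mogrify", c("-strip", shQuote(file)))
  exif <- system2("identify",
                  c("-format", shQuote("%[EXIF:*]"), shQuote(file)),
                  stdout = TRUE)
  exif_is_empty(exif)
}

# strip_and_check("path/to/a/test/copy.JPG")   # made-up path
```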
To apply the command to all files in the folder, we won’t use the built-in mogrify -strip data/photos_for_cleaning/*.JPG because it halts when it hits a problem. Instead we will wrap it in R code with try.
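The idea of wrapping each call in try can be sketched as follows. This is an illustration under assumptions, not the original code: apply_with_try and strip_exif are hypothetical names. Because system2() reports failure through its exit status rather than an R error, the wrapper turns a non-zero status into an error for try to catch.

```r
# Hypothetical helper: apply `fn` to each file, catching errors with try() so
# that one bad file does not stop the loop. Returns TRUE/FALSE per file.
apply_with_try <- function(files, fn) {
  ok <- vapply(files, function(f) {
    result <- try(fn(f), silent = TRUE)
    !inherits(result, "try-error")
  }, logical(1))
  unname(ok)
}

# Hypothetical wrapper around mogrify -strip that raises an R error on failure.
strip_exif <- function(file) {
  status <- system2("mogrify", c("-strip", shQuote(file)))
  if (status != 0) stop("mogrify failed for ", file)
  invisible(status)
}

# files <- list.files("data/photos_for_cleaning", pattern = "\\.JPG$",
#                     full.names = TRUE)
# ok <- apply_with_try(files, strip_exif)
# files[!ok]   # the files that need a closer look
```

Keeping a vector of successes makes it easy to list the problem files afterwards instead of hunting for them in the console output.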
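# Rename `file` to a random ID (5 letters, 4 digits, 1 letter) inside `path`.
# `path` must end with a slash because the pieces are joined without a separator.
randomid <- function(n = 1, path, file) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  id <- paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
  file.rename(from = file, to = str_c(path, id, ".JPG", sep = ""))
}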
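To see what the generated names look like, the sketch below repeats the same ID scheme in base R and tries it on a throwaway file in a temporary folder. The names make_ids and rename_random are hypothetical, and the collision check is an addition that the function above does not perform.

```r
# Hypothetical helper: draw n IDs made of 5 letters, 4 digits and 1 letter.
make_ids <- function(n = 1) {
  letters5 <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(letters5, sprintf("%04d", sample(9999, n, TRUE)),
         sample(LETTERS, n, TRUE))
}

# Hypothetical helper: rename one file to a random ID, redrawing the ID if a
# file with that name already exists. Returns the new path.
rename_random <- function(file, dir = dirname(file)) {
  repeat {
    target <- file.path(dir, paste0(make_ids(1), ".JPG"))
    if (!file.exists(target)) break
  }
  file.rename(from = file, to = target)
  target
}

# Demonstration on an empty file in a temporary folder.
demo_dir <- tempfile("photos_")
dir.create(demo_dir)
demo_file <- file.path(demo_dir, "IMG_0001.JPG")
file.create(demo_file)
new_name <- rename_random(demo_file)
print(basename(new_name))
```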