My idea was to download all of the NASA Astronomy Picture of the Day (APOD) pictures. However, I had a feeling that would be a massive undertaking. To see roughly how much disk space it would take, I decided to take the first picture from each month, multiply its file size by 30, plot the results on a graph, fit a function to them, and integrate that function to get a rough estimate of how much data I'd be dealing with. To access each of the web pages, I wrote a BASH script that uses wget, a command-line download manager.
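#!/bin/bash
# Build APOD page URLs (apYYMMDD.html) and download the .jpg images each page links to.
for ((i=1996; i<2012; i++))
do
  for ((j=101; j<113; j++))
  do
    # Loop over every day of the month (disabled; only the first day is used):
    # for ((k=101; k<132; k++))
    # do
    k=101
    istr=${i:2:2}   # last two digits of the year: 1996 -> 96
    jstr=${j:1:2}   # month: 101..112 -> 01..12
    kstr=${k:1:2}   # day: 101..131 -> 01..31
    url="http://apod.nasa.gov/apod/ap$istr$jstr$kstr.html"
    echo "$url" >> url2011.txt
    # -r -l1: follow links one level deep; --no-parent: stay below /apod/; -A.jpg: keep only .jpg files
    wget -r -l1 --no-parent -A.jpg "$url"
    # done
  done
done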
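The script gets two-digit, zero-padded fields by counting from 100 and keeping the last two digits. The sketch below builds the same URLs with `printf` zero-padding (`%02d`) instead. The function name `apod_url` is hypothetical; it only builds the URL string, does not download anything, and does not check whether a date exists (February 30, for example). Arguments are assumed to be plain integers without leading zeros, since bash's `printf` would read `08` as an invalid octal number.

```shell
#!/bin/bash
# Hypothetical helper: build an APOD page URL of the form apYYMMDD.html
# using printf's %02d zero-padding instead of the "count from 100" trick.
# Pass plain integers (no leading zeros): bash printf treats 08 as bad octal.
apod_url() {
  local year=$1 month=$2 day=$3
  # year % 100 keeps the last two digits: 1996 -> 96, 2000 -> 0 (printed as 00)
  printf 'http://apod.nasa.gov/apod/ap%02d%02d%02d.html\n' "$((year % 100))" "$month" "$day"
}

# First day of every month, 1996 through 2011, matching the loop bounds above.
for ((year=1996; year<=2011; year++)); do
  for ((month=1; month<=12; month++)); do
    apod_url "$year" "$month" 1
  done
done
```

For example, `apod_url 2011 3 1` prints `http://apod.nasa.gov/apod/ap110301.html`. Redirecting the loop's output (`> urls.txt`) gives a URL list that could then be fed to wget.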
The commented-out lines toggle between looping over every day of the month and using only the first; I set it to the first. You may also notice the strange variable handling: starting each counter at 100 and keeping only the last two digits was my quick and dirty way of putting a leading zero on the months and days that needed one (101 becomes 01, 112 becomes 12). The year is handled the same way by keeping its last two digits (1996 becomes 96).
After about 2 minutes of downloading, I had the first picture of each month, each in a directory named after its month. After a bit of shell scripting, with the output piped into a MATLAB array, I had all the data needed to estimate the total disk space. I loaded the matrices into MATLAB, plotted them, and used a polynomial fit to turn the data into a 7th-order function. Using MATLAB's polynomial integration, I found the area under that curve to be about 10 GB for all 15 years' worth of pictures.
However enticing that may sound, I ended up downloading only the pictures from 2011. To test the effectiveness of my method, I estimated the size of all the 2011 files the same way. Again, I used a simple BASH script to get the file sizes and saved them to a matrix. Using the same technique described above, I estimated the total 2011 size at 512,233 kilobytes. To check, I summed the matrix and got an actual total of 473,210 kilobytes. The difference of 39,023 kilobytes is 7.62% of the estimate (8.25% of the actual total), not bad.
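The estimate-then-check procedure can be approximated in the shell as well. This sketch does not reproduce the MATLAB polynomial fit and integration; it swaps in a plain sum (each month's first-picture size times 30, added up), which is the discrete counterpart of integrating the monthly curve. It also totals the actual sizes of the .jpg files under a directory and reports the percent error against both possible denominators. The function names `estimated_total_kb`, `actual_total_kb`, and `percent_error` are hypothetical, and a kilobyte is assumed to be 1024 bytes.

```shell
#!/bin/bash
# Hypothetical helpers for estimating an archive's size and checking the estimate.
# Sizes are in kilobytes, assumed here to be 1024 bytes.

# Rough total: treat every picture in a month as the size of that month's first
# picture, so each month contributes 30 x its sample; add the months together.
# Arguments: one sample size (KB) per month.
estimated_total_kb() {
  printf '%s\n' "$@" | awk '{ total += 30 * $1 } END { printf "%.0f\n", total }'
}

# Actual total: concatenate every .jpg under a directory and count the bytes.
actual_total_kb() {
  find "$1" -type f -name '*.jpg' -exec cat {} + | wc -c |
    awk '{ printf "%.0f\n", $1 / 1024 }'
}

# Size of the gap between estimate ($1) and actual ($2), as a percentage of each.
percent_error() {
  awk -v e="$1" -v a="$2" 'BEGIN {
    d = e - a
    if (d < 0) d = -d
    printf "vs actual: %.2f%%  vs estimate: %.2f%%\n", 100 * d / a, 100 * d / e
  }'
}

percent_error 512233 473210
```

With the 2011 numbers, the last line prints `vs actual: 8.25%  vs estimate: 7.62%`, so the 7.62% quoted for the 2011 check corresponds to dividing the difference by the estimate rather than by the actual total.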
I never ended up downloading more than the 2011 archives, but I was able to find some pretty cool images.