Archive for March, 2011

Splitting files by line count on the command line

March 30, 2011

Two new command line tools I learnt about today. To count the number of words or lines in a text file, there is the wc utility:

wc -l myfile.csv

This prints the number of lines in the specified text file. If you then want to split that file into several smaller files, you can use the split utility. By default split writes 1000 lines to each output file; the -l option lets you choose a different number of lines per file (and -b splits by size in bytes instead):

split -l 100 myfile.csv

The new files appear in the current working directory, named xaa, xab, xac and so on (x is the default prefix for the output file names).
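A minimal sketch of the whole round trip, assuming a POSIX shell with wc, split, cat and cmp available. sample.csv is a hypothetical file generated on the spot: 250 lines are split into 100-line pieces, and the pieces are joined back together to check that nothing was lost.

```shell
# Sketch: split a CSV into 100-line pieces and check nothing was lost.
# Assumes a POSIX shell with wc, split, cat and cmp; sample.csv is a
# hypothetical file generated here just for the demo.
set -e

workdir=$(mktemp -d)
cd "$workdir"

# Build a 250-line CSV to experiment on.
i=1
while [ "$i" -le 250 ]; do
  echo "$i,row$i" >> sample.csv
  i=$((i + 1))
done

# Redirecting the file into wc makes it print only the count, no file name.
lines=$(wc -l < sample.csv)
echo "sample.csv has $lines lines"

# 100 lines per piece: xaa and xab get 100 lines each, xac the last 50.
split -l 100 sample.csv
ls x*

# Joining the pieces in name order should give back the original file.
cat x* | cmp - sample.csv && echo "pieces match the original"
```

Redirecting with < is handy in scripts, because wc -l myfile.csv prints the file name after the count.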

Web scale resource identifiers

March 21, 2011

An excellent blog post on web scale resource identifiers, Linked Data and REST: http://www.bbc.co.uk/blogs/radiolabs/2008/06/the_simple_joys_of_webscale_id.shtml

Good discussion afterwards as well.

Print Windows line endings with PHP

March 11, 2011

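To print Windows line endings when writing to a file with PHP, you can use the \r\n escape sequence (carriage return followed by line feed). PHP only interprets these escapes inside double-quoted strings, e.g.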

print "some line \r\n";

For Unix/Linux-style line endings, just use \n.
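To see the difference at the byte level, here is a small sketch in shell rather than PHP: printf understands the same \r and \n escapes, so it can write both kinds of line ending. The file names are made up for the demo, and od -c is used to show the raw bytes.

```shell
# Sketch in shell rather than PHP: printf understands the same \r and \n
# escapes, so it can write both kinds of line ending. File names are made up.
set -e

workdir=$(mktemp -d)
cd "$workdir"

printf 'some line\r\n' > windows.txt   # carriage return + line feed (CRLF)
printf 'some line\n' > unix.txt        # line feed only (LF)

# od -c prints one character per byte; look for \r before the final \n.
od -c windows.txt
od -c unix.txt

# The Windows-style file is one byte longer because of the extra \r.
wc -c < windows.txt
wc -c < unix.txt
```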

Find and replace with sed and find

March 7, 2011

To find all instances of a string in a file you can use grep like so

grep -n findstring somefile.txt

To replace all instances of a string with another string in a file you can use sed like so

sed -i 's/find/replace/g' somefile.txt
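A small sketch putting the two single-file commands together. It assumes GNU sed, where -i takes no argument (BSD/macOS sed needs -i '' instead); somefile.txt and its contents are made up for the demo.

```shell
# Sketch: grep then sed on a single file. Assumes GNU sed, where -i takes
# no argument (BSD/macOS sed needs -i '' instead). somefile.txt and its
# contents are made up for the demo.
set -e

workdir=$(mktemp -d)
cd "$workdir"

printf 'find me\nnothing here\nfind and find again\n' > somefile.txt

# -n prefixes each matching line with its line number (here lines 1 and 3).
grep -n find somefile.txt

# Without the trailing g only the first "find" on each line would change.
sed -i 's/find/replace/g' somefile.txt

cat somefile.txt
```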

To find all instances of a string in all files in a directory you can use grep like so

grep -n findstring /some/dir/*

To replace all instances of a string in all files in a directory you can combine sed and find like so

find /some/dir -type f -exec sed -i 's/find/replace/g' {} \;
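A sketch of the directory-wide version, again assuming GNU sed; the directory tree is a throwaway one made with mktemp, and its file names are hypothetical. The file in the subdirectory shows that find searches recursively, unlike the /some/dir/* glob used with grep.

```shell
# Sketch: replace a string in every regular file under a directory tree.
# Assumes GNU sed (-i with no suffix); the throwaway tree is hypothetical.
set -e

workdir=$(mktemp -d)
mkdir -p "$workdir/sub"
printf 'find this\n' > "$workdir/a.txt"
printf 'also find this\n' > "$workdir/sub/b.txt"

# find descends into sub/ too, and runs sed once for each file it finds.
find "$workdir" -type f -exec sed -i 's/find/replace/g' {} \;

cat "$workdir/a.txt" "$workdir/sub/b.txt"
```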

-n makes grep print the line number of each line where it found the search term.

-i on sed means in place: the replacement is made in the file itself and the file is saved. If you attach a suffix directly to it, e.g. -i.old, each altered file is first backed up with that suffix (somefile.txt becomes somefile.txt.old).

-type f on find means only regular files are returned (directories, symlinks and so on are skipped).

-exec on find runs the command that follows it once for each file found. The {} is replaced with the path of each file. Everything after -exec is treated as part of that command until find reaches a semicolon, which has to be escaped (\;) because otherwise the shell would treat it as the end of the whole command line.
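One way to check what -exec will do before letting it loose is to put echo in front of the command, so find prints each command line instead of running it. This sketch does that dry run and then the real run with a backup suffix. It assumes GNU sed, uses made-up file names, and adds -name '*.txt' (not in the post) so that find cannot pick up the .old backups it creates part way through.

```shell
# Sketch: preview what find -exec will run, then run it with backups.
# Assumes GNU sed; the directory and file names are hypothetical.
set -e

workdir=$(mktemp -d)
cd "$workdir"
printf 'find me\n' > one.txt
printf 'find me too\n' > two.txt

# Dry run: with echo in front, find prints each command line instead of
# running it. {} becomes the path of each file; \; ends the -exec arguments.
# sort only makes the order predictable.
preview=$(find . -type f -name '*.txt' -exec echo sed -i.old 's/find/replace/g' {} \; | sort)
printf '%s\n' "$preview"

# Real run: -i.old edits each file in place and keeps the original as <file>.old.
# -name '*.txt' stops find from also processing the new .old backups.
find . -type f -name '*.txt' -exec sed -i.old 's/find/replace/g' {} \;

ls
```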

Simples!