Command Line Searches Using The find Command

This is another command that I think is very powerful, but most people don't use it to its full potential. I decided to make this a written post because the command has so many different parameters, and I worried that a video would make it hard for viewers to copy the actual commands.

Why use 'find'?

While most people are used to searching a file system through a GUI, you may sometimes find yourself in a situation where all you have is the command line (for example, when you are connected via SSH to a server on the other side of the world). Enter the 'find' command. Below are some common searches you would normally run in a GUI, reproduced here with the 'find' command.

Finding files by name

One of the most common searches is for files whose names match a pattern. For example, to find all the files with the ".txt" extension, you would type this:

% find . -name "*.txt"
./end_date.txt
./log_08022021.txt
./log_08012021.txt

Let's break that command down.  

find - the command itself. The "%" in front of it is just the shell prompt; you don't type it.

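.  - the directory where find starts its search; find also descends into every subdirectory beneath it. In Linux, the "dot" represents the current working directory. GNU find (the usual find on Linux) assumes "." if you leave the path out, but BSD find (the one on macOS) requires a path, so it's a good habit to always provide one.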

-name  - this argument tells the find command that you are searching by name.

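"*.txt"  - this is the pattern to search for. In this case, we are looking for files with the extension "txt". The quotes matter: without them, the shell may expand *.txt into the names of matching files in the current directory before find ever sees the pattern.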
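To see why the quotes around the pattern matter, here is a small experiment you can paste into a shell. It is a sketch assuming a POSIX shell such as sh or bash; it builds a throwaway directory with mktemp, and the file names in it are made up for the demo. It runs the same search with and without quotes.

```shell
#!/bin/sh
# Throwaway directory tree; the file names are hypothetical, for the demo only.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir logs
touch notes.txt logs/app.txt logs/app.log

# Quoted: find receives the literal pattern *.txt and applies it at every level.
quoted=$(find . -name "*.txt" | sort)
echo "quoted:"
echo "$quoted"

# Unquoted: the shell expands *.txt to notes.txt (the only match in the
# current directory) before find runs, so logs/app.txt is silently missed.
unquoted=$(find . -name *.txt | sort)
echo "unquoted:"
echo "$unquoted"
```

If the current directory held two or more .txt files, the unquoted version would fail outright instead, because find would receive several names after -name.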

Finding files by type

Let's say you search by name, as in the previous example, but instead of just files you also get directories with a similar name:

% find . -name "shake*"
./shakespeare
./shakespeare_bio.txt

If you really only want files with a particular name, you have to add the -type parameter, where f stands for a regular file:

% find . -type f -name "shake*"
./shakespeare_bio.txt

Likewise, if you only wanted to find what directories were under the current path, you would type this command:

% find . -type d
.
./shakespeare
./shakespeare/tragedy
./shakespeare/comedy

Notice that the first result is the "dot" which, again, represents the current working directory.
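Here is a self-contained sketch that recreates the shakespeare layout in a temporary directory (the layout mirrors the output above; the files are empty) and runs both -type searches. It also uses -mindepth 1, an option supported by both GNU and BSD find that skips the starting directory itself, so the "dot" no longer appears.

```shell
#!/bin/sh
# Recreate the example layout in a throwaway directory.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p shakespeare/tragedy shakespeare/comedy
touch shakespeare_bio.txt

# Everything whose name starts with "shake": one directory and one file.
all=$(find . -name "shake*" | sort)

# Regular files only.
files=$(find . -type f -name "shake*")

# Directories only; the starting point "." is included...
dirs=$(find . -type d | sort)

# ...unless -mindepth 1 tells find to skip the starting directory itself.
subdirs=$(find . -mindepth 1 -type d | sort)

printf '%s\n' "$all" "---" "$files" "---" "$dirs" "---" "$subdirs"
```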

Finding files greater than a certain size

Now let's say you want to find all the files that are above a certain size.  You would run the command like this:

% find . -size +1M
./10-MB-Test.docx

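The -size parameter tells find to match files by size, and +1M tells it to look for any file larger than 1 MiB (1,048,576 bytes). Note that the "+" sign is very important. If you leave it off, find looks for files whose size equals 1M instead, and what counts as "equal" depends on your version of find. BSD find (including the one on macOS) compares the exact byte count, which gives the single result below. GNU find, the usual find on Linux, rounds each file's size up to whole units first, so -size 1M also matches any non-empty file smaller than 1 MiB. To match exactly 1,048,576 bytes with either version, use the c (bytes) suffix: -size 1048576c.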

% find . -size 1M 
./1MB_file.txt
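Because GNU find rounds sizes up to whole units while BSD find does not, a byte count with the c suffix is the portable way to ask for an exact size. The sketch below creates files of known sizes with dd in a temporary directory (the names are hypothetical) so you can compare the three forms on your own system.

```shell
#!/bin/sh
# Files of known sizes (hypothetical names) in a throwaway directory.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
dd if=/dev/zero of=big.bin bs=1048576 count=2 2>/dev/null    # 2 MiB
dd if=/dev/zero of=exact.bin bs=1048576 count=1 2>/dev/null  # exactly 1 MiB
dd if=/dev/zero of=small.txt bs=1024 count=1 2>/dev/null     # 1 KiB

# Larger than 1 MiB: only big.bin, with GNU and BSD find alike.
larger=$(find . -type f -size +1M)

# Exactly 1048576 bytes: the c suffix counts bytes, with no rounding.
exact=$(find . -type f -size 1048576c)

# Implementation-dependent: GNU find rounds each size up to whole MiB, so
# small.txt (1 KiB rounds up to 1 MiB) matches as well as exact.bin.
loose=$(find . -type f -size 1M | sort)

printf 'larger: %s\nexact: %s\nloose:\n%s\n' "$larger" "$exact" "$loose"
```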

Finding files by timestamp

What if you want to find files that are older than a certain date? The find command has a -newer parameter that you can use; you just have to give it a reference file and use negative logic. What do I mean by a reference file? The -newer parameter doesn't take a date. Instead, it compares each file's modification time with the modification time of another file, so to ask "is this file older than a given date?", you first need a file whose timestamp is that date.

Fortunately, the touch command lets you create a file and set its timestamp. Say, for example, that you want to find all the files last modified before September 2021 (that is, in August 2021 or earlier). You would first create a reference file like this:

% touch -d "2021-09-01T00:00:00" end_date.txt

This command creates an empty file (or updates an existing one) and sets its modification time to midnight on September 1, 2021. Now you can use the find command like this to list any files that are not newer than end_date.txt:

% find . -not -newer end_date.txt 
./end_date.txt
./log_08022021.txt
./log_08012021.txt

Note two things about this command:

  1. The -not parameter negates -newer. If you think in terms of negative logic, "not newer" means "older than or the same age as" the reference file.
  2. The reference file will always show up in the results precisely because it is not newer than itself. Just remember to ignore it; it will not necessarily be the first result.
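If you'd rather not see the reference file at all, you can exclude it by name. Here is a sketch of the whole workflow in a throwaway directory, with made-up file names. It uses touch -t, the POSIX timestamp format (CCYYMMDDhhmm), in place of the -d form shown above (which also works with GNU and macOS touch). Note that -not is an extension found in both GNU and BSD find; POSIX spells it "!".

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

# Reference file: midnight, September 1, 2021 (local time).
touch -t 202109010000 end_date.txt
# A file last modified in August 2021, and one modified right now.
touch -t 202108010930 log_august.txt
touch log_today.txt

# "Not newer" than the reference file, i.e. modified on or before its time.
with_ref=$(find . -type f -not -newer end_date.txt | sort)

# Same search, but drop the reference file itself from the results.
without_ref=$(find . -type f -not -newer end_date.txt -not -name end_date.txt)

printf '%s\n---\n%s\n' "$with_ref" "$without_ref"
```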

Sorting files by size

One of the most common operations you will perform is listing files by size. You can do this with the find command, but it requires a short detour into the -exec parameter.

The -exec parameter is powerful because it lets you run a second command on the results of find. Think of it like this:

  1. For every file that matches the criteria for the find command,
  2. Perform (execute) some action on it  

For example, say you want the full listing (permissions, owner, size and date) for every "txt" file. You would use the find command like this:

% find . -name "*.txt" -exec ls -alh {} \;                  
-rw-r--r--  1 jaimevillela  staff   1.0K Aug 18 10:13 ./1KB_file.txt
-rw-r--r--  1 jaimevillela  staff   1.0M Aug 18 10:23 ./1MB_file.txt
-rw-r--r--  1 jaimevillela  staff     0B Sep  1  2021 ./end_date.txt
-rw-r--r--  1 jaimevillela  staff   256B Aug 18 11:30 ./log_08022021.txt
-rw-r--r--  1 jaimevillela  staff   2.9K Aug 18 11:32 ./shakespeare_bio.txt
-rw-r--r--  1 jaimevillela  staff   512B Aug 18 11:29 ./log_08012021.txt

Let's break down everything that comes after the -exec parameter:

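ls -alh - lists each file in long format (the l), showing its permissions, owner, size and date. The h prints the size in human-readable form (such as 1.0K or 2.9K), which is easier to read and shortens the size column. The a (include hidden files) makes no difference here, because find hands ls one file at a time.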

{} - a placeholder that find replaces with the path of each matching file (in this case, every file with the "txt" extension).

\; - marks the end of the command that -exec runs. The semicolon has to be "escaped" with a backslash ("\"); otherwise the shell treats it as a control operator that ends the find command itself, and find never receives it.
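Ending -exec with \; runs the command once for every match. POSIX find also accepts {} + as the terminator, which passes many paths to a single run of the command; with ls the listing is the same, but far fewer processes are started. The sketch below uses echo as a stand-in (chosen only for the demo, with hypothetical file names) so you can count how many times the command actually ran.

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch a.txt b.txt c.txt   # three hypothetical matches

# \; : one echo per file, so three lines of output.
per_file=$(find . -name "*.txt" -exec echo run: {} \;)

# {} + : find batches the paths into as few echo runs as possible (one here).
batched=$(find . -name "*.txt" -exec echo run: {} +)

echo "$per_file"
echo "---"
echo "$batched"

# Each run of echo prints exactly one line starting with "run:".
per_file_runs=$(printf '%s\n' "$per_file" | grep -c '^run:')
batched_runs=$(printf '%s\n' "$batched" | grep -c '^run:')
printf 'per-file runs: %s, batched runs: %s\n' "$per_file_runs" "$batched_runs"
```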

We now have the size of every file we are interested in, but the results still need to be sorted. Unfortunately, the find command can't do that on its own, so we pipe its output to another command: the sort command.

To sort all the files by their size, the full command would look like this:

% find . -name "*.txt" -exec ls -alh {} \; | sort -k 5 -r -h
-rw-r--r--  1 jaimevillela  staff   1.0M Aug 18 10:23 ./1MB_file.txt
-rw-r--r--  1 jaimevillela  staff   2.9K Aug 18 11:32 ./shakespeare_bio.txt
-rw-r--r--  1 jaimevillela  staff   1.0K Aug 18 10:13 ./1KB_file.txt
-rw-r--r--  1 jaimevillela  staff   512B Aug 18 11:29 ./log_08012021.txt
-rw-r--r--  1 jaimevillela  staff   256B Aug 18 11:30 ./log_08022021.txt
-rw-r--r--  1 jaimevillela  staff     0B Sep  1  2021 ./end_date.txt

Let's break down everything after the find command:

| - this is the pipe character; it takes the results of the find command and passes them to the sort command

sort - hopefully, this is self-explanatory

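-k 5 - this will sort using the 5th whitespace-separated field of each line which, in ls -l output, is the file size. (Strictly speaking, -k 5 makes the key run from field 5 to the end of the line; -k 5,5 limits it to the size field alone.)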

-r - this will sort in reverse order (i.e. descending; by default, a sort is performed in ascending mode)

-h - compare human-readable numbers, so that 1.0M ranks above 2.9K even though 2.9 is the larger number. Because ls -h printed the sizes in human-readable form, sort has to read them the same way.
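Here is the whole pipeline run against files of known sizes in a temporary directory (hypothetical names, created with dd). It is a sketch that uses -k 5,5 to limit the sort key to the size column, and it assumes a sort that supports -h, as GNU sort and the macOS sort used above do.

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
dd if=/dev/zero of=small.txt bs=100 count=1 2>/dev/null      # 100 bytes
dd if=/dev/zero of=medium.txt bs=1000 count=5 2>/dev/null    # 5000 bytes
dd if=/dev/zero of=large.txt bs=1048576 count=2 2>/dev/null  # 2 MiB

# Long listing per file, then sort on field 5 (size), largest first.
sorted=$(find . -name "*.txt" -exec ls -alh {} \; | sort -k 5,5 -r -h)
echo "$sorted"

# The last field of each line is the path; pull out just the names.
order=$(printf '%s\n' "$sorted" | awk '{ print $NF }')
printf '%s\n' "$order"
```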

Conclusion

Whew! That last example was tricky, but if you review it a few times it will make sense. The -exec parameter is especially powerful when combined with rm to delete files, but be very careful: files removed with rm don't go to a trash or recycle bin, so once they are deleted they are gone. Do run through some examples of your own, and I hope this helps you the next time you have to search from the command line.
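Here is one cautious way to use -exec for deletion, sketched in a throwaway directory with hypothetical *.tmp files: run the search with -print first to preview exactly what matches, and only then swap in -exec rm. POSIX find also offers -ok, which works like -exec but asks for confirmation before each command, if you'd like an extra safety net.

```shell
#!/bin/sh
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch keep.txt scratch1.tmp scratch2.tmp   # hypothetical files

# Step 1: preview. Same expression, but only print the matches.
echo "would delete:"
find . -type f -name "*.tmp" -print

# Step 2: once the preview looks right, run rm on each match.
find . -type f -name "*.tmp" -exec rm {} \;

# Only keep.txt should remain.
remaining=$(find . -type f)
echo "remaining: $remaining"
```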